Is there a fast algorithm for generating all partitions of a set into subsets of size 2 (and one subset of size 1)?

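As the title says, I am trying to generate all partitions of a set of size n in which every subset has size 2 and, if n is odd, there is exactly one singleton set. I very slightly modified some SO code for generating all partitions to get this: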

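def partitionIntoPairs(collection):
    # Base case: a single element forms one singleton subset.
    if len(collection) == 1:
        yield [ collection ]
        return

    first = collection[0]
    for smaller in partitionIntoPairs(collection[1:]):
        # Insert `first` into each subset that still has room (size < 2).
        for n,subset in enumerate(smaller):
            if len(subset) < 2:
                yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # Put `first` in its own subset.
        yield [ [ first ] ] + smaller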

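This works, but unfortunately it is too slow. My second idea was to use itertools.combinations to generate all pairs of a given set and then, for each pair, call the function recursively on the set with that pair removed, but I guess that would be even slower. The implementation is also incorrect: it returns only one possible partition, and I am not sure how to make it return all of them: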

from itertools import combinations

def partitionIntoPairs2(collection):
    if not collection:
        return []
    elif len(collection) == 1:
        return [(next(iter(collection)))]
    else: 
        pairs = set(combinations(collection,2))
        for pair in pairs: 
            collection.remove(pair[0])
            collection.remove(pair[1])
            return partitionIntoPairs2(collection) + [pair]

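I came across some algorithms for partitions with a given number of sets, and various implementations of algorithms that generate all possible partitions, but as far as I can see none of them solves my problem efficiently.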

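So, to pose more concrete questions: if the second algorithm is a viable option, what would a correct implementation look like? And, of course, is there a faster way to do this? If so, how?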

Solution

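A partition should be treated as a set: two partitions that differ only in order should be considered the same. So the set of numbers (1,2,3,4) has only 3 partitions.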
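
To illustrate the point about order, here is a minimal sketch (the helper names `pairings_with_duplicates` and `canonical` are made up for this example, they are not from any answer). It chunks every permutation of (1,2,3,4) into consecutive pairs and then collapses the arrangements that differ only in order:

```python
from itertools import permutations

def pairings_with_duplicates(items):
    """Chunk every permutation into consecutive pairs (order-sensitive)."""
    for p in permutations(items):
        yield [p[i:i + 2] for i in range(0, len(p), 2)]

def canonical(partition):
    """Order-independent form of a partition: a frozenset of frozensets."""
    return frozenset(frozenset(block) for block in partition)

raw = list(pairings_with_duplicates((1, 2, 3, 4)))
distinct = {canonical(p) for p in raw}
print(len(raw), len(distinct))  # 24 3
```

Deduplicating after the fact like this is only meant to show the counting; it still walks all n! permutations, so it is not a fast way to generate the partitions.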

For even N, the number of partitions should be N!/(N/2)!/2^(N/2). By Stirling's formula this is approximately Sqrt(2)*(N/e)^(N/2), where e=2.71828..., which is very large.
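
A small sketch of that count, to make the growth concrete. The function names are made up for this example, and the odd case (pick the singleton, then pair up the rest) is my own extension of the formula above rather than something stated in the answer:

```python
import math

def count_pair_partitions(n):
    """Exact number of partitions into pairs (plus one singleton if n is odd)."""
    if n % 2:
        # assumption: choose the singleton, then pair up the remaining n - 1 items
        return n * count_pair_partitions(n - 1)
    half = n // 2
    return math.factorial(n) // (math.factorial(half) * 2 ** half)

def stirling_estimate(n):
    """Approximation sqrt(2) * (n/e)^(n/2), intended for even n."""
    return math.sqrt(2) * (n / math.e) ** (n / 2)

for n in (4, 6, 10, 20):
    print(n, count_pair_partitions(n), round(stirling_estimate(n)))
```

For n = 4 and n = 6 the exact counts are 3 and 15; by n = 20 the count already exceeds 650 million, so any generator will be slow for large inputs simply because of the size of the output.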

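I took @VirtualScooter's code and added a recursive version, Partition, which runs faster than his itertools version (note that this is not an apples-to-apples comparison, because my Partition produces no duplicates).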


import itertools
import timeit
t3 = (1, 2, 3)
t4 = (1, 2, 3, 4)
t6 = (1, 2, 3, 4, 5, 6)

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks.
        Code from Python itertools page
    """
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

def partitionIntoPairs(collection):
    # itertools version: chunk every permutation into pairs (yields duplicates)
    perms = itertools.permutations(collection)
    for p in perms:
        group = list(grouper(p, 2))
        # drop the None padding from a trailing odd-sized chunk
        if group[-1][-1] is None:
            group[-1] = (group[-1][0],)
        yield group

def Partition(Indexable):
    if len(Indexable) <= 2:
        # nothing left to split: the remaining items form one subset
        yield [Indexable]
    elif len(Indexable) % 2 == 1:
        # odd length: try each element as the singleton, pair up the rest
        for i, x in enumerate(Indexable):
            for s in Partition(Indexable[:i] + Indexable[i+1:]):
                yield [[x]] + s
    else:
        # even length: pair the first element with each other element in turn,
        # so the same partition is never generated twice
        x0 = Indexable[0]
        for i in range(1, len(Indexable)):
            for s in Partition(Indexable[1:i] + Indexable[i+1:]):
                yield [[x0, Indexable[i]]] + s

def comp_timeit(collection, repeats=1_000):
    s1 = f"l1 = list(Partition({collection}))"
    s2 = f"l1 = list(partitionIntoPairs({collection}))"
    # globals=globals() lets timeit see the functions defined above
    t1 = timeit.timeit(s1, globals=globals(), number=repeats)
    t2 = timeit.timeit(s2, globals=globals(), number=repeats)
    print(f"partition,{repeats:_} runs: {t1:,.4f}")
    print(f"itertools,{repeats:_} runs: {t2:,.4f}")

for p in Partition(t4):
    print(p)
comp_timeit(t3)
comp_timeit(t4)
comp_timeit(t6)

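This recursive generator function yields a partition once its combined length equals that of the original input. Along the way it only adds an element to the sub-partition under construction, or keeps a single one-element sub-partition when len(data)%2 == 1: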

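data = {1, 2, 3}
def partition(d, m, c = []):
   # l is the flat list of elements already placed in c
   if len(l:=[j for k in c for j in k]) == len(d):
      yield c
   for i in filter(lambda x:x not in l, d):
      if not c or len(c[-1]) == m:
         # the last subset is full (or there is none yet): open a new subset
         yield from partition(d, m, c=c+[[i]])
      else:
         # odd input: the one existing single-element subset may stay a singleton
         if sum(len(s) == 1 for s in c) == 1 and len(d)%2:
            yield from partition(d, m, c=c+[[i]])
         # extend the last, still incomplete subset
         yield from partition(d, m, c=[*c[:-1], c[-1]+[i]])


print(list(partition(list(data), 2)))

Output:

[[[1], [2, 3]], [[1, 2], [3]], [[1], [3, 2]], [[1, 3], [2]], [[2], [1, 3]], [[2, 1], [3]], [[2], [3, 1]], [[2, 3], [1]], [[3], [1, 2]], [[3, 1], [2]], [[3], [2, 1]], [[3, 2], [1]]]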

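When len(data)%2 == 0:

data = {1, 2, 3, 4}
print(list(partition(list(data), 2)))

Output:

[[[1, 2], [3, 4]], [[1, 2], [4, 3]], [[1, 3], [2, 4]], [[1, 3], [4, 2]], [[1, 4], [2, 3]], [[1, 4], [3, 2]], [[2, 1], [3, 4]], [[2, 1], [4, 3]], [[2, 3], [1, 4]], [[2, 3], [4, 1]], [[2, 4], [1, 3]], [[2, 4], [3, 1]], [[3, 1], [2, 4]], [[3, 1], [4, 2]], [[3, 2], [1, 4]], [[3, 2], [4, 1]], [[3, 4], [1, 2]], [[3, 4], [2, 1]], [[4, 1], [2, 3]], [[4, 1], [3, 2]], [[4, 2], [1, 3]], [[4, 2], [3, 1]], [[4, 3], [1, 2]], [[4, 3], [2, 1]]]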

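This can be done with itertools, and probably faster than with a recursive algorithm such as the partition function in another answer (https://stackoverflow.com/a/66972507/5660315). In my timing runs I measured 4.5 seconds for t6 with that function, compared with under 0.2 seconds for the mi_partition function shown below.

The first idea was to list all permutations of the set and then split each one into subsets, using the grouper recipe from the itertools documentation page. After that, we strip the padding from the final odd-sized subset, if there is one.

As @Bing Wang pointed out, this kind of sequence contains duplicates. So instead I call the more_itertools.set_partitions function, which cuts down on the duplicates. It also generates subsets of size greater than 2, so those are filtered out with itertools.filterfalse.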

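import itertools
import timeit
import more_itertools

t3 = (1, 2, 3)
t4 = (1, 2, 3, 4)
t6 = (1, 2, 3, 4, 5, 6)

def mi_partition(collection):
    # number of subsets: n/2 pairs, plus one singleton when n is odd
    k = len(collection) // 2 + len(collection) % 2
    s1 = more_itertools.set_partitions(collection, k)
    if False:
        # debugging aid: set to True to print how many candidates were generated
        p1, p2 = itertools.tee(s1)
        print(len(list(p1)))
        s1 = p2
    # discard every partition that contains a subset larger than 2
    return itertools.filterfalse(lambda x: any(len(y) > 2 for y in x), s1)

print(list(mi_partition(t3)))
print(list(mi_partition(t4)))

Output:

[[[1], [2, 3]], [[1, 2], [3]], [[2], [1, 3]]]
[[[1, 2], [3, 4]], [[2, 3], [1, 4]], [[1, 3], [2, 4]]]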

A small timing comparison with the Partition algorithm from @Bing Wang's answer shows that their solution is faster:

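# Partition is the recursive function from @Bing Wang's answer and must be defined first.
def comp_timeit(collection, repeats=1_000):
    s3 = f"l1 = list(mi_partition({collection}))"
    s4 = f"l1 = list(Partition({collection}))"
    # globals=globals() lets timeit see mi_partition and Partition
    t3 = timeit.timeit(s3, globals=globals(), number=repeats)
    print(f"more_itertools,{repeats:_} runs: {t3:,.4f}")
    t4 = timeit.timeit(s4, globals=globals(), number=repeats)
    print(f"Partition,{repeats:_} runs: {t4:,.4f}")
comp_timeit(t3)
comp_timeit(t4)
comp_timeit(t6)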

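The output is shown below. Note that for t3 and t4 the result list has length 3 in both cases, while for t6 it has length 15. The Partition solution is consistently faster (roughly 2x to 8x in these runs), probably because it does not need to filter out any candidates. For t6, set_partitions(t6,3) generates 90 partitions, and only 15 of them make it into the final answer.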
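
The 90-versus-15 figures can be checked without running more_itertools at all. The sketch below is my own illustration, with made-up function names: it counts the partitions into k non-empty subsets with the standard recurrence for Stirling numbers of the second kind, and counts the all-pairs partitions of an even-sized set as the product (n-1)*(n-3)*...*1.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n labelled items into k non-empty subsets."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # item n is either alone in its subset or joins one of the k existing subsets
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def pair_partitions(n):
    """Partitions of an even-sized set into pairs: (n-1) * (n-3) * ... * 1."""
    result = 1
    for odd in range(n - 1, 0, -2):
        result *= odd
    return result

candidates = stirling2(6, 3)  # partitions of 6 items into 3 subsets
kept = pair_partitions(6)     # those in which every subset is a pair
print(candidates, kept)       # 90 15
```

The gap widens quickly with n, which is consistent with the filtering approach falling further behind on t6 than on t3 and t4.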

more_itertools,1_000 runs: 0.0051
Partition,1_000 runs: 0.0024
more_itertools,1_000 runs: 0.0111
Partition,1_000 runs: 0.0026
more_itertools,1_000 runs: 0.1333
Partition,1_000 runs: 0.0160

Your example does not show what "all subsets" means. If you need all possible pairs of values from a given set, try using set() and frozenset():

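my_set = {1,}

res = set()
for value in my_set:
    current_set = set()
    current_set.add(value)
    # combine `value` with every element; frozenset makes {a, b} and {b, a} identical
    for other in my_set:
        new_set = current_set.copy()
        new_set.add(other)
        res.add(frozenset(new_set))

if not len(my_set) % 2:
    # even size: keep only the true pairs
    res = [list(new_set) for new_set in res if len(new_set) > 1]
else:
    # odd size: keep the singletons as well
    res = list(map(list, res))
print(res)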
