What is the best data structure for representing set partitions?

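By a "set partition", specifically a set partition of {1,2,...,n}, I mean a collection of non-empty sets of integers between 1 and n such that each integer is in exactly one of the sets and does not occur more than once in that set. You can think of it as a way of dividing all the numbers from 1 to n into any number of boxes, where both the boxes and their contents are unordered.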
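To make the definition concrete, here is a minimal C++ sketch of a validity check. It assumes the "obvious" representation as a vector of vectors; the function name `isSetPartition` is hypothetical and only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true if `parts` is a set partition of {1,...,n},
// i.e. every part is non-empty and every integer 1..n occurs exactly once overall.
bool isSetPartition(const std::vector<std::vector<int>>& parts, int n) {
    assert(n >= 0);
    std::vector<bool> seen(n + 1, false);  // seen[x] = x has already appeared
    int count = 0;
    for (const auto& part : parts) {
        if (part.empty()) return false;    // empty parts are not allowed
        for (int x : part) {
            if (x < 1 || x > n || seen[x]) return false;  // out of range or repeated
            seen[x] = true;
            ++count;
        }
    }
    return count == n;  // nothing in 1..n is missing
}
```

The order of the parts and the order within each part do not matter to this check, which matches the "unordered boxes" picture.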

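I need to compute two different functions on objects of this type, for which I give pseudocode below. I can get by without interleaving these functions, so having two different data structures for them would be fine, but it would be great if one structure worked well for both. The time complexity of converting to or from the data structure is essentially irrelevant compared with the time spent in these functions.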

The first function is as follows:

setPartition FunctionA(setPartition partition):
     choose a random integer i between 1 and n
     choose a part P of partition uniformly at random 
     if i is in P:
          return(partition)
     else:
          for j in P:
               do a computation depending only on i and j, not on P or partition
          if [condition depending on those computations]:
               remove i from the part it is currently in
               add i to P
               return(partition)
          else:
               return(partition)

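Note that in the step where we remove i from the part it is currently in, if that part becomes empty we need to delete it from the partition. We also cannot choose i by first picking a random part and then a random element of it, since that would not give the right distribution, so we probably need to look up which part contains i. (It would work if we picked a random part with probability proportional to its size, but that requires knowing the size of each part.)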
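One possible layout for the operations FunctionA needs is sketched below in C++. It is only an illustration under stated assumptions: the names `Partition`, `partOf`, `posIn`, `erasePart`, and `moveTo` are hypothetical, parts are kept in a dense vector so a uniformly random part is just a random index, and a per-element lookup table records which part holds each element.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical structure; all names are illustrative, not from the question.
struct Partition {
    std::vector<std::vector<int>> parts;  // parts[k] = elements of part k (always non-empty)
    std::vector<int> partOf;              // partOf[i] = index of the part containing i
    std::vector<int> posIn;               // posIn[i]  = position of i inside its part

    // Start from the partition into singletons {1}, {2}, ..., {n}.
    explicit Partition(int n) : parts(n), partOf(n + 1, -1), posIn(n + 1, 0) {
        for (int i = 1; i <= n; ++i) {
            parts[i - 1].push_back(i);
            partOf[i] = i - 1;
        }
    }

    // Delete part k, which must already be empty, by moving the last part into
    // its slot. Relabelling costs time proportional to the size of the moved part.
    void erasePart(int k) {
        assert(parts[k].empty());
        int last = static_cast<int>(parts.size()) - 1;
        if (k != last) {
            parts[k] = std::move(parts[last]);
            for (int x : parts[k]) partOf[x] = k;
        }
        parts.pop_back();
    }

    // Move element i into part `target`; drops i's old part if it becomes empty.
    // Part indices may change afterwards, so look them up again via partOf.
    void moveTo(int i, int target) {
        int from = partOf[i];
        if (from == target) return;
        // Swap-and-pop: overwrite i with the last element of its old part.
        std::vector<int>& old = parts[from];
        int back = old.back();
        old[posIn[i]] = back;
        posIn[back] = posIn[i];
        old.pop_back();
        // Append i to the target part.
        posIn[i] = static_cast<int>(parts[target].size());
        parts[target].push_back(i);
        partOf[i] = target;
        if (parts[from].empty()) erasePart(from);
    }
};
```

With this layout, testing whether i is in P is a comparison of `partOf[i]` with P's index, and each part's size is available as `parts[k].size()`. Whether the relabelling cost in `erasePart` matters in practice depends on how often parts become empty.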

The second is:

setPartition FunctionB(setPartition partition):
    with probability p:
        choose two distinct parts P and Q of partition uniformly at random
        for i in P and j in Q:
            do a computation depending only on i and j, not on P, Q, or partition
        if [condition depending on those computations]:
            merge parts P and Q into a single part
            return(partition)
        else:
            return(partition)
    otherwise:
        choose a part P of partition uniformly at random
        choose a subset Q of P uniformly at random (i.e. each element is in Q w.p. 50%)
        for i in Q and j in (P minus Q):
            do a computation depending only on i and j, not on P, Q, or partition
        if [condition depending on those computations]:
            remove the elements of Q from P
            create a new part whose elements are Q
            return(partition)
        else:
            return(partition)

Again, remember that merging parts must not leave an empty part behind: the set partition must always consist of non-empty sets.
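The merge and split steps of FunctionB could look roughly like the following C++ sketch. It assumes a hypothetical layout with a dense vector of parts plus a lookup table `partOf`; the names `mergeParts` and `splitPart` are illustrative. It also assumes that a split where Q is empty or equal to P is treated as a no-op, which the pseudocode does not specify.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical layout, assumed for illustration only.
struct Partition {
    std::vector<std::vector<int>> parts;  // dense list of non-empty parts
    std::vector<int> partOf;              // partOf[i] = index of the part containing i
};

// Merge parts a and b (a != b): append the smaller part to the larger one,
// then fill the vacated slot with the last part so no empty part remains.
void mergeParts(Partition& p, int a, int b) {
    assert(a != b);
    if (p.parts[a].size() < p.parts[b].size()) std::swap(a, b);  // a is the larger
    for (int x : p.parts[b]) {
        p.parts[a].push_back(x);
        p.partOf[x] = a;
    }
    int last = static_cast<int>(p.parts.size()) - 1;
    if (b != last) {
        p.parts[b] = std::move(p.parts[last]);
        for (int x : p.parts[b]) p.partOf[x] = b;
    }
    p.parts.pop_back();
}

// Split part k: elements x with inQ[x] == true go to a new part.
// Assumption: if Q is empty or all of P, nothing changes and false is returned,
// so an empty part is never created.
bool splitPart(Partition& p, int k, const std::vector<bool>& inQ) {
    std::vector<int> keep, moved;
    for (int x : p.parts[k]) (inQ[x] ? moved : keep).push_back(x);
    if (moved.empty() || keep.empty()) return false;
    int fresh = static_cast<int>(p.parts.size());  // index of the new part
    for (int x : moved) p.partOf[x] = fresh;
    p.parts[k] = std::move(keep);
    p.parts.push_back(std::move(moved));
    return true;
}
```

Merging the smaller part into the larger keeps the relabelling work proportional to the smaller size, and both operations may renumber parts, so indices should be re-read from `partOf` afterwards.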

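These functions are both run in a big loop that modifies a single set partition, say a million or two times, so doing this by in-place modification seems like a good idea. I'm doing this for n around 2000 or so, so I think using more memory in exchange for faster computation is a good tradeoff.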

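I have already implemented this once in R, representing the partition in the obvious way as a list of lists, but that wasn't fast enough for the amount of computation I need, so I've decided to reimplement it in C++. I want to make sure I use the right data structure from the start, so that I don't have to reimplement it a third time.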