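How to get all combinations that meet a condition
Question: I have a table from which I need to extract all valid combinations of rows (or of columns, if I transpose the table). The cells contain only "+" or "-". A combination is valid if every column has a "+" in at least one row of the combination; in other words, a combination is invalid if some column has "-" in all of its rows.
Example table: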
   Guns  P_01 P_02 P_03 P_04 P_05 P_06 P_07
0  G_01     +    -    +    +    +    -    +
1  G_02     +    +    +    -    +    +    -
2  G_03     -    -    -    +    +    +    +
3  G_04     +    +    +    -    -    -    -
4  G_05     +    +    +    -    -    -    -
5  G_06     -    -    -    +    +    +    +
6  G_07     +    +    +    -    -    -    -
Valid combination example:
0  G_01     +    -    +    +    +    -    +
1  G_02     +    +    +    -    +    +    -
Invalid combination example:
3  G_04     +    +    +    -    -    -    -
4  G_05     +    +    +    -    -    -    -
To get all the combinations, I tried itertools.combinations and put the results in a list:
from itertools import combinations
dfcomb = df.apply(lambda r: list(combinations(r, 2)), axis=0)
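But now I'm stuck. I know I should loop over the combinations to check whether each one is valid, but how do I do that?
Solution
If you treat + and - as True and False respectively and apply an OR operation down each column: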
G_01   +  -  +  +  +  -  +
G_02   +  +  +  -  +  +  -
---------------------------------------
OR     +  +  +  +  +  +  +   ==> All true. This combo is valid
G_04   +  +  +  -  -  -  -
G_05   +  +  +  -  -  -  -
---------------------------------------
OR     +  +  +  -  -  -  -   ==> Not all true. This combo is invalid
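The column-wise OR check above can be sketched in plain Python. This is only a minimal illustration, assuming each row is stored as a string of "+"/"-" characters; the `rows` dict and the helper name `is_valid_combo` are made up for this example and are not part of the original answer.

```python
# Hypothetical data layout: one string of '+'/'-' per gun, copied from the example table.
rows = {
    "G_01": "+-+++-+",
    "G_02": "+++-++-",
    "G_04": "+++----",
    "G_05": "+++----",
}


def is_valid_combo(*signs):
    """Return True if every column has '+' in at least one of the given rows."""
    # zip(*signs) walks the rows column by column; any() is the OR over one column,
    # all() requires that OR to be true for every column.
    return all(any(cell == "+" for cell in column) for column in zip(*signs))


print(is_valid_combo(rows["G_01"], rows["G_02"]))  # True
print(is_valid_combo(rows["G_04"], rows["G_05"]))  # False
```

Because the helper takes any number of rows, the same check works for combinations of three or more guns.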
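The only remaining problem is how to compare them quickly. We can do that with numpy's array broadcasting. In short, broadcasting is how numpy carries out an operation on arrays of different shapes: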
# When you compare an array to a scalar, the logical action is to compare
# every element of the array to that scalar
[a, b, c] > d is equivalent to [a > d, b > d, c > d]
# If you want to compare every element of a list against every element of
# another list, things get a little tricky
[a, b, c] > [d, e] ???
# The trick is to raise the second array up another dimension so you
# get a comparison matrix. The first array remains 1D, the second array is
# now 2D
[a, b, c] > [[d], [e]]
# One way to visualize it (one row per element of the 2D array)
        a       b       c
d   a > d   b > d   c > d
e   a > e   b > e   c > e
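To make the broadcasting rule concrete, here is a small runnable numpy sketch. The arrays `x` and `y` and their values are arbitrary stand-ins for `[a, b, c]` and `[d, e]`, chosen only for illustration.

```python
import numpy as np

x = np.array([1, 5, 9])   # plays the role of [a, b, c]
y = np.array([4, 6])      # plays the role of [d, e]

# Array vs scalar: the scalar is compared with every element
print(x > 4)              # [False  True  True]

# y[:, None] has shape (2, 1). Comparing it with x (shape (3,))
# broadcasts both sides to shape (2, 3): one row per element of y.
matrix = x > y[:, None]
print(matrix.shape)       # (2, 3)
print(matrix)
# [[False  True  True]
#  [False False  True]]
```

Indexing with `None` (equivalent to `np.newaxis`) adds the extra dimension without copying any data.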
Here is the answer to your question:
import numpy as np
# Life is a lot easier if you put Guns on the index
df.set_index('Guns', inplace=True)
# A 2D array of True/False
a = (df == '+').to_numpy()
# The same data with an extra axis (3D), used in the OR operation
b = a[:, None]
# OR-ing every gun with every other gun, then checking that all columns are True
c = np.all(a | b, axis=-1)
# This is what c looks like, with some labels added
#         G_01   G_02   G_03   G_04   G_05   G_06   G_07
#       |------------------------------------------------
# G_01  | False  True   False  False  False  False  False  ==> (G_01,G_02) is valid
# G_02  | True   False  True   False  False  True   False  ==> (G_02,G_01),(G_02,G_03) and (G_02,G_06) are valid
# G_03  | False  True   False  True   True   False  True
# G_04  | False  False  True   False  False  True   False
# G_05  | False  False  True   False  False  True   False
# G_06  | False  True   False  True   True   False  True
# G_07  | False  False  True   False  False  True   False
# Obviously (G_01,G_02) and (G_02,G_01) are the same combo so we don't need
# to collect both of them. We only need to work with the upper triangle in the
# matrix (`triu` means triangle upper); k=1 skips the diagonal so a gun is
# never paired with itself
valid_combinations = [(df.index[i], df.index[j]) for i, j in zip(*np.triu_indices_from(c, k=1)) if c[i, j]]
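Because it uses numpy broadcasting, it benefits from all the underlying vectorization. I ran it on a 1000 x 1000 dataframe (1M elements) in under 3 seconds.
Edit: to extend this to combinations of any size, keep raising the comparison array by one more dimension on each iteration: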
def get_valid_combos(df, combo_size=2):
    assert combo_size >= 2, 'combo_size must be at least 2'
    a = (df == '+').to_numpy()
    result = a
    # Each pass adds one axis, so result ends up with combo_size gun axes
    while combo_size > 1:
        a = a[:, None]
        result = result | a
        combo_size -= 1
    # A combo is valid when the OR is True in every column
    result = result.all(axis=-1)
    # Return True if array is monotonically increasing to avoid duplicates
    # like (G_1,G_2,G_3) and (G_1,G_3,G_2)
    is_increasing = lambda arr: (np.diff(arr) > 0).all()
    valid_indices = np.array(result.nonzero()).transpose()
    return [tuple(df.index[idx]) for idx in valid_indices if is_increasing(idx)]
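Note that this grows exponentially with combo_size. If you have n guns, it needs on the order of n ^ combo_size space and may take n ^ (2 * combo_size) time. There is plenty of room for optimization: if G_1 and G_2 already form a valid combination, then any larger combination containing those two is also valid, so it does not need to be checked again. But I'm too lazy to do that right now.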
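One possible way to act on that optimization idea, without allocating an n ^ combo_size array, is sketched below in plain Python. It is not the answer's numpy method: it swaps in integer bitmasks and a depth-first search, and the names `rows` and `valid_combos` are hypothetical. Each gun becomes an integer with one bit per "+" column, and once the OR of the chosen guns covers every column, all ways of filling the remaining slots are emitted without further checks.

```python
from itertools import combinations

# Hypothetical data layout: one string of '+'/'-' per gun, copied from the example table.
rows = {
    "G_01": "+-+++-+",
    "G_02": "+++-++-",
    "G_03": "---++++",
    "G_04": "+++----",
    "G_05": "+++----",
    "G_06": "---++++",
    "G_07": "+++----",
}


def valid_combos(rows, combo_size):
    """Lazily yield every valid combination of combo_size guns."""
    names = list(rows)
    # Bit i of a mask is set when column i of that gun is '+'
    masks = [sum(1 << i for i, c in enumerate(rows[n]) if c == "+") for n in names]
    full = (1 << len(rows[names[0]])) - 1  # mask with every column covered

    def extend(start, chosen, covered):
        remaining = combo_size - len(chosen)
        if covered == full:
            # Already valid: any choice of the remaining guns keeps it valid
            for rest in combinations(range(start, len(names)), remaining):
                yield tuple(names[i] for i in chosen + list(rest))
            return
        if remaining == 0:
            return  # full size reached but some column is still uncovered
        for i in range(start, len(names)):
            yield from extend(i + 1, chosen + [i], covered | masks[i])

    yield from extend(0, [], 0)


pairs = list(valid_combos(rows, 2))
print(len(pairs))   # 9
print(pairs[0])     # ('G_01', 'G_02')
```

Indices are always chosen in increasing order, so duplicates such as (G_1, G_3, G_2) never appear and no separate de-duplication step is needed.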
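Alternatively, you can use a for loop to iterate over all the generated combinations and check each one with an if statement. any can detect whether there is a column in which both rows have a - sign. Once a combination is known to be valid, append it to the list of valid combinations.
# all_combinations: each combination is a sequence of (sign_a, sign_b) pairs, one pair per column
valid_combinations = []
for combination in all_combinations:
    if not any(p[0] == "-" and p[1] == "-" for p in combination):
        valid_combinations.append(combination)
This can be simplified with a list comprehension:
valid_combinations = [combination for combination in all_combinations if not any(p[0] == "-" and p[1] == "-" for p in combination)]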
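The loop above expects each combination to be a sequence of two-element pairs, one per column, but does not show how to build them. A minimal self-contained sketch follows, assuming the table is held as a dict of "+"/"-" strings; the names `table` and `all_combinations` are chosen here for illustration only.

```python
from itertools import combinations

# Hypothetical data layout: one string of '+'/'-' per gun, copied from the example table.
table = {
    "G_01": "+-+++-+",
    "G_02": "+++-++-",
    "G_03": "---++++",
    "G_04": "+++----",
    "G_05": "+++----",
    "G_06": "---++++",
    "G_07": "+++----",
}

# For each pair of guns, zip their rows so that every element is a
# (sign_in_first_gun, sign_in_second_gun) pair for one column
all_combinations = {
    (g1, g2): list(zip(table[g1], table[g2]))
    for g1, g2 in combinations(table, 2)
}

# Keep the pairs of guns that have no column where both signs are '-'
valid_combinations = [
    guns
    for guns, pairs in all_combinations.items()
    if not any(p[0] == "-" and p[1] == "-" for p in pairs)
]
print(valid_combinations)
```

Keying the dict by gun names means the result lists which guns form a valid pair, rather than the raw sign pairs.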