Grouping a Python list of lists into groups based on overlapping items

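I have a list of lists that I am trying to group, or cluster, based on the items they contain. A nested list starts a new group when none of its elements appear in the previous group.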

Input:

paths = [  
        ['D','B','A','H'],['D','C'],['H',['E','G','I'],['F','I']]

My failed attempt:

paths = [
    ['D','I']
]
groups = []
paths_clone = paths
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path,path_clone])
                else:
                    groups.append(path)
print groups

Expected output:

[
 [
  ['D','C']
 ],[
  ['E','I']
 ]
]

An example:

paths = [['shifter','barrel','barrel shifter'],['ARM',['IP power','IP','power'],'shifter']]

Expected output groups:

output = [
         [['shifter','shifter']],[['IP power','power']],]

Solution

You are grouping by set membership, so use a set to detect when a new group starts:

def grouper(sequence):
    group,members = [],set()

    for item in sequence:
        if group and members.isdisjoint(item):
            # new group,yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)

    yield group

This gives:

>>> for group in grouper(paths):
...     print group
... 
[['D','C']]
[['E','I']]

Or you can collect the generator's output into a list:

output = list(grouper(paths))
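
To make the consecutive-grouping idea easy to try out, here is a minimal self-contained sketch in Python 3. The name `group_consecutive` and the `sample_paths` data are made up for illustration and are not the asker's input. Unlike the generator above, this version skips the final `yield` when the input is empty.

```python
def group_consecutive(sequence):
    """Yield runs of consecutive items that share an element with the current run."""
    group, members = [], set()
    for item in sequence:
        # A non-empty group with no element in common with this item is finished.
        if group and members.isdisjoint(item):
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)
    if group:  # assumption: do not emit an empty group for empty input
        yield group


# Hypothetical sample data: flat lists of hashable items only.
sample_paths = [
    ['D', 'B', 'A', 'H'],
    ['D', 'C'],
    ['E', 'G', 'I'],
    ['F', 'I'],
]

consecutive_groups = list(group_consecutive(sample_paths))
for g in consecutive_groups:
    print(g)
```

Each inner list has to contain hashable items such as strings. A list nested inside a path would raise a `TypeError` when it is added to the set.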

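This assumes the groups are consecutive. If members of a group can be scattered through the list, you need to process the whole list and, for each item, loop over all the groups built so far: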

def grouper(sequence):
    result = []  # will hold (members,group) tuples

    for item in sequence:
        for members,group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found,add new
            result.append((set(item),[item]))

    return [group for members,group in result]
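
As a rough illustration of the non-consecutive case, the sketch below repeats the same logic under the hypothetical name `group_overlapping` and runs it on made-up data. It also shows a property worth knowing about: an item joins only the first group it overlaps, so two existing groups are not merged when a later item bridges them.

```python
def group_overlapping(sequence):
    """Put each item into the first existing group it shares an element with."""
    result = []  # holds (members, group) tuples
    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap with this group
                members.update(item)
                group.append(item)
                break
        else:  # no overlapping group found, start a new one
            result.append((set(item), [item]))
    return [group for members, group in result]


# Hypothetical data: related items are not adjacent in the list.
scattered = [['D', 'B'], ['E', 'G'], ['D', 'C'], ['F', 'G']]
scattered_groups = group_overlapping(scattered)
print(scattered_groups)

# A bridging item ('A', 'B') joins the first matching group only;
# the groups started by ['A'] and ['B'] stay separate.
bridged_groups = group_overlapping([['A'], ['B'], ['A', 'B']])
print(bridged_groups)
```

If bridged groups should collapse into one, a merging step (for example a union-find structure) would be needed. That goes beyond what the answer's code does.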
