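I have a list of lists that I'm trying to group (cluster) by their items. A nested list starts a new group if none of its elements appear in the previous group.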
Input:

```python
paths = [['D', 'B', 'A', 'H'], ['D', 'C'], ['H'], ['E', 'G', 'I'], ['F', 'I']]
```
My failing code:

```python
# paths is the input list above
groups = []
paths_clone = paths
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path, path_clone])
                else:
                    groups.append(path)
print(groups)
```
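Here is one likely reason the attempt does not produce groups. `paths_clone = paths` copies nothing; it binds a second name to the same list, so the inner loop just walks `paths` again, including each path itself. The loops then record one entry for every shared node they find, in both directions, instead of collecting connected paths into a single group. The cut-down sketch below keeps only the pairing logic. The names `other` and `pairs` are illustrative and do not come from the original code:

```python
paths = [['D', 'B', 'A', 'H'], ['D', 'C']]

paths_clone = paths            # no copy: both names refer to the same list object
print(paths_clone is paths)    # True

# Same matching idea as the attempt: record (path, other) whenever a node
# of `path` also appears in a different path.
pairs = []
for path in paths:
    for node in path:
        for other in paths_clone:
            if node in other and other is not path:
                pairs.append((path, other))

print(len(pairs))  # 2: the single overlap on 'D' is recorded once in each direction
```

A third path linked only through one of these two would add more pairs but would never be merged with them. That is why tracking each group's combined members works better than tracking pairs.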
Expected output:

```python
[[['D', 'B', 'A', 'H'], ['D', 'C'], ['H']],
 [['E', 'G', 'I'], ['F', 'I']]]
```
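Another example:

```python
paths = [['shifter', 'barrel', 'barrel shifter'], ['IP power', 'IP', 'power'], ['ARM', 'shifter']]
```

Expected output groups:

```python
output = [[['shifter', 'barrel', 'barrel shifter'], ['ARM', 'shifter']],
          [['IP power', 'IP', 'power']]]
```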
Answer
You are grouping by shared members, so use a set of the current group's members to detect when a new group starts:
```python
def grouper(sequence):
    group, members = [], set()
    for item in sequence:
        if group and members.isdisjoint(item):
            # new group, yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)
    yield group
```
This gives:

```python
>>> for group in grouper(paths):
...     print(group)
...
[['D', 'B', 'A', 'H'], ['D', 'C'], ['H']]
[['E', 'G', 'I'], ['F', 'I']]
```
Or you can collect all the groups into a list:

```python
output = list(grouper(paths))
```
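To show the contiguity assumption concretely, here is the generator again, run on a small made-up input where an unrelated path sits between two paths that share `'b'`. The data is illustrative only:

```python
def grouper(sequence):
    """Yield runs of consecutive paths; a path that shares nothing with the
    current run starts a new run."""
    group, members = [], set()
    for item in sequence:
        if group and members.isdisjoint(item):
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)
    yield group


paths = [['a', 'b'], ['x'], ['b', 'c']]
print(list(grouper(paths)))
# [[['a', 'b']], [['x']], [['b', 'c']]]
# ['a', 'b'] and ['b', 'c'] share 'b' but end up in different groups
```

The generator only ever compares a path with the group it is currently building, so it cannot reach back to an earlier group.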
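This assumes the groups are contiguous. If the paths of one group can be interrupted by paths of another group (non-contiguous groups), you need to process the whole list and, for each item, loop over all the groups built so far: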
```python
def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]
```
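One more caveat: the loop above adds an item to the *first* overlapping group and then stops (`break`). An item that overlaps two existing groups therefore does not join them together. If such bridging items can occur and should merge groups, a variant along these lines folds every overlapping group into the first one. `grouper_merging` is a hypothetical name, and this merging step is an addition, not part of the original answer:

```python
def grouper_merging(sequence):
    """Group paths so that paths connected by shared elements, directly or
    through a chain of other paths, end up in the same group."""
    result = []  # (members, group) pairs; members of different pairs never overlap
    for item in sequence:
        item_members = set(item)
        first = None  # index of the first existing group that overlaps item
        i = 0
        while i < len(result):
            members, group = result[i]
            if members.isdisjoint(item_members):
                i += 1
            elif first is None:
                # first overlapping group: item joins it
                members.update(item_members)
                group.append(item)
                first = i
                i += 1
            else:
                # item also overlaps this group: fold it into the first one
                first_members, first_group = result[first]
                first_members.update(members)
                first_group.extend(group)
                del result[i]  # do not advance i; the next pair shifted into slot i
        if first is None:
            result.append((item_members, [item]))
    return [group for members, group in result]


print(grouper_merging([['a'], ['b'], ['a', 'b']]))
# [[['a'], ['a', 'b'], ['b']]]
```

For the same input, the `break`-based version returns `[[['a'], ['a', 'b']], [['b']]]`. It leaves `['b']` in its own group even though `['a', 'b']` links it to `['a']`.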