Compare results only if the item exists on both sides


I have two strings:

machine1 665600MB 512512MB 19%                    
machine2 53248MB  41000MB  20%  
machine3 625600MB 522512MB 22%

and:

machine1 665600MB 512512MB 21%                    
machine2 53248MB  41000MB  22%  
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%

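I want to compare the differences between the two, but only for the machines that appear in both (machine1, 2, 3), skipping machine5. This has to work in both directions: if an entry exists in one string but not in the other, it must be ignored.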

To compare the two strings, I used this:

avoid = {x.rstrip() for x in string2.splitlines()}
result = str("\n".join(x for x in string1.splitlines() if x.rstrip() not in avoid))

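But it only reports the differences from one side: it returns every line of string1 that is not in string2, and never checks the other direction.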

Solution

My idea is to use a regular expression to identify the machines in each string and then take the intersection of the two sets:

import re

string1 = '''machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%'''

string2 = '''machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%'''

# Raw string, so that \d is passed to the regex engine unchanged
pat = r'machine\d+'

machines1 = re.findall(pat,string1)
machines2 = re.findall(pat,string2)
intersect = set(machines1) & set(machines2)
# {'machine1','machine2','machine3'}

Then filter each string down to that intersection, using the same split-and-join approach as above:

newstring1 = '\n'.join(line for line in string1.splitlines() if
                       re.search(pat,line).group() in intersect)
newstring2 = '\n'.join(line for line in string2.splitlines() if
                       re.search(pat,line).group() in intersect)

The result is these two new strings:

>>> print(newstring1)
machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%

>>> print(newstring2)
machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%

How you want to "compare" them is a bit vague, but the two new strings should now contain records for the same machines only.
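
If "compare" means "show which machines have different values", one possible follow-up is to parse each string into a dict keyed by machine name and compare only the keys both dicts share. This is a minimal sketch, not part of the original answer: the helper names parse_records and changed_records are hypothetical, and it assumes the machine name is always the first whitespace-separated column.

```python
string1 = """machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%"""

string2 = """machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%"""


def parse_records(text):
    """Map each machine name (assumed to be the first column) to its other columns."""
    records = {}
    for line in text.splitlines():
        fields = line.split()  # split() also drops trailing whitespace
        if fields:  # skip blank lines
            records[fields[0]] = fields[1:]
    return records


def changed_records(text_a, text_b):
    """Return {machine: (columns_a, columns_b)} for machines in both texts whose columns differ."""
    a, b = parse_records(text_a), parse_records(text_b)
    common = a.keys() & b.keys()  # machines present on both sides
    return {name: (a[name], b[name]) for name in sorted(common) if a[name] != b[name]}


diff = changed_records(string1, string2)
for name, (old, new) in diff.items():
    # Print the last column (the percentage) before and after
    print(name, old[-1], "->", new[-1])
```

Because the comparison is driven by the shared keys, the result is the same whichever string is passed first, apart from the order of the values in each pair; machine5 is never looked at.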