How to compare two different CSV files and output the differences
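I have two CSV files, New.csv and Old.csv. Each has roughly 1K rows and 10 columns, structured as follows:
If New.csv contains a longName (the first column) that does not appear in Old.csv, I want the entire row from New.csv appended to changes.csv.
This was my first attempt, but it did not work at all: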
def deltaFileMaker():
    with open('Old.csv', 'r', encoding='utf-8') as t1, open('New.csv', encoding='utf-8') as t2:
        fileone = t1.readlines()
        filetwo = t2.readlines()
    with open('changes.csv', 'w', encoding='utf-8') as outFile:
        for line in filetwo:
            if line not in fileone:
                outFile.write(line)

deltaFileMaker()
I also tried csv-diff, but could not find a way to convert its output into a CSV file.
Update:
def deltaFileMaker():
    from csv_diff import load_csv, compare
    diff = compare(
        load_csv(open("old.csv", encoding="utf8"), key="longName"),
        load_csv(open("new.csv", encoding="utf8"), key="longName")
    )
    with open('changes.csv', encoding="utf8") as f:
        w = csv.DictWriter(f, diff.keys())
        w.writeheader()
        w.writerow(diff)

deltaFileMaker()
Solution
Have you looked at csv-diff? Their website gives an example that may fit:
from csv_diff import load_csv, compare

diff = compare(
    load_csv(open("one.csv"), key="id"),
    load_csv(open("two.csv"), key="id")
)
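This should return a dict object, which you can then parse into a CSV file. Here is an example of turning that dictionary into rows. Note: getting the changed rows written out correctly is difficult, so treat this as a proof of concept and modify it as you see fit.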
from csv_diff import load_csv, compare
from csv import DictWriter

# `diff` is the dict returned by compare() in the previous snippet

# Get all the row headers across all the changes
headers = {'change type'}
for key, vals in diff.items():
    for val in vals:  # Multiple of the same difference 'type'
        headers = headers.union(set(val.keys()))

# Write changes to file
with open('changes.csv', 'w', encoding='utf-8', newline='') as fh:
    w = DictWriter(fh, headers)
    w.writeheader()
    for key, changes in diff.items():
        for val in changes:  # Add each instance of this type of change
            val.update({'change type': key})  # Add 'change type' data
            w.writerow(val)
For the file one.csv:
id,name,age
1,Cleo,4
2,Pancakes,2
and two.csv:
id,name,age
1,Cleo,5
3,Bailey,1
4,Elliot,10
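Running this produces (the column order can vary between runs, because headers is a set):
change type,id,name,changes,age,key
added,3,Bailey,,1,
added,4,Elliot,,10,
removed,2,Pancakes,,2,
changed,,,"{'age': ['4', '5']}",,1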
So this is not great for every kind of change, but it works very well for added and removed rows.
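If only the added rows matter, as in the question, the same result can probably be had without csv-diff at all. The following is a minimal sketch using only the standard library, not part of the original answer. It assumes both files share a header row containing a longName column; the function name write_added_rows and its parameters are hypothetical, chosen for illustration. It compares on the key column only, so a row whose other columns changed is not reported, whereas comparing whole lines (as in the first attempt) would flag it.

```python
import csv


def write_added_rows(old_path, new_path, out_path, key="longName"):
    """Write the rows of new_path whose `key` value is absent from old_path.

    Returns the number of rows written (excluding the header).
    Assumption: both files have a header row that contains `key`.
    """
    # Collect every key value present in the old file.
    with open(old_path, newline="", encoding="utf-8") as old_file:
        old_keys = {row[key] for row in csv.DictReader(old_file)}

    written = 0
    with open(new_path, newline="", encoding="utf-8") as new_file, \
            open(out_path, "w", newline="", encoding="utf-8") as out_file:
        reader = csv.DictReader(new_file)
        # Reuse the new file's header so the column order is preserved.
        writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] not in old_keys:
                writer.writerow(row)
                written += 1
    return written
```

Calling write_added_rows("Old.csv", "New.csv", "changes.csv") would overwrite changes.csv on each run; to append across runs instead, one option is to open the output in "a" mode and write the header only when the file is new.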