1.csv
id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
2.csv
id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
I load them like this:
>>> df1 = pd.read_csv('1.csv', encoding='utf-8').set_index('id')
>>> df2 = pd.read_csv('2.csv', encoding='utf-8').set_index('id')
>>>
>>> print df1
       noteId                   text
id
id2  idNote19  This is my old text 2
id5  idNote13  This is my old text 5
id1  idNote12  This is my old text 1
id3  idNote10  This is my old text 3
id4  idNote11  This is my old text 4
>>> print df2
        noteId            text other
id
id3   idNote10      new text 3   On1
id2   idNote19   My new text 2  Pre8
id5        NaN   My new text 2   Hl0
id22  idNote22  My new text 22    M1
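I need to merge the two DataFrames into something like the following: values in df1 are overwritten by df2 (even where the df2 value is empty), the extra columns from df2 are added, and rows that do not exist in df1 are appended: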
        noteId                   text other
id
id2   idNote19          My new text 2  Pre8
id5        NaN          My new text 2   Hl0
id1   idNote12  This is my old text 1   NaN
id3   idNote10             new text 3   On1
id4   idNote11  This is my old text 4   NaN
id22  idNote22         My new text 22    M1
My real DataFrames also have other columns that should be merged, not only text.
I tried using merge to get something similar:
>>> df1 = pd.read_csv('1.csv', encoding='utf-8')
>>> df2 = pd.read_csv('2.csv', encoding='utf-8')
>>>
>>> print df1
    id    noteId                   text
0  id2  idNote19  This is my old text 2
1  id5  idNote13  This is my old text 5
2  id1  idNote12  This is my old text 1
3  id3  idNote10  This is my old text 3
4  id4  idNote11  This is my old text 4
>>> print df2
    id    noteId           text
0  id3  idNote10     new text 3
1  id2  idNote19  My new text 2
>>>
>>> print merge(df1, df2, how='left', on=['id'])
    id  noteId_x                 text_x  noteId_y         text_y
0  id2  idNote19  This is my old text 2  idNote19  My new text 2
1  id5  idNote13  This is my old text 5       NaN            NaN
2  id1  idNote12  This is my old text 1       NaN            NaN
3  id3  idNote10  This is my old text 3  idNote10     new text 3
4  id4  idNote11  This is my old text 4       NaN            NaN
>>>
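But that is not what I need. I don't know whether I am on the right track and should combine the suffixed columns, or whether there is a better way to do this.
Thanks!
Update:
Added values in df1 that should be overwritten where df2 is empty, extra columns in df2 that should exist in df1 after the "merge", and rows that should be added to df1.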
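For the question of whether the suffixed columns can be combined, here is a minimal sketch of one possible route, not necessarily the best one. It assumes the CSV contents match the printed df1 and df2 above (inlined with io.StringIO for illustration), and the names `CSV1`, `CSV2`, `merged`, `in_df2` and `result` are hypothetical. An outer merge with `indicator=True` records where each row came from, so the df2 value can be taken for every row present in df2, even when that value is NaN.

```python
import io
import pandas as pd

# Inlined stand-ins for 1.csv and 2.csv (assumed to match the printed frames).
CSV1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""
CSV2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
id5,,My new text 2,Hl0
id22,idNote22,My new text 22,M1
"""

df1 = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

# Outer merge keeps rows from both frames; _merge says where each row came from.
merged = df1.merge(df2, on='id', how='outer',
                   suffixes=('_old', '_new'), indicator=True)

# True for rows that exist in df2 (alone or in both frames).
in_df2 = merged['_merge'] != 'left_only'

# Columns present in both frames received suffixes and need combining.
shared = [c for c in df1.columns if c in df2.columns and c != 'id']
for col in shared:
    # Take the df2 value whenever the row exists in df2 (NaN included),
    # otherwise keep the df1 value.
    merged[col] = merged[col + '_new'].where(in_df2, merged[col + '_old'])

# Drop the helper columns and index by id.
helpers = [c + s for c in shared for s in ('_old', '_new')] + ['_merge']
result = merged.drop(columns=helpers).set_index('id')
print(result)
```

The row order of an outer merge depends on the pandas version, so the result is best checked by label rather than by position.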
Solution
df1.fillna(value='None', inplace=True)
df2.fillna(value='None', inplace=True)
pd.concat([df1, df2]).groupby('id').last().fillna(value='None')
In my case it was very important to define a default "empty" value, which is the reason for the fillna calls.
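The following is a self-contained sketch of the concat/groupby approach above, assuming the two files contain the rows shown in the printed df1 and df2 (inlined with io.StringIO for illustration; `EMPTY` is a hypothetical name for the sentinel). It illustrates why the sentinel matters: `groupby(...).last()` returns the last non-null value per column, so without replacing NaN first, the empty noteId for id5 in df2 would not overwrite the old value from df1.

```python
import io
import pandas as pd

# Inlined stand-ins for 1.csv and 2.csv (assumed to match the printed frames).
CSV1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""
CSV2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
id5,,My new text 2,Hl0
id22,idNote22,My new text 22,M1
"""

EMPTY = 'None'  # hypothetical sentinel marking "intentionally empty"

# Replace NaN with the sentinel so empty df2 cells count as real values.
df1 = pd.read_csv(io.StringIO(CSV1)).fillna(EMPTY)
df2 = pd.read_csv(io.StringIO(CSV2)).fillna(EMPTY)

# Stack df1 on top of df2; per id, last() picks the last non-null value
# in each column, so df2 wins wherever it has a row for that id.
result = pd.concat([df1, df2]).groupby('id').last()

# Cells that exist in neither frame (e.g. 'other' for df1-only rows)
# are still NaN here; give them the sentinel too.
result = result.fillna(EMPTY)
print(result)
```

Note that groupby sorts the result by id by default, so the original row order of df1 is not preserved.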
Answer:
Usually you can solve this kind of problem with appropriate indexing:
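df1.set_index(['id', 'noteId'], inplace=True)
# update() aligns on the index, so df2 is assumed to need the same index
df2.set_index(['id', 'noteId'], inplace=True)
df1.update(df2)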
(If you don't want that index afterwards, just call df1.reset_index(inplace=True).)
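As a rough, runnable illustration of the indexing approach above, the sketch below uses the two CSV files exactly as listed at the top of the question, inlined with io.StringIO (the names `CSV1`, `CSV2` and `updated` are hypothetical). It assumes df2 is given the same index as df1, since update aligns on index labels. Two properties of `DataFrame.update` are worth keeping in mind: it only overwrites with non-NA values, and it does not add rows or columns that are missing from df1.

```python
import io
import pandas as pd

# Inlined stand-ins for 1.csv and 2.csv as listed in the question.
CSV1 = """id,noteId,text
id2,idNote19,This is my old text 2
id5,idNote13,This is my old text 5
id1,idNote12,This is my old text 1
id3,idNote10,This is my old text 3
id4,idNote11,This is my old text 4
"""
CSV2 = """id,noteId,text,other
id3,idNote10,new text 3,On1
id2,idNote19,My new text 2,Pre8
"""

df1 = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

# Both frames get the same index so update() can align the rows.
df1.set_index(['id', 'noteId'], inplace=True)
df2.set_index(['id', 'noteId'], inplace=True)

# In-place: overwrite df1 cells with non-NA df2 values at matching labels.
df1.update(df2)

# Back to a flat frame with id and noteId as ordinary columns.
updated = df1.reset_index()
print(updated)
```

Here the text of id2 and id3 is replaced, while the `other` column from df2 is not carried over, so this covers only the overwrite part of the question.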