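Pandas: comparing string columns from two DataFrames of different sizes
I have two DataFrames of different sizes, each with a column of sentences, as follows: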
import pandas as pd
data1 = {'text': ['the old man is here','the young girl is there','the old woman is here','the young boy is there','the young girl is here','the old girl is here']}
df1 = pd.DataFrame(data1, columns=['text'])
And the second DataFrame:
data2 = {'text': ['the old man is here','the old girl is there','the young woman is here','the young boy is there']}
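df2 = pd.DataFrame(data2, columns=['text'])
As you can see, some sentences appear in both DataFrames. The output I want is a column in df1 that is True if the same sentence also appears in df2, and False otherwise: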
desired output:
text result
'the old man is here' True
'the young girl is there' False
'the old woman is here' False
'the young boy is there' True
'the young girl is here' False
'the old girl is here' False
I tried:
df1['result'] = np.where(df1['text'].str == df2['text'].str,'True','False')
But when I check the result, it contains only 'False' and never 'True'.
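A short sketch of why this attempt probably yields only 'False', assuming current pandas behaviour: `.str` is an accessor object rather than the column's values, so `==` between two accessors falls back to Python's default object comparison and gives a single `False`, which `np.where` then broadcasts to every row. Comparing the columns directly does not help either, because Series of different lengths cannot be compared element-wise. The shortened data and the names `accessor_equal` and `mismatch_error` are illustrative only.

```python
import pandas as pd

# Shortened, illustrative versions of the two frames from the question
df1 = pd.DataFrame({'text': ['the old man is here',
                             'the young girl is there',
                             'the young boy is there']})
df2 = pd.DataFrame({'text': ['the old man is here',
                             'the young boy is there']})

# Comparing the .str accessors compares two distinct objects, not the strings,
# so the result is one plain Python bool instead of a per-row Series
accessor_equal = df1['text'].str == df2['text'].str
print(type(accessor_equal), accessor_equal)

# Comparing the columns themselves fails because their lengths differ
mismatch_error = None
try:
    df1['text'] == df2['text']
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```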
Solution
If you need boolean True/False values, use Series.isin:
df1['result'] = df1['text'].isin(df2['text'])
print (df1)
text result
0 the old man is here True
1 the young girl is there False
2 the old woman is here False
3 the young boy is there True
4 the young girl is here False
5 the old girl is here False
This is equivalent to:
import numpy as np
# quotes removed from 'True','False' so the values are real booleans
df1['result'] = np.where(df1['text'].isin(df2['text']), True, False)
Your solution creates strings, so it fails if the column is later needed for filtering:
df1['result'] = np.where(df1['text'].isin(df2['text']),'True','False')
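The difference between the two variants can be sketched as follows. This is a minimal illustration with shortened data, and the names `mask`, `as_bool`, `as_str`, and `matched` are hypothetical. A boolean column can be used directly as a row filter, while the string 'False' is itself truthy in Python, so a string column has to be compared against 'True' before it can select rows.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative versions of the two frames from the question
df1 = pd.DataFrame({'text': ['the old man is here',
                             'the young girl is there',
                             'the young boy is there']})
df2 = pd.DataFrame({'text': ['the old man is here',
                             'the young boy is there']})

mask = df1['text'].isin(df2['text'])

# Real booleans: usable directly for boolean indexing
as_bool = pd.Series(np.where(mask, True, False), index=df1.index)
matched = df1[as_bool]
print(matched)

# Strings: not a boolean dtype, and even 'False' counts as truthy
as_str = pd.Series(np.where(mask, 'True', 'False'), index=df1.index)
print(bool('False'))

# The string column must be converted back before it can filter rows
recovered = df1[as_str == 'True']
print(recovered.equals(matched))
```

Since `isin` already returns a boolean Series, the `np.where` wrapper is only needed when values other than True/False are wanted.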