How to fuzzy match corresponding rows in a pandas DataFrame
I want to fuzzy match large['name'] against small['name'] and add a 'matched_name' column that holds the best-matching name for each row.
large looks like this:
name
0 24/7 CUSTOMER
1 3 K TECHNOLOGIES
2 3I INFOTECH B P O
3 3I INFOTECH CONSULTANCY SERVICES
4 3I INFOTECH
... ...
889 ZIRCON TECHNOLOGIES
890 ZOETIS
891 ZOHO CORPORATION
892 ZOOM COMMUNICATIONS
893 ZYLOG SYSTEMS
small looks like this:
name city country_code
0 Wetpaint New York USA
1 Zoho Pleasanton USA
2 Digg New York USA
3 Facebook Menlo Park USA
4 Accel Palo Alto USA
... ... ... ...
1161387 TKX NaN NaN
1161388 Digitalhype NaN NaN
1161389 TK Research NaN NaN
1161390 Kyodo News Tokyo JPN
1161391 TKO Mobile NaN NaN
The output I want looks like this:
name matched_name city country_code
0 Wetpaint WETPAINT CO New York USA
1 Zoho ZOHO CORPORATION Pleasanton USA
2 Digg DIGG CO New York USA
3 Facebook FACEBOOK Menlo Park USA
4 Accel ACCEL Palo Alto USA
... ... ... ...
1161387 TKX TKX CO NaN NaN
1161388 Digitalhype DIGITAL HYPE NaN NaN
1161389 TK Research TK RESEARCH CO NaN NaN
1161390 Kyodo News KYODO Tokyo JPN
1161391 TKO Mobile TKO CO NaN NaN
This is what I have so far:
# create a new column with the best match for each company name
for i in large['name']:
    for j in small['company_name']:
        large['matched_name'] = process.extractOne(i, j)
But I get a ValueError: Length of values (0) does not match length of index (1161392)
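One possible reading of the error is that the loop assigns to the whole 'matched_name' column on every iteration, and that `process.extractOne` is given a single string `j` where it expects a collection of choices. The sketch below shows the intended shape instead: compute one best match per name, then assign the resulting list once. It deliberately swaps in the standard library's `difflib` for `process.extractOne`, so its scores will differ from the fuzzy-matching library's. The helpers `normalize`, `score`, `best_match` and the `min_coverage` threshold are hypothetical names introduced only for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Upper-case and collapse whitespace so 'Zoho' and 'ZOHO' compare equal."""
    return " ".join(str(name).upper().split())


def score(a, b):
    """Return (coverage, ratio) for two normalized strings.

    coverage: share of the shorter string covered by matching blocks.
    ratio: difflib's overall similarity, used only to break ties.
    """
    if not a or not b:
        return (0.0, 0.0)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return (matched / min(len(a), len(b)), matcher.ratio())


def best_match(name, choices, min_coverage=0.8):
    """Return the choice most similar to `name`, or None if nothing is close.

    min_coverage is an assumed cutoff, not a value from the question.
    """
    target = normalize(name)
    best, best_score = None, (0.0, 0.0)
    for choice in choices:
        current = score(target, normalize(choice))
        if current > best_score:
            best, best_score = choice, current
    return best if best_score[0] >= min_coverage else None


# Tiny stand-ins for large['name'] and small['name'] (values from the question).
large_names = ["24/7 CUSTOMER", "ZOETIS", "ZOHO CORPORATION", "ZOOM COMMUNICATIONS"]
small_names = ["Zoho", "Kyodo News"]

# One result per name in small_names, built once instead of inside a nested loop.
matched = [best_match(n, large_names) for n in small_names]
print(matched)
```

For these sample values the result is 'ZOHO CORPORATION' for 'Zoho' and None for 'Kyodo News', since no stand-in entry is close to the latter. On the real frames the same idea would look like `small['matched_name'] = small['name'].map(lambda n: best_match(n, large['name']))`, which produces exactly one value per row of small. With about 1.16 million rows this pure-Python scorer is likely to be slow, so passing the full list of choices to `process.extractOne(name, choices)` inside the same `map` call may be the more practical route.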