How to fuzzy-match corresponding rows between pandas DataFrames

I want to fuzzy-match large['name'] against small['name'] and add a 'matched_name' column that holds, for each row, the highest-scoring match.

large looks like this:

    name
0   24/7 CUSTOMER
1   3 K TECHNOLOGIES
2   3I INFOTECH B P O
3   3I INFOTECH CONSULTANCY SERVICES
4   3I INFOTECH
... ...
889 ZIRCON TECHNOLOGIES
890 ZOETIS
891 ZOHO CORPORATION
892 ZOOM COMMUNICATIONS
893 ZYLOG SYSTEMS

small looks like this:

        name        city        country_code
0       Wetpaint    New York    USA
1       Zoho        Pleasanton  USA
2       Digg        New York    USA
3       Facebook    Menlo Park  USA
4       Accel       Palo Alto   USA
... ... ... ...
1161387 TKX         NaN         NaN
1161388 Digitalhype NaN         NaN
1161389 TK Research NaN         NaN
1161390 Kyodo News  Tokyo       JPN
1161391 TKO Mobile  NaN         NaN

The output I want looks like this:

        name        matched_name        city        country_code
0       Wetpaint    WETPAINT CO         New York    USA
1       Zoho        ZOHO CORPORATION    Pleasanton  USA
2       Digg        DIGG CO             New York    USA
3       Facebook    FACEBOOK            Menlo Park  USA
4       Accel       ACCEL               Palo Alto   USA
... ... ... ...
1161387 TKX         TKX CO              NaN         NaN
1161388 Digitalhype DIGITAL HYPE        NaN         NaN
1161389 TK Research TK RESEARCH CO      NaN         NaN
1161390 Kyodo News  KYODO               Tokyo       JPN
1161391 TKO Mobile  TKO CO              NaN         NaN

This is what I have so far:

# create a new column with match all company_name
for i in large['name']:
    for j in small['company_name']:
        large['matched_name'] = process.extractOne(i,j)

But I get an error: ValueError: Length of values (0) does not match length of index (1161392)
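
Two things stand out in the loop above: it reassigns the whole 'matched_name' column on every iteration, and it passes a single string `j` where a collection of candidate names is expected. One possible way to restructure it is sketched below. This is not a confirmed fix, and it rests on several assumptions: `targets` and `reference` are hypothetical names standing in for the 1,161,392-row frame shown in the desired output and the 894-row list of names; the toy data is invented; the standard library's `difflib.get_close_matches` is swapped in for `process.extractOne` so the example has no extra dependencies; and the 0.3 cutoff is arbitrary.

```python
import difflib

import pandas as pd

# Toy stand-ins for the two frames in the question (hypothetical data).
reference = pd.DataFrame({"name": ["ZOHO CORPORATION", "FACEBOOK", "DIGG CO"]})
targets = pd.DataFrame({
    "name": ["Zoho", "Facebook", "Digg"],
    "city": ["Pleasanton", "Menlo Park", "New York"],
})

# The candidate names must be a collection, not a single string.
choices = reference["name"].tolist()


def best_match(name, candidates=choices, cutoff=0.3):
    """Return the candidate most similar to `name`, or None if none clears `cutoff`."""
    # Upper-case the query because the candidate names are upper case.
    hits = difflib.get_close_matches(name.upper(), candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


# Score each distinct name once, then map the results back onto every row.
# Rows whose name is missing are skipped here and stay NaN after the map.
lookup = {name: best_match(name) for name in targets["name"].dropna().unique()}
targets["matched_name"] = targets["name"].map(lookup)

print(targets)
```

Building the lookup from the distinct names keeps the expensive comparison from being repeated for duplicate rows, which matters at this row count. If the fuzzy-matching library from the question is preferred, the body of `best_match` could call `process.extractOne(name, candidates)` instead and keep the first element of the returned tuple.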