How to fuzzy-match and append pandas DataFrames
I want to fuzzy-match the values in small['company_name'] against large['name'] and eventually append the result to dico, to get the following output:
0 uuid name company_name type primary_role cb_url domain homepage_url combined_stock_symbols city region country_code short_description
0 e1393508-30ea-8a36-3f96-dd3226033abd Wetpaint organization company https://www.crunchbase.com/organization/wetpai... wetpaint.com http://www.wetpaint.com/ NaN New York New York USA Wetpaint offers an online social publishing pl...
1 bf4d7b0e-b34d-2fd8-d292-6049c4f7efc7 Zoho organization company https://www.crunchbase.com/organization/zoho?u... zoho.com https://www.zoho.com/ NaN Pleasanton California USA Zoho offers a suite of business,collaboration...
2 5f2b40b8-d1b3-d323-d81a-b7a8e89553d0 Digg organization company https://www.crunchbase.com/organization/digg?u... digg.com http://www.digg.com NaN New York New York USA Digg Inc. operates a website that enables its ...
3 df662812-7f97-0b43-9d3e-12f64f504fbb Facebook organization company https://www.crunchbase.com/organization/facebo... facebook.com http://www.facebook.com nasdaq:FB Menlo Park California USA Facebook is an online social networking servic...
4 b08efc27-da40-505a-6f9d-c9e14247bf36 Accel organization investor https://www.crunchbase.com/organization/accel?... accel.com http://www.accel.com NaN Palo Alto California USA Accel is an early and growth-stage venture cap...
large looks like this:
uuid name type primary_role cb_url domain homepage_url combined_stock_symbols city region country_code short_description
0 e1393508-30ea-8a36-3f96-dd3226033abd Wetpaint organization company https://www.crunchbase.com/organization/wetpai... wetpaint.com http://www.wetpaint.com/ NaN New York New York USA Wetpaint offers an online social publishing pl...
1 bf4d7b0e-b34d-2fd8-d292-6049c4f7efc7 Zoho organization company https://www.crunchbase.com/organization/zoho?u... zoho.com https://www.zoho.com/ NaN Pleasanton California USA Zoho offers a suite of business,collaboration...
2 5f2b40b8-d1b3-d323-d81a-b7a8e89553d0 Digg organization company https://www.crunchbase.com/organization/digg?u... digg.com http://www.digg.com NaN New York New York USA Digg Inc. operates a website that enables its ...
3 df662812-7f97-0b43-9d3e-12f64f504fbb Facebook organization company https://www.crunchbase.com/organization/facebo... facebook.com http://www.facebook.com nasdaq:FB Menlo Park California USA Facebook is an online social networking servic...
4 b08efc27-da40-505a-6f9d-c9e14247bf36 Accel organization investor https://www.crunchbase.com/organization/accel?... accel.com http://www.accel.com NaN Palo Alto California USA Accel is an early and growth-stage venture cap...
and small looks like this:
company_name
0 24/7 CUSTOMER
1 3 K TECHNOLOGIES
2 3I INFOTECH B P O
3 3I INFOTECH CONSULTANCY SERVICES
4 3I INFOTECH
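import pandas as pd
from fuzzywuzzy import fuzz

# Build every (name, company_name) pair and score each one
comb = pd.MultiIndex.from_product((large['name'], small['company_name']))
scores = comb.map(lambda x: fuzz.ratio(*x))  # or fuzz.partial_ratio(*x)
# Keep pairs scoring above the threshold as {name: company_name}; adjust 90 as needed
d = dict(a for a, b in zip(comb, scores) if b > 90)
# Attach the matched company_name and drop rows without a match
out = large.assign(SurName=large['name'].map(d)).dropna(subset=['SurName'])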
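
One thing to watch for: as far as I know, fuzz.ratio compares the strings exactly as given, so the upper-case values in small['company_name'] will score poorly against the mixed-case values in large['name']. The sketch below shows the same threshold idea with both sides lower-cased before scoring, keeping only the best candidate per name. It is an illustration under assumptions, not the answer's exact method: it uses the standard library's difflib.SequenceMatcher as a stand-in for fuzz.ratio (scores may differ slightly), the helper name best_matches is hypothetical, and "FACEBOOK" and "ZOHO CORP" are made-up sample values rather than rows from small.

```python
from difflib import SequenceMatcher


def best_matches(names, candidates, threshold=90):
    """Map each name to its best-scoring candidate when the score exceeds threshold.

    Scores are on a 0-100 scale and are computed on lower-cased strings,
    so "Facebook" and "FACEBOOK" count as identical.
    """
    result = {}
    for name in names:
        best_score, best_candidate = 0.0, None
        for cand in candidates:
            score = 100 * SequenceMatcher(None, name.lower(), cand.lower()).ratio()
            if score > best_score:
                best_score, best_candidate = score, cand
        if best_candidate is not None and best_score > threshold:
            result[name] = best_candidate
    return result


# Hypothetical sample values standing in for large['name'] and small['company_name']
large_names = ["Wetpaint", "Zoho", "Facebook"]
small_names = ["FACEBOOK", "ZOHO CORP", "3I INFOTECH"]

matches = best_matches(large_names, small_names)
print(matches)
```

The resulting dict has the same shape as d in the snippet above, so it could be passed to large['name'].map(...) in the same way. With fuzzywuzzy itself, lower-casing both columns first (for example with .str.lower()) should have a similar effect.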