How can I match football team names to their closest string?
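I'm trying to standardise some data that I receive from a football API.
I have a function with three inputs: home and away (two football teams) and a list of strings that each contain either home or away, although the teams may be named differently from the home and away inputs.
My goal is to replace every instance of home in the list with 1 and every instance of away with 2.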
Here are some example inputs:
home: "Manchester United",away: "Liverpool",list = ["Man Utd and Yes","Liverpool and No","Man Utd and No","Liverpool and Yes"]
home: "Manchester United",away: "Manchester City",list = ["Man Utd and Yes","Man City and No","Man City and Yes"]
home: "Paris Saint Germain",away: "Monaco",list = ["Monaco and Yes","Monaco and No","PSG and Yes","PSG and No"]
home: "Brighton & Hove Albion",away: "Chelsea",list = ["Chelsea and No","Brighton and Yes","Chelsea and Yes","Brighton and No"]
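Note that the team names within a list are consistent (a single list will never mix two spellings of the same team, such as "Man Utd and Yes" and "Man United and No").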
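As a minimal sketch of the target transformation only (not of the matching itself), the snippet below splits each label into its team part and its suffix and then rewrites the team part from a lookup table. The helper names split_label and relabel are hypothetical, and the mapping from label to code is written by hand here; producing that mapping automatically is the actual question.

```python
def split_label(label):
    """Split 'Man Utd and Yes' into ('Man Utd', 'Yes') at the last ' and '."""
    team, _, rest = label.rpartition(" and ")
    return team, rest


def relabel(lst, codes):
    """Replace the team part of every label using a {team_label: code} mapping."""
    out = []
    for label in lst:
        team, rest = split_label(label)
        out.append(f"{codes[team]} and {rest}")
    return out


labels = ["Man Utd and Yes", "Liverpool and No", "Man Utd and No", "Liverpool and Yes"]

# Names are consistent within a list, so there are at most two distinct team labels.
distinct = sorted({split_label(label)[0] for label in labels})
print(distinct)  # ['Liverpool', 'Man Utd']

# Hand-written mapping, for illustration only.
print(relabel(labels, {"Man Utd": "1", "Liverpool": "2"}))
```

Splitting on " and " with surrounding spaces, rather than on the bare substring "and", avoids cutting a name that happens to contain those letters (a hypothetical label such as "Sunderland and Yes" would otherwise be split inside the team name).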
Now, how do I match the teams? Here is what I have so far:
import enchant  # pyenchant, used for enchant.utils.levenshtein

def standardise(home, away, lst):
    for i, v in enumerate(lst):
        team = v.split("and")[0]
        if team in home or home in team:
            lst[i] = v.replace(team, "1")
            for j, k in enumerate(lst):
                new_team = k.split("and")[0]
                if new_team != i and team != new_team:
                    lst[j] = k.replace(new_team, "2")
                else:
                    lst[j] = k.replace(new_team, "1")
        elif team in away or away in team:
            # same code as above but for away
            pass
        elif enchant.utils.levenshtein(team, home) >= \
                enchant.utils.levenshtein(team, away):
            lst[i] = v.replace(team, "2")
        else:
            lst[i] = v.replace(team, "1")
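The Levenshtein distance measures the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one string into another.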
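To make the definition concrete, here is a small pure-Python sketch of the standard dynamic-programming computation. It is written for illustration and is not the pyenchant implementation; the function name levenshtein is simply reused for readability.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))           # 3
print(levenshtein("PSG", "Monaco"))               # 6
print(levenshtein("PSG", "Paris Saint Germain"))  # 16
```

The last two lines show why a raw edit distance struggles with acronyms: "PSG" needs 16 insertions to become "Paris Saint Germain" but only 6 edits to become "Monaco", so the shorter, unrelated name looks closer.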
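This approach does not work 100% of the time; for example, it seems to fail when an acronym is used.
Is there a better way to do this? Perhaps someone can think of a more targeted approach?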
Solution
fuzzywuzzy is a good fit for this, and it has good docs too.
from fuzzywuzzy import process

def standardise(home, away, lst):
    # Map each canonical team name to its code.
    home_away = {home: '1', away: '2'}
    choices = [home, away]
    # extractOne returns a (best_choice, score) tuple; only the choice is needed.
    print([home_away[process.extractOne(each, choices)[0]] for each in lst])

home = "Manchester United"
away = "Liverpool"
lst = ["Man Utd and Yes","Liverpool and No","Man Utd and No","Liverpool and Yes"]
standardise(home, away, lst)

home = "Manchester United"
away = "Manchester City"
lst = ["Man Utd and Yes","Man City and No","Man City and Yes"]
standardise(home, away, lst)

home = "Paris Saint Germain"
away = "Monaco"
lst = ["Monaco and Yes","Monaco and No","PSG and Yes","PSG and No"]
standardise(home, away, lst)

home = "Brighton & Hove Albion"
away = "Chelsea"
lst = ["Chelsea and No","Brighton and Yes","Chelsea and Yes","Brighton and No"]
standardise(home, away, lst)
Output:
['1','2','1','2']
['1','2']
['2','1']
['2','1']
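Alternatively, you could first strip the vowels from each name so that it is closer to its abbreviation, and then apply Levenshtein.
Also take a look at fuzzywuzzy.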
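A rough sketch of the vowel-stripping idea might look like the following. The helpers squash and closest are hypothetical names, the Levenshtein function is a plain dynamic-programming version included only to keep the example self-contained, and resolving ties in favour of home is an arbitrary choice.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def squash(name):
    """Lower-case a name and drop vowels and non-letters: 'Man Utd' -> 'mntd'."""
    return re.sub(r"[^a-z]|[aeiou]", "", name.lower())


def closest(team_label, home, away):
    """Return '1' if the squashed label is at least as close to home as to away, else '2'."""
    key = squash(team_label)
    d_home = levenshtein(key, squash(home))
    d_away = levenshtein(key, squash(away))
    return "1" if d_home <= d_away else "2"


print(closest("Man Utd", "Manchester United", "Manchester City"))   # 1
print(closest("Man City", "Manchester United", "Manchester City"))  # 2
print(closest("PSG", "Paris Saint Germain", "Monaco"))              # 2 (still wrong)
```

In this sketch the abbreviations "Man Utd" and "Man City" are separated correctly, though only by a margin of one edit. A pure initialism such as "PSG" is still matched to Monaco, because "psg" is 3 edits from "mnc" but 7 from "prssntgrmn", so vowel stripping alone does not cover the acronym case.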