How do I count how many similar sentences there are?
`User` `Text`
49 there is a cat under the table
21 the sun is hot
431 Could you please close the window?
65 there is a cat under the table
21 the sun is hot
53 there is a cat under the table
My expected output is:
Text Freq
there is a cat under the table 3
the sun is hot 2
Could you please close the window? 1
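My approach is to use fuzz.partial_ratio to determine the degree of match (similarity) between all sentences, and then use groupby to count the frequencies.
I am using fuzz.partial_ratio, so for an exact match it returns 100 (a full match):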
check_match = df.apply(lambda row: fuzz.partial_ratio(row['Text'], row['Text']) >= value, axis=1)
Here value is the threshold used to decide whether two sentences count as a match (similar).
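Note that the line above compares each row's Text with itself, so every row scores 100 and passes any threshold. Counting similar sentences needs a comparison across rows. The following is only a minimal sketch of that idea, not the asker's method: it uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.partial_ratio, and the names similarity, count_similar, and the threshold of 90 are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 score; a stdlib stand-in for fuzz.partial_ratio."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def count_similar(texts, threshold=90):
    """Greedily assign each text to the first representative it matches."""
    groups = {}  # representative sentence -> number of similar sentences
    for text in texts:
        for rep in groups:
            if similarity(text, rep) >= threshold:
                groups[rep] += 1
                break
        else:
            # No existing representative is close enough: start a new group.
            groups[text] = 1
    return groups


texts = [
    "there is a cat under the table",
    "the sun is hot",
    "Could you please close the window?",
    "there is a cat under the table",
    "the sun is hot",
    "there is a cat under the table",
]

freq = count_similar(texts)
for text, n in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(text, n)
```

Each sentence is compared against one representative per group rather than against every other sentence, so the result can depend on input order when sentences are only loosely similar. For the sample data, where the repeats are exact, the counts are 3, 2, and 1.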
Solution
You can use value_counts().
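As a sketch of how value_counts() could produce the expected Text/Freq table when the sentences match exactly: the DataFrame below is rebuilt from the sample data in the question, and the rename_axis and reset_index steps are one possible way to get the column names, not something the answer itself specifies.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": [49, 21, 431, 65, 21, 53],
    "Text": [
        "there is a cat under the table",
        "the sun is hot",
        "Could you please close the window?",
        "there is a cat under the table",
        "the sun is hot",
        "there is a cat under the table",
    ],
})

# value_counts() counts exact duplicates and sorts by frequency, descending
freq = (
    df["Text"]
    .value_counts()
    .rename_axis("Text")       # name the index so it becomes the Text column
    .reset_index(name="Freq")  # move the index into a column, counts into Freq
)
print(freq)
```

This only groups sentences that are character-for-character identical; near-duplicates would need a similarity step first.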
,
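Try this:
# size() counts the rows per sentence; reset_index names the count column Freq
df = df.groupby('Text').size().reset_index(name='Freq')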
,
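The following approach should work:
import pandas as pd
from collections import Counter

# Count how many times each exact sentence occurs in the Text column
d = dict(Counter(df.Text))
new_df = pd.DataFrame({'Text': list(d.keys()), 'Freq': list(d.values())})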