TextBlob and sentiment analysis: how to optimize the lexicon?
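Many people use TextBlob for sentiment analysis of text. I am sure I am missing something in how the method works and how it should be used, but the results of my analysis simply do not add up.
Here is a sample of the data I have: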
Top Text label sentiment polarity
51 CVD-Grown Carbon Nanotube Branches on Black Si... silicon-carbon nanotube (bSi-CNT) hybrid struc... -1 (-0.16666666666666666,0.43333333333333335) -0.166667
69 Navy postpones its largest-ever Milan exercise... Navy on Tuesday postponed a multi-nation mega ... -1 (-0.125,0.375) -0.125000
81 Malaysia rings alarm bell on fake Covid... The United Nations International Children's Em... -1 (-0.5,1.0) -0.500000
82 Poison Not Transmitted By Air... it falls on the fabric remains 9 hours,so was... -1 (-0.2,0.0) -0.200000
87 A WhatsApp rumor is spreading that is allegedl... strict about unsourced speculation than other ... -1 (-0.1,0.1) -0.100000
90 Dumb Whatsapp Forwards - Page 2 - Cricket Web as the ones that say like or share this pictur... -1 (-0.375,0.5) -0.375000
144 malaysia | Unicef Malaysia rings alarm b... such messages claiming to be from us,” #Milan... -1 (-0.5,1.0) -0.500000
134 False and unverified claims are being... Soccer was not issued by the U... -1 (-0.4000000000000001,0.6) -0.400000
123 Truth behind the Viral message about Co... number of stories ever since the wave of misin... -1 (-0.4,0.7) -0.400000
166 In India,Fake WhatsApp Forwards on Coronaviru... of confirmed cases of rises rapidl... -1 (-0.5,1.0) -0.500000
I used the following code:
import numpy as np
import pandas as pd
from textblob import TextBlob

# Score each headline in the 'Top' column; .sentiment is a (polarity, subjectivity) tuple
df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)
df1 = pd.DataFrame(df['sentiment'].tolist(), index=df.index)
df_new = df
df_new['polarity'] = df1['polarity']
df_new.polarity = df1.polarity.astype(float)
df_new['subjectivity'] = df1['subjectivity']
df_new.subjectivity = df1.polarity.astype(float)
# print(df_new)
# Label each row by the sign of its polarity
conditionList = [
    df_new['polarity'] == 0, df_new['polarity'] > 0, df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')
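But as you can see, all of these messages come from fact-checking sources, so they are not fake. How can I improve the results, perhaps by removing some specific words? I can see that when a text contains words such as false, unverified, viral, or fake, it is scored as negative, which makes the results worse.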
Solution
All of your texts have negative polarity, so according to your code they are labeled as fake.
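The labeling rule above depends only on the sign of the polarity score, so a headline that merely mentions a word like "fake" can end up labeled fake. As a minimal sketch of the word-removal idea raised in the question, the snippet below strips a small set of cue words before scoring. The names `CUE_WORDS`, `strip_cue_words`, and `label_from_polarity` are hypothetical helpers, not part of TextBlob, and the word list is an assumption taken from the question. Whether this actually improves the labels is untested here.

```python
import re

# Hypothetical cue-word list, taken from the words named in the question.
CUE_WORDS = {"fake", "false", "unverified", "viral"}


def strip_cue_words(text, cue_words=CUE_WORDS):
    """Remove whole-word, case-insensitive matches of cue_words from text."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in sorted(cue_words)) + r")\b"
    cleaned = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the extra whitespace left behind by the removals.
    return " ".join(cleaned.split())


def label_from_polarity(polarity):
    """Same sign-based rule as the question's np.select call."""
    if polarity > 0:
        return "not_fake"
    if polarity < 0:
        return "fake"
    return "neutral"
```

The cleaned text can then be scored as in the question, for example `label_from_polarity(TextBlob(strip_cue_words(headline)).sentiment.polarity)`, assuming `TextBlob` is imported from `textblob`.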
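There is no indication of how the polarity field is determined. Is it precomputed in your source file? If it uses TextBlob's default polarity algorithm, which text is it being run against?
(There may also be a typo: df_new.subjectivity is assigned the float conversion of polarity.)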
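To illustrate the typo mentioned above, here is a small sketch that splits the sentiment tuples into two columns and takes subjectivity from its own field. The `Sentiment` namedtuple is a stand-in for the (polarity, subjectivity) tuple that `TextBlob(...).sentiment` returns, so the example runs without TextBlob. `split_sentiment` is a hypothetical helper name, and the two sample rows reuse values from the data shown in the question.

```python
from collections import namedtuple

import pandas as pd

# Stand-in for the (polarity, subjectivity) tuple returned by TextBlob(...).sentiment.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def split_sentiment(df, column="sentiment"):
    """Return a copy of df with float 'polarity' and 'subjectivity' columns."""
    # Namedtuple field names become the column names of the intermediate frame.
    parts = pd.DataFrame(df[column].tolist(), index=df.index)
    out = df.copy()
    out["polarity"] = parts["polarity"].astype(float)
    # Subjectivity comes from its own field, not from polarity.
    out["subjectivity"] = parts["subjectivity"].astype(float)
    return out


# Two rows with sentiment values copied from the sample data (index 81 and 82).
df = pd.DataFrame(
    {"sentiment": [Sentiment(-0.5, 1.0), Sentiment(-0.2, 0.0)]},
    index=[81, 82],
)
result = split_sentiment(df)
```

Working on a copy also avoids the aliasing in `df_new = df`, where both names refer to the same DataFrame.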