How to fix emojis in social media comments not being converted to text sentiment
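I collected comments on product ads from Facebook and Twitter and am trying to run sentiment analysis on them. Part of the text cleaning involves converting emojis into text so that as much of the sentiment in the comments as possible is captured. I have tried emoji.demojize(text) on each row and various other approaches from Stack Overflow, but none of them converts the emojis in the comments into words that describe the actual sentiment. The code below does not work, and I cannot tell what my mistake is. The code is as follows: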
import io
import json
import re

def handleEmojis(text, keep_emoticons=False):
    global emoji_sentiment_matching
    # Load the emoji-to-sentiment table once and cache it in a global
    if 'emoji_sentiment_matching' not in globals():
        with io.open('emoji.json', 'r', encoding="UTF-8") as outfile:
            emoji_sentiment_matching = json.load(outfile)
    HASHTAG_PATTERN = re.compile(r'#\w*')
    EMOJIS_PATTERN_PLAIN_TEXT = re.compile(r"(?:X|:|;|=)(?:-)?(?:\)|\(|O|D|P|S){1,}", re.IGNORECASE)
    EMOJIS_PATTERN_SYMBOLS = re.compile(u'[\U00002600-\U000027BF]|[\U0001f300-\U0001f64F]|[\U0001f680-\U0001f6FF]')
    if keep_emoticons:
        # Replace emoji with sentiment
        for emoji in emoji_sentiment_matching:
            if emoji["emoji"] in text:
                ## Adding space if text follows right away / is right before the emoticon
                idx = text.find(emoji["emoji"])
                (space1, space2) = ("", "")
                if (idx-1) >= 0 and text[idx-1] != " ":
                    space1 = " "
                if (idx+1) < len(text) and text[idx+1] != " ":
                    space2 = " "
                ## replace emoticon with its sentiment
                text = text.replace(emoji["emoji"], "{}emoji%%{}{}".format(space1, emoji["subgroup"], space2))
        ## TO IMPLEMENT: Sentiment of other emoticons like :),:-),:-/
    else:
        # Strip symbol emojis and plain-text emoticons entirely
        for r in re.findall(EMOJIS_PATTERN_SYMBOLS, text):
            text = text.replace(r, "")
        for r in re.findall(EMOJIS_PATTERN_PLAIN_TEXT, text):
            text = text.replace(r, "")
    return text.strip()
FB_df['demojified'] = FB_df['Text']
for i in range(len(FB_df)):
    text = FB_df.loc[i, "demojified"]
    handleEmojis(text, keep_emoticons=False)
print(FB_df)
This is the resulting output (see the "demojified" column): [image: dataframe outputs]
I also tried the following code:
import re
from emot.emo_unicode import UNICODE_EMO, EMOTICONS
from emoji import demojize

def convert_emojis(text):
    for emot in UNICODE_EMO:
        text = re.sub(r'('+emot+')', "_".join(UNICODE_EMO[emot].replace(",", "").replace(":", "").split()), text)
    return text
# Converting emoticons to words
def convert_emoticons(text):
    for emot in EMOTICONS:
        text = re.sub(u'('+emot+')', "_".join(EMOTICONS[emot].replace(",", "").split()), text)
    return text
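The plain-text emoticon step (the `TO IMPLEMENT` note in the first attempt, and `convert_emoticons` above) can be sketched without any third-party package. This is only a minimal sketch and not the emot library's behaviour: the `EMOTICON_WORDS` table, its labels, and the function name `convert_emoticons_sketch` are made up for illustration. The keys go through `re.escape` because characters such as `(` and `)` are regex metacharacters.

```python
import re

# Hypothetical lookup table for illustration only; a real project would use a
# fuller list (for example the EMOTICONS mapping from the emot package).
EMOTICON_WORDS = {
    ":-)": "happy_face",
    ":)": "happy_face",
    ":-(": "sad_face",
    ":(": "sad_face",
    ";)": "wink",
    ":D": "laughing",
}

# Longest keys first so a longer emoticon is never shadowed by a shorter one
# that is its prefix; re.escape makes the parentheses literal characters.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def convert_emoticons_sketch(text):
    """Return a new string with known emoticons replaced by word labels."""
    # Pad each label with spaces so it does not fuse with neighbouring words.
    replaced = _EMOTICON_RE.sub(
        lambda m: " " + EMOTICON_WORDS[m.group(0)] + " ", text
    )
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(replaced.split())
```

Like `convert_emojis` and `convert_emoticons`, this function returns a new string, so the caller has to store the return value for the change to be visible.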
FB_df['demojified'] = FB_df['Text']
for row in FB_df['demojified']:
    for text in row:
        text = text
        convert_emojis(text)
FB_df.loc[:, 'demojified']
Still no joy. I have been at this for a week. Any guidance would be greatly appreciated.
I have also tried:
import re
import emoji
from emot.emo_unicode import UNICODE_EMO

FB_df['demojified'] = FB_df['Text']
for row in FB_df['demojified']:
    for text in row:
        text = str(text)
        text = emoji.demojize(text)
Still no joy :-(
Solution
Found the problem. I forgot to update the demojified column with the output from the for loop.
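import emoji

FB_df['Demojified'] = FB_df['Comments']
for i in range(len(FB_df.Demojified)):
    text = FB_df.loc[i, "Demojified"]
    text = emoji.demojize(text)          # each emoji becomes a ":name:" token
    text = text.replace(":", " ")        # drop the colon delimiters
    text = ' '.join(text.split())        # collapse repeated whitespace
    FB_df.loc[i, "Demojified"] = text    # write the result back (the step that was missing)
FB_df = FB_df[['Title', 'Comments', 'Demojified']]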
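The root cause generalises: Python strings are immutable, so a function such as `emoji.demojize` or `handleEmojis` returns a new string and leaves its argument untouched. Unless the result is stored, the data never changes. The sketch below shows the difference on a plain list so that it runs without pandas or the emoji package. `demojize_sketch` is a hypothetical stand-in built on the standard library's `unicodedata.name`; it is not the emoji package's implementation, and the names it produces (Unicode character names) differ from the emoji package's `:short_name:` output. Treating every character of category `So` as an emoji is a simplifying assumption.

```python
import unicodedata


def demojize_sketch(text):
    """Return a NEW string with symbol characters replaced by Unicode names.

    Hypothetical stand-in for emoji.demojize, used only so this example runs
    with the standard library alone.
    """
    parts = []
    for ch in text:
        # "So" (Symbol, other) covers most single-code-point emoji; this
        # simplification ignores multi-code-point sequences.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                parts.append(" " + name.lower().replace(" ", "_") + " ")
            else:
                parts.append(" ")
        else:
            parts.append(ch)
    # Collapse the whitespace introduced by the padding.
    return " ".join("".join(parts).split())


comments = ["love it \U0001F600", "arrived broken \U0001F622"]

# Bug pattern from the question: the return value is thrown away.
unchanged = list(comments)
for i in range(len(unchanged)):
    demojize_sketch(unchanged[i])

# Fix pattern from the answer: write the result back to the same position.
fixed = list(comments)
for i in range(len(fixed)):
    fixed[i] = demojize_sketch(fixed[i])
```

With a DataFrame, the same write-back can be done in one statement, for example `FB_df['Demojified'] = FB_df['Comments'].astype(str).apply(emoji.demojize)`, which avoids the index loop and does not depend on the index being 0..n-1.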