
Extracting names from a text column with NLTK


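I want to extract names from a text column of a DataFrame.

The column is already tokenized. My code works fine for a single cell, but I want it to iterate over the whole column so that I end up with the names mentioned anywhere in the text column.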

```
data['text6']
0     [Walking,After,I,finished,my,class,...
1     [Long,day,Long,but,it,was,not,bad,...
2     [Travelling,work,today,had,a,...
3     [Exam,Day,long,working,w...
4     [lovey,Friday,It,lovely,.,...
5     [Highway,waked,up,early,in,the,mornin...
6     [Work,Quiet,at,found,so,...
```
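Because each cell of `data['text6']` already holds a list of tokens, one possible approach is to wrap the single-cell logic in a function and map it over the column with `Series.apply`, keeping the names row by row. This is a minimal sketch that assumes a hypothetical `extract` callable, which turns one token list into `(entity, label)` pairs. For example, it could be the chunking loop from the code below, wrapped in a function. `names_per_row` is likewise a hypothetical name.

```python
import pandas as pd


def names_per_row(column, extract):
    """Map a per-cell extractor over a column of token lists.

    `extract` is a hypothetical callable: it takes one list of tokens and
    returns (entity, label) pairs, like the single-cell NLTK loop does.
    Returns a Series aligned with `column` that holds each row's unique
    PERSON entities in order of first appearance.
    """
    def persons(tokens):
        names = [entity for entity, label in extract(tokens) if label == "PERSON"]
        return list(dict.fromkeys(names))  # drop repeats within the row, keep order

    # Series.apply calls `persons` once per cell; list results stay as list cells
    return column.apply(persons)
```

For example, `data['names'] = names_per_row(data['text6'], extract)` would store each row's names next to its text, so you can still see which row a name came from.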

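```python
import nltk
import pandas as pd

# Requires NLTK's POS-tagger and named-entity-chunker data (install once via nltk.download())
# Tag and chunk the tokens of the first cell only
pos_tags = nltk.pos_tag(data['text6'][0])
chunks = nltk.ne_chunk(pos_tags, binary=False)  # binary=False: typed labels (PERSON, GPE, ...)

for chunk in chunks:
    print(chunk)

entities = []
labels = []
for chunk in chunks:
    # Entity subtrees have a label(); ordinary tokens are plain (word, tag) tuples
    if hasattr(chunk, 'label'):
        entities.append(' '.join(c[0] for c in chunk))
        labels.append(chunk.label())

entities_labels = list(set(zip(entities, labels)))
# Passing columns= here also works when no entities were found
entities_df = pd.DataFrame(entities_labels, columns=["Entities", "Labels"])
entities_df = entities_df[entities_df['Labels'] == 'PERSON']
entities_df
```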

(Screenshot: the output currently covers only one cell.)
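This is a minimal sketch of one way to run the single-cell logic over the whole column and get one combined list of names. It assumes that `data['text6']` holds one list of tokens per row and that NLTK's tagger and named-entity chunker data are installed. The helper names `ne_pairs` and `person_names` are hypothetical. The extractor is passed in as a parameter, so the pandas part can be checked without NLTK.

```python
import pandas as pd


def ne_pairs(tokens):
    """Return (entity, label) pairs for one token list, as the single-cell loop does.

    Needs nltk plus its POS-tagger and NE-chunker data (fetch via nltk.download()).
    """
    import nltk  # imported here so person_names() can also be used with other extractors

    tree = nltk.ne_chunk(nltk.pos_tag(tokens), binary=False)
    return [
        (" ".join(word for word, _tag in subtree), subtree.label())
        for subtree in tree
        if hasattr(subtree, "label")  # entity subtrees; plain tokens are (word, tag) tuples
    ]


def person_names(column, extract=ne_pairs):
    """Apply `extract` to every cell; return unique PERSON entities in first-seen order."""
    # One row per (entity, label) pair; rows with no entities become NaN and are dropped
    pairs = column.apply(extract).explode().dropna()
    entities_df = pd.DataFrame(pairs.tolist(), columns=["Entities", "Labels"])
    entities_df = entities_df.drop_duplicates()
    return entities_df.loc[entities_df["Labels"] == "PERSON", "Entities"].tolist()
```

With the data above, `person_names(data['text6'])` would return the PERSON entities found across all rows. Deduplicating after collecting every row's pairs mirrors the `set(zip(...))` step in the single-cell code.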
