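How do I get the words joined together after POS tagging?

I am working on a dataset where I need to extract all adjectives, verbs, and adverbs from every sentence in a DataFrame column.

Here is a sample I am working on to figure out how to get the required output.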

import nltk

list1 = ['good', 'excellent', 'was', 'not']
for i in list1:
    # Tag each word on its own
    x = nltk.pos_tag([i])
    if x[0][1] in ("JJ", "JJS", "RB", "VB", "RBR", "RBS", "VBN", "VBP"):
        print(x)

The output it gives me is:

[('good','JJ')]
[('not','RB')]

The output I need is this:

good not

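Can anyone help?

Answer

You will have to be more specific about what you really want to extract.

But here is an attempt.

It looks like you are trying to extract verb phrases together with their adjectives/adverbs. If so, you could try: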

from nltk import pos_tag, word_tokenize
from nltk import ngrams

text = "this is not good."
tagged_text = pos_tag(word_tokenize(text))

# 'VBZ' is included so that "is" (typically tagged VBZ) counts as a focus word
focus_tags = set(['JJ', 'JJS', 'RB', 'RBR', 'RBS', 'VB', 'VBN', 'VBP', 'VBZ'])

for (token1, tag1), (token2, tag2) in ngrams(tagged_text, 2):
    if tag1 in focus_tags and tag2 in focus_tags:
        print(token1 + ' ' + token2)

But that outputs both `is not` and `not good`!!

Hmm, in that case, do you want exactly `not good`, or `is not good`?

If you want the trigram `is not good`, try:

for (token1, tag1), (token2, tag2), (token3, tag3) in ngrams(tagged_text, 3):
    if tag1 in focus_tags and tag2 in focus_tags and tag3 in focus_tags:
        print(token1 + ' ' + token2 + ' ' + token3)
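
If you would rather not pick the n-gram size up front, another option is to group every maximal run of consecutive focus-tagged tokens. The following is only a sketch: `focus_runs` is a hypothetical helper (not part of NLTK), and the tagged list is hand-written to stand in for `pos_tag` output, so the tags your tagger actually produces may differ.

```python
from itertools import groupby

# Hand-written tags standing in for pos_tag(word_tokenize("this is not good."));
# this is an assumption, real tagger output may differ.
tagged_text = [("this", "DT"), ("is", "VBZ"), ("not", "RB"), ("good", "JJ"), (".", ".")]

focus_tags = {"JJ", "JJS", "RB", "RBR", "RBS", "VB", "VBN", "VBP", "VBZ"}


def focus_runs(tagged, tags, min_len=2):
    """Return each maximal run of consecutive tokens whose tag is in `tags`."""
    runs = []
    # groupby splits the sequence wherever "is this a focus tag?" flips
    for is_focus, group in groupby(tagged, key=lambda pair: pair[1] in tags):
        tokens = [token for token, _ in group]
        if is_focus and len(tokens) >= min_len:
            runs.append(" ".join(tokens))
    return runs


print(focus_runs(tagged_text, focus_tags))
```

With the tags assumed above this groups `is not good` into a single phrase; passing a tag set without the verb tags would leave only `not good`.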

What if I only want `not good`?

Maybe try removing the verbs? For example:

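from nltk import pos_tag, word_tokenize
from nltk import ngrams

text = "this is not good."
tagged_text = pos_tag(word_tokenize(text))

# Verb tags removed: only adjective and adverb tags remain
focus_tags = set(['JJ', 'JJS', 'RB', 'RBR', 'RBS'])

for (token1, tag1), (token2, tag2) in ngrams(tagged_text, 2):
    if tag1 in focus_tags and tag2 in focus_tags:
        print(token1 + ' ' + token2)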
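
If what you are after is simply the single-line output from the question (`good not`), one possibility is to collect the matching tokens and join them instead of printing each tagged tuple. This is a minimal sketch, not a definitive solution: `join_focus_words` and `extract_focus_words` are hypothetical helper names, and the tags for `excellent` and `was` in the sample are assumed, since the question only shows the tags for `good` and `not`.

```python
# Tag list taken from the question's filter
FOCUS_TAGS = {"JJ", "JJS", "RB", "RBR", "RBS", "VB", "VBN", "VBP"}


def join_focus_words(tagged, tags=FOCUS_TAGS):
    """Join, in order, the tokens whose POS tag is in `tags`."""
    return " ".join(token for token, tag in tagged if tag in tags)


def extract_focus_words(sentence, tags=FOCUS_TAGS):
    """Tokenize and tag one sentence with NLTK, then join the matching words."""
    # Imported here so join_focus_words can be used on pre-tagged data alone
    from nltk import pos_tag, word_tokenize
    return join_focus_words(pos_tag(word_tokenize(sentence)), tags)


# 'good' and 'not' carry the tags shown in the question's output;
# the tags for 'excellent' and 'was' are assumptions for illustration.
sample = [("good", "JJ"), ("excellent", "NN"), ("was", "VBD"), ("not", "RB")]
print(join_focus_words(sample))  # prints: good not
```

For a DataFrame column, something like `df["text"].apply(extract_focus_words)` should apply this per sentence, where `df` and the `"text"` column name are placeholders for your own data.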
