Matching stock tickers in text against a ticker list without matching stop words

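I have a Python list containing about 28,000 stock ticker symbols.

I am parsing text that I scraped, matching it against the tickers, and incrementing a count on each match.

The problem is that ordinary stop words match tickers I don't want to count. For example, V is a legitimate ticker symbol, and it matches any single-letter token "V", because this is free-form social media text, e.g. "V want TSLA".

Can you suggest some logic I could use to match intelligently around these stop words?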

counts = dict()
symbol_list = ['TSLA', 'V', 'T', 'AAPL', ...]

example_sentence = 'V want TSLA but not. T + 5 times'

Here is what I have tried so far:

import string

# Strip punctuation, then split on whitespace
sen = example_sentence.translate(str.maketrans('', '', string.punctuation))

sentence_words = sen.split()
for word in sentence_words:
    if word in symbol_list:
        counts[word] = counts.get(word, 0) + 1

I want {'TSLA': 1}, not {'TSLA': 1, 'V': 1, 'T': 1}. In some cases I may need T and V added to the dictionary, but that should depend on context.

Solution

import string

counts = dict()
symbol_list = ['TSLA', 'V', 'T', 'AAPL', ...]
example_sentence = 'V want TSLA but not. T + 5 times'

# Strip punctuation, then split on whitespace
sen = example_sentence.translate(str.maketrans('', '', string.punctuation))
sentence_words = sen.split()

counts_list = []
for word in sentence_words:
    if word in symbol_list:
        # Update the running count and record a snapshot for this match
        counts[word] = counts.get(word, 0) + 1
        counts_list.append({word: counts[word]})

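Now you have a list of dictionaries as output: counts_list[1] is {'TSLA': 1}. Note that V and T are still matched; the full list is [{'V': 1}, {'TSLA': 1}, {'T': 1}], so you have to pick out the entries you want.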
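
If the goal is to skip V and T unless context suggests they are meant as tickers, one possible heuristic is sketched below. It rests on assumptions that are not in the question: that the text sometimes uses the "$V" cashtag convention, and that you can maintain a set of ambiguous symbols by hand. The names AMBIGUOUS, TOKEN_RE and count_tickers are hypothetical, and the symbol set is a tiny stand-in for the real 28,000-entry list. Treat it as a starting point, not a definitive answer.

```python
import re
from collections import Counter

# Stand-in for the real list of ~28,000 symbols; a set gives fast lookups.
SYMBOLS = {"TSLA", "V", "T", "AAPL"}

# Assumption: symbols that collide with ordinary words or single letters.
# This set is hypothetical and would need to be curated for real data.
AMBIGUOUS = {"V", "T", "A", "I", "IT", "ON", "ALL"}

# An optional "$" prefix followed by a standalone run of 1-5 uppercase letters.
TOKEN_RE = re.compile(r"(\$?)\b([A-Z]{1,5})\b")


def count_tickers(text, symbols=SYMBOLS, ambiguous=AMBIGUOUS):
    """Count ticker mentions, requiring a "$" prefix for ambiguous symbols."""
    counts = Counter()
    for cashtag, token in TOKEN_RE.findall(text):
        if token not in symbols:
            continue
        # Ambiguous symbols count only when written as a cashtag, e.g. "$V".
        if token in ambiguous and not cashtag:
            continue
        counts[token] += 1
    return dict(counts)


print(count_tickers("V want TSLA but not. T + 5 times"))
print(count_tickers("$V want TSLA, $T too"))
```

With this rule the first sentence counts only TSLA, while the second also counts V and T because they carry the "$" prefix. Other context signals (neighboring words, part-of-speech tags) could replace the cashtag check if your text does not use that convention.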