Creating a for loop that iterates over every item in a list, finds all matches from a separate list, and replaces them

I have two lists. One contains banned words, for example:

bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]

The other contains a news article, with each line of the article appended to the list, like this:

news_article = ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']

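I wrote a for loop that goes through each banned word and searches for it in the news article. Each character of a matching word is replaced with an asterisk, and the resulting line is appended to another list named text_bad_words_removed. See the code below: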

for line in news_article:
    for word in bad_words:
        if word in line:
            asterisks_to_replace_word_with = '*'*len(word)
            newline_with_asterisks = re.sub(word,asterisks_to_replace_word_with,str(line))
            text_bad_words_removed.append(newline_with_asterisks)

print(text_bad_words_removed)

The result should look like this:

text_bad_words_removed = ['***** ******* outlined a three-tier system,based on the severity of *********** ***** in each area.','The *** will report more shortly.','And so on','And so on']

However, it looks like this:

text_bad_words_removed = ['***** Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','Boris ******* outlined a three-tier system,based on the severity of coronavirus cases in each area.','Boris Johnson outlined a three-tier system,based on the severity of *********** cases in each area.','Boris Johnson outlined a three-tier system,based on the severity of coronavirus ***** in each area.','And so on']

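The problem is that when a line contains more than one bad word, the whole line is appended to the list again for each additional bad word, and each copy has only one word masked, as you can see above.

How can I fix this? Can I make the loop replace all the bad_words in a line first, and only then append that line, with every bad word replaced, to the new list?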

Solution

You can compile a regular expression from the bad words up front, then use it in a list comprehension:

import re


bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]
news_article =  ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']

to_replace = re.compile('|'.join(map(re.escape,bad_words)))
new_txt = [to_replace.sub(lambda g: '*' * len(g.group(0)),line) for line in news_article]

# pretty print to screen 
from pprint import pprint
pprint(new_txt)

Prints:

['***** ******* outlined a three-tier system,based on the severity of '
 '*********** ***** in each area.',
 'The *** will report more shortly.',
 'And so on.',
 'And so on.']
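One thing the pattern above does not address is partial matches: a plain alternation will also mask "cases" inside a longer word such as "showcases". If only whole words should be censored, a minimal sketch (assuming that is the desired behavior, which the question does not state) wraps the alternation in `\b` word boundaries. The names `whole_word` and `mask` are made up for this illustration.

```python
import re

bad_words = ["Boris", "Johnson", "coronavirus", "daily", "cases", "BBC"]

# \b anchors the alternation to word boundaries, so "cases" is masked
# but "showcases" is left alone. re.escape guards against any regex
# metacharacters that might appear in a banned word.
whole_word = re.compile(r"\b(?:" + "|".join(map(re.escape, bad_words)) + r")\b")


def mask(line):
    """Replace each whole-word match with asterisks of the same length."""
    return whole_word.sub(lambda m: "*" * len(m.group(0)), line)


print(mask("The BBC showcases daily cases."))
# The *** showcases ***** *****.
```

Without the `\b` anchors, the same input would also turn "showcases" into "show*****".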

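# Alternative without regular expressions: replace every bad word
# in each line, updating news_article in place.
for i, line in enumerate(news_article):
    for word in bad_words:
        # str.replace is a no-op when the word is absent, so no "in" check is needed
        line = line.replace(word,'*'*len(word))
    news_article[i] = line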
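The loop above overwrites news_article in place. If the original lines should be kept and the masked lines collected in text_bad_words_removed, as the question asks, the same idea can be sketched as follows: apply every replacement to a working copy of the line, and append only once per line, after the inner loop has finished. The variable name `masked` is an assumption introduced for this sketch.

```python
bad_words = ["Boris", "Johnson", "coronavirus", "daily", "cases", "BBC"]
news_article = [
    'Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.',
    'The BBC will report more shortly.',
    'And so on.',
    'And so on.',
]

text_bad_words_removed = []
for line in news_article:
    masked = line
    for word in bad_words:
        # Each replacement builds on the previous one, so all bad words
        # end up masked in the same string.
        masked = masked.replace(word, '*' * len(word))
    # Append once per line, outside the inner loop; appending inside it
    # is what produced the duplicated lines in the question.
    text_bad_words_removed.append(masked)

print(text_bad_words_removed)
```

Lines without any bad word are appended unchanged, so the output list always has the same length as news_article.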

You can do it all with a single regular expression and a single loop.
Something like this.

>>> import re
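>>> news_articles =  ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']
>>>
>>> bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]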
>>> rx = '(?i)(?:{0})'.format('|'.join(bad_words))
>>>
>>> for article in news_articles:
...     articleNew = re.sub(rx,lambda x: '*'*len(x.group()),article)
...     print( articleNew )
...
***** ******* outlined a three-tier system,based on the severity of *********** ***** in each area.
The *** will report more shortly.
And so on.
And so on.
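The `(?i)` prefix in the pattern above makes the match case-insensitive, so "bbc" and "BBC" are both masked. As a rough sketch of the same idea, the flag can also be passed as `re.IGNORECASE` when compiling, and `re.escape` can be applied to each word in case a banned word ever contains regex metacharacters. The helper name `censor`, the shortened word list, and the sample sentence are assumptions made for this illustration.

```python
import re

bad_words = ["Boris", "BBC"]  # shortened list, for illustration only

# re.IGNORECASE is the compile-time equivalent of the inline (?i) prefix.
rx = re.compile("|".join(map(re.escape, bad_words)), re.IGNORECASE)


def censor(text):
    """Mask every case-insensitive match with asterisks of equal length."""
    return rx.sub(lambda m: "*" * len(m.group()), text)


print(censor("boris spoke to the bbc and the BBC."))
# ***** spoke to the *** and the ***.
```

Whether case-insensitive matching is wanted depends on the word list; it is the behavior of this last answer, not of the first one.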