
Trying to create a for loop that iterates over each item in a list to find all matches from a separate list, then replace the matches


I have two lists. One contains banned words, for example:

bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]

The other contains a news article, with each line of the article appended to the list, like this:

news_article = ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']

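I created a for loop that iterates over each banned word and searches the news article for it. Every character of a matching word is then replaced with an asterisk, and the resulting line is appended to another list called text_bad_words_removed. See the code below: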

for line in news_article:
    for word in bad_words:
        if word in line:
            asterisks_to_replace_word_with = '*'*len(word)
            newline_with_asterisks = re.sub(word,asterisks_to_replace_word_with,str(line))
            text_bad_words_removed.append(newline_with_asterisks)

print(text_bad_words_removed)

The result should look like this:

text_bad_words_removed = ['***** ******* outlined a three-tier system,based on the severity of *********** ***** in each area.','The *** will report more shortly.','And so on','And so on']

However, it looks like this:

text_bad_words_removed = ['***** Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','Boris ******* outlined a three-tier system,based on the severity of coronavirus cases in each area.','Boris Johnson outlined a three-tier system,based on the severity of *********** cases in each area.','Boris Johnson outlined a three-tier system,based on the severity of coronavirus ***** in each area.','And so on']

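The problem is that when a line contains more than one bad word, the whole line is appended to the list again for each additional bad word, as you can see above.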

How can I fix this? Can I make the loop replace all of the bad_words in a line, and only then append that line, with every bad word replaced, to the new list?

Solution

You can compile a regular expression from the bad words up front, then use it in a list comprehension:

import re


bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]
news_article =  ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']

to_replace = re.compile('|'.join(map(re.escape,bad_words)))
new_txt = [to_replace.sub(lambda g: '*' * len(g.group(0)),line) for line in news_article]

# pretty print to screen 
from pprint import pprint
pprint(new_txt)

Prints:

['***** ******* outlined a three-tier system,based on the severity of '
 '*********** ***** in each area.',
 'The *** will report more shortly.',
 'And so on.',
 'And so on.']
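
One caveat with the compiled pattern above is that it also matches bad words inside longer words, for example "cases" inside "showcases". If only whole words should be masked, a minimal sketch is to wrap the alternation in \b word-boundary anchors. The names whole_word, mask and sample below are illustrative and not part of the original answer:

```python
import re

bad_words = ["Boris", "Johnson", "coronavirus", "daily", "cases", "BBC"]

# \b anchors the alternation at word boundaries, so "cases" inside
# "showcases" is left alone. re.escape guards against regex metacharacters.
whole_word = re.compile(r"\b(?:" + "|".join(map(re.escape, bad_words)) + r")\b")


def mask(match):
    # Replace the matched word with the same number of asterisks.
    return "*" * len(match.group(0))


# Hypothetical sample line, chosen to contain a bad word as a substring.
sample = "The BBC showcases daily cases."
masked = whole_word.sub(mask, sample)
print(masked)
```

Whether substring matches are wanted depends on the use case; the original pattern is the simpler choice if they are.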

Another approach is to modify news_article in place, replacing each bad word with str.replace:

for i, line in enumerate(news_article):
    for word in bad_words:
        # str.replace is a no-op when the word is absent, so no membership test is needed
        line = line.replace(word, '*' * len(word))
    news_article[i] = line
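
The loop above overwrites news_article. If the original lines should be kept and the censored lines collected in text_bad_words_removed, as the question asks, a minimal sketch is to apply every replacement to a working copy of the line and append it once, after the inner loop finishes. The variable name censored is illustrative:

```python
bad_words = ["Boris", "Johnson", "coronavirus", "daily", "cases", "BBC"]
news_article = [
    'Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.',
    'The BBC will report more shortly.',
    'And so on.',
    'And so on.',
]

text_bad_words_removed = []
for line in news_article:
    censored = line  # working copy; the original list is left untouched
    for word in bad_words:
        censored = censored.replace(word, '*' * len(word))
    # Append exactly once per line, after all bad words have been handled.
    text_bad_words_removed.append(censored)

print(text_bad_words_removed)
```

Moving the append out of the inner loop is what removes the duplicated lines seen in the question, and lines without any bad word are still carried over unchanged.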


You can do all of this with a single regular expression and a single loop.
Something like this:

>>> import re
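>>> news_articles =  ['Boris Johnson outlined a three-tier system,based on the severity of coronavirus cases in each area.','The BBC will report more shortly.','And so on.','And so on.']
>>>
>>> bad_words = ["Boris","Johnson","coronavirus","daily","cases","BBC"]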
>>> rx = '(?i)(?:{0})'.format('|'.join(bad_words))
>>>
>>> for article in news_articles:
...     articleNew = re.sub(rx,lambda x: '*'*len(x.group()),article)
...     print( articleNew )
...
***** ******* outlined a three-tier system,based on the severity of *********** ***** in each area.
The *** will report more shortly.
And so on.
And so on.
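
The pattern in this session joins the words without escaping them, which is fine for plain words but would break or misbehave for a word containing regex metacharacters such as "C++". A minimal sketch that combines the case-insensitive matching above with re.escape could look like this. The function name censor and its ignore_case parameter are hypothetical, introduced only for illustration:

```python
import re


def censor(lines, words, ignore_case=True):
    """Return a new list with every occurrence of the given words masked by asterisks."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes each word match literally, even if it contains
    # characters such as "+" or "." that are special in a regex.
    pattern = re.compile("|".join(re.escape(w) for w in words), flags)
    return [pattern.sub(lambda m: "*" * len(m.group()), line) for line in lines]


# Hypothetical input with a metacharacter word and a lowercase variant.
result = censor(["I like C++ and the bbc."], ["C++", "BBC"])
print(result)
```

Passing ignore_case=False gives the exact-case behaviour of the first answer instead.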
