
High-performance matching of a substring with multiple words in a string - Python


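I am working on a project and have not found any useful resources on how to match a substring made up of several words against a string.

For example:

substring = "I can be found in this string"
string = "Now,I can be found in this string example"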

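I cannot use the .find() method or regular expressions, and to make things more complicated, the edge cases include:

"reflexion mirror" does not match "'reflexion mirror'" but does match "(reflexion mirror)"

"maley" does not match "o'maley"

"luminate" matches "'''luminate"

"luminate" matches "luminate__"

"george" does not match "georges"

Whenever such characters are attached to the string, for example "__hello world__" or "''hello world''", they should not interfere with matching "hello" and "world".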
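One possible reading that reconciles the edge cases above, offered as a sketch rather than a confirmed rule: a lone apostrophe is part of a word, a run of two or more apostrophes acts as a delimiter, and every other non-alphanumeric character (including the underscore) is a delimiter. The helper names `is_word_char` and `is_standalone` are hypothetical.

```python
def is_word_char(text, i):
    """Return True if text[i] counts as part of a word under the assumed rules."""
    if i < 0 or i >= len(text):
        return False  # the edges of the text behave like delimiters
    c = text[i]
    if c.isalnum():
        return True
    if c == "'":
        # Assumption: a lone apostrophe glues to the word ("o'maley"),
        # while a run of two or more ("'''luminate") is a delimiter.
        prev_is_apostrophe = i > 0 and text[i - 1] == "'"
        next_is_apostrophe = i + 1 < len(text) and text[i + 1] == "'"
        return not (prev_is_apostrophe or next_is_apostrophe)
    return False  # underscores, brackets, whitespace, punctuation


def is_standalone(text, start, end):
    """True if text[start:end] has no word character directly before or after it."""
    return not is_word_char(text, start - 1) and not is_word_char(text, end)
```

Under this assumption, `is_standalone("(reflexion mirror)", 1, 17)` holds while `is_standalone("'reflexion mirror'", 1, 17)` does not, which is consistent with the listed cases. The check costs constant time per candidate match, so it does not change the complexity of whatever search produces the candidates.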

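I am using Boyer-Moore to find the substrings, which works except for these seemingly conflicting edge cases. Oh, and I forgot to mention that the solution should prioritize performance in terms of time complexity.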
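For reference, here is a minimal sketch of a Boyer-Moore-style search that avoids both .find() and regular expressions. It uses the Horspool simplification (bad-character table only), which is not necessarily the variant used in the project; the function name `horspool_find_all` is hypothetical.

```python
def horspool_find_all(needle, haystack):
    """Yield every start offset of needle in haystack (Boyer-Moore-Horspool)."""
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return
    # Bad-character table: how far the window may slide when the haystack
    # character under the needle's last position is c. The last needle
    # character is excluded; characters absent from the table shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            yield pos
        pos += shift.get(haystack[pos + m - 1], m)


print(list(horspool_find_all("ana", "bananas")))  # [1, 3]
```

The search itself only reports raw substring positions. Each reported offset can then be accepted or rejected by inspecting the characters immediately before and after it, which keeps the boundary rules separate from the search. Horspool is typically sublinear on natural-language text but degrades to O(n·m) in the worst case.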

I preprocess my string and substring with word.translate({ord(c): None for c in string.whitespace}).lower(), which produces results like the following:

"asuggestionBoxentryfrombobcarterdearanonymous,i'mnotquitesureiunderstandtheconceptofthis'anonymous'suggestionBox.ifnoonereadswhatwewrite,thenhowwillanythingeverchangebutinthespiritofgoodwill,i'vedecidedtooffermytwocents,andhopefullykevinwon'tstealit(ha,ha).iwouldreallyliketoseemorevarietiesofcoffeeinthecoffeemachineinthebreakroom.'milkandsugar','blackwithsugar','extrasugar'and'creamandsugar'don'toffermuchdiversity.also,theselectionofdrinksseemsheavilyweightedinfavorof'sugar'.whatifwedon'twantanysugar?"
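As a small illustration of what this preprocessing does, the sketch below wraps the same expression in a hypothetical helper named `strip_whitespace`. It suggests one reason the edge cases become hard after preprocessing: once whitespace is removed, the positions where words began and ended are no longer recoverable from the string.

```python
import string

# Translation table that deletes every whitespace character.
WHITESPACE_TABLE = {ord(c): None for c in string.whitespace}


def strip_whitespace(s):
    """Remove all whitespace and lowercase, as in the preprocessing step above."""
    return s.translate(WHITESPACE_TABLE).lower()


print(strip_whitespace("I can be found"))  # icanbefound

# Two different inputs collapse to the same preprocessed string,
# so word boundaries cannot be checked afterwards.
print(strip_whitespace("in part") == strip_whitespace("inpart"))  # True
```

If boundaries matter, one option is to lowercase only and keep the whitespace, so that the search runs on text whose delimiters are still intact.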

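Any ideas on how to account for these edge cases?

Thanks

Edit

One caveat: ' is to be treated as a character.

Here is the unit test from which I collected the edge cases: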

import unittest

class TestCountoccurrencesInText(unittest.TestCase):
    def test_count_occurrences_in_text(self):
        """
        Test the count_occurrences_in_text function
        """
        text = """Georges is my name and I like python. Oh ! your name is georges? And you like Python!
Yes is is true,I like PYTHON
and my name is GEORGES"""
        # test with a little text.
        self.assertEqual(3, count_occurrences_in_text("Georges", text))
        self.assertEqual(3, count_occurrences_in_text("GEORGES", text))
        self.assertEqual(0, count_occurrences_in_text("george", text))
        self.assertEqual(2, count_occurrences_in_text("I", text))
        # The expected values in the next five assertions were lost when the
        # page was extracted; they are inferred from the matching rules above.
        self.assertEqual(3, count_occurrences_in_text("georges", text))
        self.assertEqual(3, count_occurrences_in_text("python", text))
        self.assertEqual(3, count_occurrences_in_text("PYTHON", text))
        self.assertEqual(0, count_occurrences_in_text("n", text))
        self.assertEqual(0, count_occurrences_in_text("reflexion mirror", "I am a senior citizen and I live in the Fun-Plex 'Reflexion Mirror' in Sopchoppy,Florida"))
        self.assertEqual(1, count_occurrences_in_text("Linguist", "'''Linguist Specialist Found Dead on Laboratory Floor'''"))
