
c# – Efficient algorithm to find all keywords in a text

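I have many strings that contain text in many different spellings. I tag these strings by searching them for keywords, and when a keyword is found, I use the text associated with that keyword.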

For example, a search string can contain the text "schw", "schwa." or "schwarz". I have three keywords that all resolve to the text "schwarz".

Now I'm looking for an efficient way to find all the keywords without calling string.Contains(keyword) once for every single keyword.

Sample data:

H-Fuss ahorn 15 cm/SH48cm
Metall-Fuss chrom 9 cm/SH42cm
Metall-Kufe alufbg.12 cm/SH45c
Metall-Kufe verchr.12 cm/SH45c
Metall-Zylind.aluf.12cm/SH45cm
Kufe alufarbig
Metall-Zylinder hoch alufarbig
Kunststoffgl.schw. - hoch
Kunststoffgl.schw. - Standard
Kunststoffgleiter - schwarz für Sitzhoehe 42 cm

Sample keywords (key, value):

h-fuss,Holz
ahorn,Ahorn
Metall,Metall
chrom,Chrom
verchr,Chrom
alum,Aluminium
aluf,Aluminium
kufe,Kufe
zylind,Zylinder
hoch,Hoch
kunststoffgl,Gleiter
gleiter,Gleiter
schwarz,Schwarz
schw.,Schwarz

Sample results:

Holz,Ahorn
Metall,Chrom
Metall,Kufe,Aluminium
Metall,Zylinder,Aluminium
Kufe,Hoch,Aluminium
Gleiter,Schwarz,Hoch
Gleiter,Schwarz
Gleiter,Schwarz
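For reference, here is a minimal C# sketch of the per-keyword baseline the question wants to avoid. It assumes the tagging rules the samples suggest: keys are lower-case and matched case-insensitively, and each tag value is emitted once, in order of its first match in the text. `NaiveTagger` and the tuple-based keyword list are illustrative, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Baseline the question wants to replace: one IndexOf scan of the text per keyword.
// NaiveTagger is an illustrative name, not an existing API.
public static class NaiveTagger
{
    // keywords: (lower-case key, tag value) pairs, e.g. ("schw.", "Schwarz").
    public static List<string> Tag(string text, IEnumerable<(string Key, string Value)> keywords)
    {
        string lower = text.ToLowerInvariant(); // case-insensitive match against lower-case keys
        var hits = new List<(int Start, string Value)>();
        foreach (var (key, value) in keywords)
        {
            int pos = lower.IndexOf(key, StringComparison.Ordinal); // first occurrence only
            if (pos >= 0) hits.Add((pos, value));
        }
        // Emit each tag value once, ordered by where its earliest keyword appears.
        var result = new List<string>();
        var seen = new HashSet<string>();
        foreach (var hit in hits.OrderBy(h => h.Start))
            if (seen.Add(hit.Value)) result.Add(hit.Value);
        return result;
    }
}
```

With the keyword list above, tagging "Kunststoffgl.schw. - hoch" should give Gleiter, Schwarz, Hoch, matching the corresponding sample result. Each keyword costs a separate scan of the text, so the work grows with the number of keywords times the text length.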

Solution

This looks like a fit for "Algorithms using a finite set of patterns":

The Aho–Corasick string matching algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all patterns "at once", so the complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches. Note that because all matches are found, there can be a quadratic number of matches if every substring matches (e.g. dictionary = a, aa, aaa, aaaa and input string is aaaa).
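A minimal C# sketch of Aho–Corasick applied to the question's keyword tagging. This is an illustrative implementation, not a library API: `AhoCorasick` and `KeywordTagger` are hypothetical names, patterns are assumed non-empty and lower-case, and the input text is lower-cased with `ToLowerInvariant` before a single left-to-right pass. Tag values are emitted once each, ordered by where they first appear.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal Aho–Corasick sketch (illustrative; AhoCorasick/KeywordTagger are hypothetical
// names, not a library API). Patterns are assumed to be non-empty.
public sealed class AhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;
        // Indices of patterns ending here, including those inherited via the fail link.
        public readonly List<int> Outputs = new List<int>();
    }

    private readonly Node _root = new Node();
    private readonly string[] _patterns;

    public AhoCorasick(IEnumerable<string> patterns)
    {
        _patterns = patterns.ToArray();
        // 1. Insert every pattern into a trie.
        for (int i = 0; i < _patterns.Length; i++)
        {
            Node node = _root;
            foreach (char c in _patterns[i])
            {
                if (!node.Next.TryGetValue(c, out Node child))
                    node.Next[c] = child = new Node();
                node = child;
            }
            node.Outputs.Add(i);
        }
        // 2. Breadth-first pass: fail link = longest proper suffix that is also a trie path.
        _root.Fail = _root;
        var queue = new Queue<Node>();
        foreach (Node child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            foreach (var edge in node.Next)
            {
                Node f = node.Fail;
                while (f != _root && !f.Next.ContainsKey(edge.Key)) f = f.Fail;
                edge.Value.Fail = f.Next.TryGetValue(edge.Key, out Node target) ? target : _root;
                edge.Value.Outputs.AddRange(edge.Value.Fail.Outputs);
                queue.Enqueue(edge.Value);
            }
        }
    }

    // Single left-to-right pass; yields (patternIndex, startIndex) for every occurrence.
    public IEnumerable<(int Pattern, int Start)> FindAll(string text)
    {
        Node node = _root;
        for (int i = 0; i < text.Length; i++)
        {
            while (node != _root && !node.Next.ContainsKey(text[i])) node = node.Fail;
            if (node.Next.TryGetValue(text[i], out Node next)) node = next;
            foreach (int p in node.Outputs)
                yield return (p, i - _patterns[p].Length + 1);
        }
    }
}

// Hypothetical tagging helper: builds the automaton once, then maps matches to tag
// values in order of first appearance, each value once. Keys are assumed lower-case.
public sealed class KeywordTagger
{
    private readonly AhoCorasick _automaton;
    private readonly string[] _values;

    public KeywordTagger(IEnumerable<(string Key, string Value)> keywords)
    {
        var list = keywords.ToList();
        _automaton = new AhoCorasick(list.Select(k => k.Key));
        _values = list.Select(k => k.Value).ToArray();
    }

    public List<string> Tag(string text)
    {
        var result = new List<string>();
        var seen = new HashSet<string>();
        // OrderBy is stable, so matches starting at the same index keep their report order.
        foreach (var match in _automaton.FindAll(text.ToLowerInvariant()).OrderBy(x => x.Start))
            if (seen.Add(_values[match.Pattern])) result.Add(_values[match.Pattern]);
        return result;
    }
}
```

With the question's keyword list, tagging "Metall-Kufe alufbg.12 cm/SH45c" should yield Metall, Kufe, Aluminium, and searching "aaaa" for a, aa, aaa, aaaa reports all 10 overlapping matches mentioned in the quote. The automaton is built once and reused for every string, which is where the saving over one `Contains` call per keyword comes from.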

The Rabin–Karp string search algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find any one of a set of pattern strings in a text. For text of length n and p patterns of combined length m, its average and best case running time is O(n+m) in space O(p), but its worst-case time is O(nm). In contrast, the Aho–Corasick string matching algorithm has asymptotic worst-case time complexity O(n+m) in space O(m).
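A C# sketch of the multi-pattern Rabin–Karp idea. Because a rolling hash covers a fixed window length, this version groups the patterns by length and runs one rolling hash per distinct length. That grouping is a design choice of this sketch, not something the quoted description prescribes, so the average cost here is roughly the text length times the number of distinct keyword lengths. The base of 131 and hashing modulo 2^64 via unsigned overflow are arbitrary assumptions, and `RabinKarpMulti` is a hypothetical name. Every hash hit is verified with `string.CompareOrdinal`, so collisions cannot produce false matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Multi-pattern Rabin–Karp sketch; RabinKarpMulti is a hypothetical name. Patterns are
// grouped by length (an assumption of this sketch) so each group shares one rolling hash.
public static class RabinKarpMulti
{
    private const ulong Base = 131; // arbitrary base; arithmetic wraps modulo 2^64

    private static ulong Hash(string s, int start, int length)
    {
        ulong h = 0;
        unchecked
        {
            for (int i = start; i < start + length; i++) h = h * Base + s[i];
        }
        return h;
    }

    // Returns every (patternIndex, startIndex) occurrence, sorted by start index.
    // The caller is assumed to lower-case the text if keys are lower-case.
    public static List<(int Pattern, int Start)> FindAll(string text, IReadOnlyList<string> patterns)
    {
        var matches = new List<(int Pattern, int Start)>();
        foreach (var group in Enumerable.Range(0, patterns.Count).GroupBy(i => patterns[i].Length))
        {
            int len = group.Key;
            if (len == 0 || len > text.Length) continue;
            // hash value -> indices of the patterns of this length with that hash
            var table = new Dictionary<ulong, List<int>>();
            foreach (int p in group)
            {
                ulong h = Hash(patterns[p], 0, len);
                if (!table.TryGetValue(h, out var bucket)) table[h] = bucket = new List<int>();
                bucket.Add(p);
            }
            ulong pow = 1; // Base^(len-1), used to drop the leading character of the window
            unchecked { for (int i = 1; i < len; i++) pow *= Base; }
            ulong window = Hash(text, 0, len);
            for (int start = 0; ; start++)
            {
                if (table.TryGetValue(window, out var candidates))
                    foreach (int p in candidates)
                        // Verify the characters to rule out hash collisions.
                        if (string.CompareOrdinal(text, start, patterns[p], 0, len) == 0)
                            matches.Add((p, start));
                if (start + len >= text.Length) break; // no further window of this length
                unchecked { window = (window - text[start] * pow) * Base + text[start + len]; }
            }
        }
        matches.Sort((a, b) => a.Start != b.Start ? a.Start.CompareTo(b.Start) : a.Pattern.CompareTo(b.Pattern));
        return matches;
    }
}
```

The returned indices refer to the `patterns` list, so they can be mapped to tag values through the question's key/value pairs. The sample keywords have five distinct lengths (4, 5, 6, 7 and 12), so each string would be scanned five times; Aho–Corasick avoids these per-length passes.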

Original article: https://www.jb51.cc/csharp/92900.html

