Regular expressions: how is regular expression matching implemented in Node / V8?

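I came across an article which suggests that regular expression matching is usually implemented with an algorithm that can perform poorly, rather than with the recommended Thompson NFA algorithm.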

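With that in mind, how is matching implemented in Node or V8? Would it be possible to improve performance with a JS implementation of the Thompson NFA, perhaps one that supports only a limited subset of features (for example, dropping lookahead or other "advanced" features)?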

Answer

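As the Chrome development team noted in its announcement, the V8 engine uses the Irregexp regular expression engine.

Here are some quotes about how this engine is implemented: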

A fundamental decision we made early in the design of Irregexp was
that we would be willing to spend extra time compiling a regular
expression if that would make running it faster. During compilation
Irregexp first converts a regexp into an intermediate automaton
representation. This is in many ways the “natural” and most accessible
representation and makes it much easier to analyze and optimize the
regexp. For instance, when compiling /Sun|Mon/ the automaton
representation lets us recognize that both alternatives have an ‘n’ as
their third character. We can quickly scan the input until we find an
‘n’ and then start to match the regexp two characters earlier.
Irregexp looks up to four characters ahead and matches up to four
characters at a time.

After optimization we generate native machine code which uses
backtracking to try different alternatives. Backtracking can be
time-consuming so we use optimizations to avoid as much of it as we
can. There are techniques to avoid backtracking altogether but the
nature of regexps in JavaScript makes it difficult to apply them in
our case, though it is something we may implement in the future.
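
The /Sun|Mon/ example in the quoted passage can be illustrated with a small sketch. This is not V8 code and does not reflect how Irregexp is actually written; `findSunOrMon` is a hypothetical helper that hard-codes the one observation from the quote: both alternatives have 'n' as their third character, so the scan can look for 'n' first and only then check the two characters before it.

```javascript
// Hypothetical illustration only; Irregexp derives this kind of check
// from its automaton representation rather than from hand-written code.
function findSunOrMon(input) {
  // Both alternatives have 'n' at offset 2, so search for 'n' first.
  let i = input.indexOf("n", 2);
  while (i !== -1) {
    const start = i - 2; // the match would begin two characters earlier
    const candidate = input.slice(start, i + 1);
    if (candidate === "Sun" || candidate === "Mon") {
      return start; // index of the leftmost match
    }
    i = input.indexOf("n", i + 1);
  }
  return -1; // no match
}

console.log(findSunOrMon("See you on Monday")); // 11
```

Under these assumptions the result agrees with `"See you on Monday".search(/Sun|Mon/)`, while most input positions are skipped without attempting a full match.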

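So V8 does compile the regexp, first into an intermediate automaton representation and then into native machine code, but the generated code relies on backtracking rather than on the Thompson NFA approach.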
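
For the second part of the question, a Thompson-style matcher for a restricted feature set can be written in plain JS. The following is a minimal sketch, assuming only literals, concatenation, alternation and the Kleene star, with whole-string matching and no pattern parser. The names `lit`, `seq`, `alt`, `star` and `matches` are made up for this example and are not part of any library. It tracks the set of live NFA states in lock step, so the work per input character is bounded by the number of states and no backtracking is needed.

```javascript
// Each NFA state has epsilon edges and at most one labelled edge.
function state() {
  return { eps: [], ch: null, next: null };
}

// Fragment builders (hypothetical names). Each returns { start, end }.
// Fragments are mutated when combined, so build a fresh one per use.
function lit(ch) {
  const start = state(), end = state();
  start.ch = ch;
  start.next = end;
  return { start, end };
}
function seq(a, b) {
  a.end.eps.push(b.start);
  return { start: a.start, end: b.end };
}
function alt(a, b) {
  const start = state(), end = state();
  start.eps.push(a.start, b.start);
  a.end.eps.push(end);
  b.end.eps.push(end);
  return { start, end };
}
function star(a) {
  const start = state(), end = state();
  start.eps.push(a.start, end);
  a.end.eps.push(a.start, end);
  return { start, end };
}

// Add a state and everything reachable through epsilon edges.
// The membership check also stops epsilon cycles from looping forever.
function addState(set, s) {
  if (set.has(s)) return;
  set.add(s);
  for (const t of s.eps) addState(set, t);
}

// Whole-string match: advance every live state once per character.
function matches(nfa, input) {
  let current = new Set();
  addState(current, nfa.start);
  for (const ch of input) {
    const following = new Set();
    for (const s of current) {
      if (s.ch === ch) addState(following, s.next);
    }
    current = following;
  }
  return current.has(nfa.end);
}

// Equivalent in spirit to /^(a*)*b$/, a classic backtracking trap.
const nfa = seq(star(star(lit("a"))), lit("b"));
console.log(matches(nfa, "a".repeat(40) + "b")); // true
console.log(matches(nfa, "a".repeat(40)));       // false
```

Whether this would beat Irregexp in practice is a separate question: the sketch runs as ordinary JS, while Irregexp emits optimized native code, so any advantage would most likely show up only on patterns that trigger heavy backtracking.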

As for performance, this article compares V8's regular expression performance with that of other libraries and languages.
