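I'm trying to write a regular expression that captures in-text citations.
Here are a few example sentences containing in-text citations: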
1. … and the reported results in (Nivre et al., 2007) were not representative …
2. … two systems used a Markov chain approach (Sagae and Tsujii 2007).
3. Nivre (2007) showed that …
4. … for attaching and labeling dependencies (Chen et al., 2007; Dredze et al., 2007).
Currently, my regular expression is
\(\D*\d\d\d\d\)
which matches examples 1-3 but not example 4. How can I modify it so that it also captures example 4?
Thanks!
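The question's pattern fails on example 4 because `\d\d\d\d` must be followed immediately by `\)`, while in example 4 the first year is followed by `;`. The sketch below shows one possible minimal extension of that pattern, written in Perl to match the answer. It is an illustration under the assumption that every citation ends in a plain four-digit year; the names `$extended_rx` and `first_match`, and the switch from `\D` to `[^()\d]`, are introduced here for illustration and are not part of the question or the answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The question's pattern: the four digits must be followed directly by ')',
# so example 4, where ';' follows the first year, never matches.
my $question_rx = qr{\(\D*\d\d\d\d\)};

# Hypothetical minimal extension (not the answer's regex): one
# "non-digit text + four-digit year" chunk, then zero or more further chunks
# introduced by ';'. [^()\d] replaces \D so a match cannot run past a
# parenthesis into unrelated text.
my $extended_rx = qr{ \( [^()\d]* \d{4} (?: ; [^()\d]* \d{4} )* \) }x;

my @examples = (
    '... and the reported results in (Nivre et al., 2007) were not representative ...',
    '... two systems used a Markov chain approach (Sagae and Tsujii 2007).',
    'Nivre (2007) showed that ...',
    '... for attaching and labeling dependencies (Chen et al., 2007; Dredze et al., 2007).',
);

# Return the first match of $rx in $line, or undef if there is none.
sub first_match {
    my ($rx, $line) = @_;
    return $line =~ /($rx)/ ? $1 : undef;
}

for my $line (@examples) {
    printf "question: %s\nextended: %s\n\n",
        first_match($question_rx, $line) // 'no match',
        first_match($extended_rx, $line) // 'no match';
}
```

On these four sentences the question's pattern reports no match for example 4, while the extended one returns the whole parenthetical. Because `[^()\d]` forbids digits in the author text it is stricter than `\D` there, and unlike the answer below it neither restricts the year range nor allows a letter suffix such as `2007a`.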
I recently used something like this for exactly that purpose:
```perl
#!/usr/bin/env perl

use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std IO :utf8 >;

my $citation_rx = qr{
    \(
        (?:
            \s*

            # optional author list
            (?:
                # has to start capitalized
                \p{Uppercase_Letter}

                # then have a lower case letter, or maybe an apostrophe
                (?= [\p{Lowercase_Letter}\p{Quotation_Mark}] )

                # before a run of letters and admissible punctuation
                [\p{Alphabetic}\p{Dash_Punctuation}\p{Quotation_Mark}\s,.] +

            ) ?  # hook if and only if you want the authors to be optional!!

            # a reasonable year
            \b (18|19|20) \d\d

            # citation series suffix, up to a six-parter
            [a-f] ?  \b

            # trailing semicolon to separate multiple citations
            ; ?
            \s*
        ) +
    \)
}x;

while (<DATA>) {
    while (/$citation_rx/gp) {
        say ${^MATCH};
    }
}

__END__
... and the reported results in (Nivré et al., 2007) were not representative ...
... two systems used a Markov chain approach (Sagae and Tsujii 2007).
Nivre (2007) showed that ...
... for attaching and labelling dependencies (Chen et al., 2007; Dredze et al., 2007).
```
When run, it produces:
```
(Nivré et al., 2007)
(Sagae and Tsujii 2007)
(2007)
(Chen et al., 2007; Dredze et al., 2007)
```
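With this regex, a parenthetical that holds several citations, as in example 4, comes back as a single match. If each citation is needed on its own, one option is to post-process the match by splitting it on `;`. The sketch below assumes the input already has the shape the regex above produces; `split_citation` is a hypothetical helper, and its year rule simply mirrors the `(18|19|20) \d\d` plus optional `[a-f]` part of the answer's pattern.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split one matched parenthetical citation such as
# "(Chen et al., 2007; Dredze et al., 2007)" into [authors, year] pairs.
sub split_citation {
    my ($citation) = @_;

    # Drop the surrounding parentheses (and any padding inside them).
    my ($inner) = $citation =~ /\A\(\s*(.*?)\s*\)\z/s
        or return;

    my @pairs;
    for my $part (split /\s*;\s*/, $inner) {
        # Authors may be absent, as in "(2007)". The year rule mirrors the
        # answer's regex: 18xx, 19xx or 20xx, optionally followed by a-f.
        next unless $part =~ /\A(.*?)[\s,]*\b((?:18|19|20)\d\d[a-f]?)\z/;
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

for my $citation ('(Chen et al., 2007; Dredze et al., 2007)', '(2007)') {
    for my $pair (split_citation($citation)) {
        my ($authors, $year) = @$pair;
        printf "%s => %s\n", ($authors eq '' ? '(no authors)' : $authors), $year;
    }
}
```

Keeping the split as a separate step leaves the main regex unchanged; the lazy `(.*?)` stops at the shortest author prefix, so trailing commas and spaces before the year are not included in the author string.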