Regex to match every string inside double quotes, including escaped quotes

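There are already many similar questions, but none of them applies to my case. I have a string that contains multiple substrings inside double quotes, and these substrings can contain escaped double quotes.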

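For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements: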

"this is some sample text with quotes and \"escaped quotes\" inside"
"here is \"another\" one"

The regex /"(?:\\"|[^"])*"/g works as expected on regex101; however, when I use String#match(), the result is different. See the snippet below:

let str = 'And then,"this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more,but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g

console.log(str.match(regex))

I get four matches instead of two, and they do not even include the text inside the escaped quotes.

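MDN mentions that if the g flag is used, all results matching the complete regular expression are returned, but capturing groups are not. To get the capturing groups when the global flag is set, I need to use RegExp.exec(). I tried that, and the result is the same: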

let str = 'And then,but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []

while (temp = regex.exec(str))
  matches.push(temp[0])

console.log(matches)
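
For reference, here is a minimal sketch of the MDN behaviour described above. It assumes an environment that supports String.prototype.matchAll (ES2020). The sample text and the added capturing group are illustrative and not part of the original snippet. String.raw is used so that the backslashes really are in the text.

```javascript
// Illustrative input: String.raw keeps the backslashes exactly as typed.
const text = String.raw`say "a \"b\" c" and "d"`;

// Same alternation as in the question, plus one capturing group (an addition for this sketch).
const pattern = /"((?:\\"|[^"])*)"/g;

// With the g flag, match() returns only the full matches, without groups.
const full = text.match(pattern);
console.log(full.length); // 2

// matchAll() yields one result per match, and each result carries its groups.
const inner = [...text.matchAll(pattern)].map((m) => m[1]);
console.log(inner.length); // 2
console.log(inner[1]);     // d
```

A loop over regex.exec(), as in the snippet above, gives access to the same groups through temp[1].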

How can I get an array containing these two matched elements?

Solution

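The regex does not work as expected because a single backslash is an escape character inside a JavaScript string literal: '\"' produces just '"', so the backslashes never reach the string. You need to escape the backslashes in the text: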

let str = 'And then,"this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more,but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g

console.log(str);
console.log(str.match(regex))

str = 'And then,"this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more,but... "here is \\"another\\" one". Just in case.';

console.log(str);
console.log(str.match(regex))
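
The difference between the two literals can be checked directly. This is a small sketch with made-up sample strings (plain, doubled, and raw are illustrative names) that compares what each kind of literal actually stores.

```javascript
// How JavaScript reads the backslashes in each kind of string literal.
const plain = 'say \"hi\"';         // \" is only an escaped quote: no backslash is stored
const doubled = 'say \\"hi\\"';     // \\ stores one real backslash before each quote
const raw = String.raw`say \"hi\"`; // String.raw keeps the backslashes as typed

console.log(plain.length);         // 8
console.log(doubled.length);       // 10
console.log(doubled === raw);      // true
console.log(plain.includes('\\')); // false
```

Since plain contains no backslash at all, the \\" branch of the regex can never match in it, which explains the four short matches in the question.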

Another option is a more efficient regex that avoids the | operator:

const str = String.raw`And then,but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))

With String.raw, there is no need to escape the quotes twice.

See the regex proof. By the way, 28 steps versus 267 steps.

Explanation

--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"','\\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group,but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r, \t, \f,
                             and " "), non-whitespace (all but \n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"','\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'

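One consequence of the breakdown above is that a backslash always consumes the character after it, including another backslash. The sketch below compares the two patterns from this page on a hypothetical input in which a quoted string ends with an escaped backslash. The input is an assumption chosen to show the difference and does not come from the original question.

```javascript
// Both patterns from this page.
const alternation = /"(?:\\"|[^"])*"/g;
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// Hypothetical input: the first quoted string ends with an escaped backslash.
const text = String.raw`"C:\\" and "x"`;

// The alternation reads the second backslash plus the closing quote as \",
// so it runs on until the next quote and returns a single, overlong match.
const loose = text.match(alternation);
console.log(loose.length); // 1

// The unrolled pattern pairs each backslash with the next character,
// so the closing quote is recognised and both strings are found.
const strict = text.match(unrolled);
console.log(strict.length); // 2
```

If the text can contain escaped backslashes, the unrolled pattern is the safer of the two. For input that contains only escaped quotes, as in the question, both patterns return the same matches.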