Parsing a single-quoted string with backslash-escaped single quotes with nom
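This is a variation of "Parsing single-quoted string with escaped quotes with Nom 5" and "Parse string with escaped single quotes". I want to parse a string like '1 \' 2 \ 3 \\ 4' (a raw sequence of characters) as "1 \\' 2 \\ 3 \\\\ 4" (a Rust string), so I don't care about any escaping other than the possibility of \' appearing inside the string.
Attempts using code from the linked questions: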
use nom::{
    branch::alt,
    bytes::complete::{escaped, tag},
    character::complete::none_of,
    combinator::recognize,
    multi::{many0, separated_list0},
    sequence::delimited,
    IResult,
};

// Attempt 1: `escaped` with `'` as the only escapable character.
fn parse_quoted_1(input: &str) -> IResult<&str, &str> {
    delimited(
        tag("'"),
        alt((escaped(none_of("\\\'"), '\\', tag("'")), tag(""))),
        tag("'"),
    )(input)
}

// Attempt 2: runs of non-quote characters separated by `\'`.
fn parse_quoted_2(input: &str) -> IResult<&str, &str> {
    delimited(
        tag("'"),
        recognize(separated_list0(tag("\\'"), many0(none_of("'")))),
        tag("'"),
    )(input)
}

fn main() {
    println!("{:?}", parse_quoted_1(r#"'1'"#));
    println!("{:?}", parse_quoted_2(r#"'1'"#));
    println!("{:?}", parse_quoted_1(r#"'1 \' 2'"#));
    println!("{:?}", parse_quoted_2(r#"'1 \' 2'"#));
    println!("{:?}", parse_quoted_1(r#"'1 \' 2 \ 3'"#));
    println!("{:?}", parse_quoted_2(r#"'1 \' 2 \ 3'"#));
    println!("{:?}", parse_quoted_1(r#"'1 \' 2 \ 3 \\ 4'"#));
    println!("{:?}", parse_quoted_2(r#"'1 \' 2 \ 3 \\ 4'"#));
}
/*
Ok(("", "1"))
Ok(("", "1"))
Ok(("", "1 \\' 2"))
Ok((" 2'", "1 \\"))
Err(Error(Error { input: "1 \\' 2 \\ 3'", code: Tag }))
Ok((" 2 \\ 3'", "1 \\"))
Err(Error(Error { input: "1 \\' 2 \\ 3 \\\\ 4'", code: Tag }))
Ok((" 2 \\ 3 \\\\ 4'", "1 \\"))
*/
Only the first 3 cases work as expected.
Solution
A bad/imperative solution:
use nom::{bytes::complete::take, character::complete::char, sequence::delimited, IResult};

fn parse_quoted(input: &str) -> IResult<&str, &str> {
    // Takes everything up to the first single quote that is not
    // directly preceded by a backslash.
    fn escaped(input: &str) -> IResult<&str, &str> {
        let mut pc = 0 as char; // previous character
        let mut n = 0; // number of characters to take
        for (i, c) in input.chars().enumerate() {
            if c == '\'' && pc != '\\' {
                break;
            }
            pc = c;
            n = i + 1;
        }
        // `take` on `&str` counts characters, matching `chars().enumerate()`.
        take(n)(input)
    }
    delimited(char('\''), escaped, char('\''))(input)
}

fn main() {
    println!("{:?}", parse_quoted(r#"'' ..."#));
    println!("{:?}", parse_quoted(r#"'1' ..."#));
    println!("{:?}", parse_quoted(r#"'1 \' 2' ..."#));
    println!("{:?}", parse_quoted(r#"'1 \' 2 \ 3' ..."#));
    println!("{:?}", parse_quoted(r#"'1 \' 2 \ 3 \\ 4' ..."#));
}
/*
Ok((" ...", ""))
Ok((" ...", "1"))
Ok((" ...", "1 \\' 2"))
Ok((" ...", "1 \\' 2 \\ 3"))
Ok((" ...", "1 \\' 2 \\ 3 \\\\ 4"))
*/
To allow '...\\' (a string whose content ends with an escaped backslash), we can similarly store more of the previous characters:
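        let mut pc = 0 as char; // previous character
        let mut ppc = 0 as char; // character before `pc`
        let mut pppc = 0 as char; // character before `ppc`
        let mut n = 0;
        for (i, c) in input.chars().enumerate() {
            // Stop at a quote preceded by no backslash,
            // or by exactly two backslashes (an escaped backslash).
            if (c == '\'' && pc != '\\') || (c == '\'' && pc == '\\' && ppc == '\\' && pppc != '\\') {
                break;
            }
            pppc = ppc;
            ppc = pc;
            pc = c;
            n = i + 1;
        }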
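Storing a fixed number of previous characters only covers runs of up to three backslashes. One way to generalize it is to track whether the current character is escaped by toggling a flag on each backslash, so that a backslash run of any length is handled. The following is a minimal sketch using only the standard library, not the answer's own code: the names split_at_closing_quote and parse_single_quoted are hypothetical, it returns an Option instead of nom's IResult, and it assumes that \\ should count as an escaped backslash (as the snippet above does).

```rust
/// Hypothetical helper (not part of nom): finds the first single quote that is
/// not escaped, and returns (content before it, rest starting at that quote).
fn split_at_closing_quote(input: &str) -> Option<(&str, &str)> {
    // True when the previous character was a backslash that is itself unescaped.
    let mut escaped = false;
    for (i, c) in input.char_indices() {
        match c {
            // An unescaped quote closes the string.
            '\'' if !escaped => return Some((&input[..i], &input[i..])),
            // Each backslash flips the state, so `\\` cancels itself out.
            '\\' => escaped = !escaped,
            // Any other character (or an escaped quote) resets the state.
            _ => escaped = false,
        }
    }
    None // no closing quote found
}

/// Hypothetical wrapper: returns (remaining input, string content),
/// mirroring the order of nom's `IResult` success value.
fn parse_single_quoted(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let (content, rest) = split_at_closing_quote(body)?;
    // `rest` starts with the one-byte closing quote; skip it.
    Some((&rest[1..], content))
}

fn main() {
    println!("{:?}", parse_single_quoted(r#"'1 \' 2 \ 3 \\ 4' ..."#));
    println!("{:?}", parse_single_quoted(r#"'1 \\' ..."#));
    println!("{:?}", parse_single_quoted(r#"'1 \\\' 2' ..."#));
}
```

The same flag-based loop could presumably replace the pc/ppc/pppc bookkeeping inside the inner escaped function, keeping the surrounding delimited(char('\''), escaped, char('\'')) unchanged.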