如何解决ANTLR解决非LL*问题和语法谓词
|| 考虑解析器中的以下规则:expression
: IDENTIFIER
| (...)
| procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
procedure_call
: \'(\' expression expression* \')\'
;
macro_use
: \'(\' IDENTIFIER datum* \')\'
;
和
// Note that any string that parses as an <expression> will also parse as a <datum>.
datum
: simple_datum
| compound_datum
;
simple_datum
: BOOLEAN
| NUMBER
| CHaraCTER
| STRING
| IDENTIFIER
;
compound_datum
: list
| vector
;
list
: \'(\' (datum+ ( \'.\' datum)?)? \')\'
| ABBREV_PREFIX datum
;
fragment ABBREV_PREFIX
: (\'\\\'\' | \'`\' | \',\' | \',@\')
;
vector
: \'#(\' datum* \')\'
;
表达式规则中的procedure_call和macro_rule替代方案会生成非LL(*)结构错误。我可以看到问题了,因为(IDENTIFIER)
会同时解析为两者。但是即使我用+而不是*都定义了,即使上面的示例不应该再解析,它也会产生错误。
我想出了句法谓词的用法,但是我无法弄清楚如何使用它们来完成技巧。
就像是
expression
: IDENTIFIER
| (...)
| (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
要么
expression
: IDENTIFIER
| (...)
| (\'(\' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
也不起作用,因为除了第一个规则外,什么都不会匹配。有解决这个问题的适当方法吗?
解决方法
我发现了R5RS的JavaCC语法,我曾经(很快!)写过ANTLR等效文件:
/*
* Copyright (C) 2011 by Bart Kiers,based on the work done by Håkan L. Younes\'
* JavaCC R5RS grammar,available at: http://mindprod.com/javacc/R5RS.jj
*
* Permission is hereby granted,free of charge,to any person obtaining a copy
* of this software and associated documentation files (the \"Software\"),to deal
* in the Software without restriction,including without limitation the rights
* to use,copy,modify,merge,publish,distribute,sublicense,and/or sell
* copies of the Software,and to permit persons to whom the Software is
* furnished to do so,subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED \"AS IS\",WITHOUT WARRANTY OF ANY KIND,EXPRESS OR
* IMPLIED,INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,DAMAGES OR OTHER
* LIABILITY,WHETHER IN AN ACTION OF CONTRACT,TORT OR OTHERWISE,ARISING FROM,* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
grammar R5RS;
parse
: commandOrDefinition* EOF
;
commandOrDefinition
: (syntaxDefinition)=> syntaxDefinition
| (definition)=> definition
| (\'(\' BEGIN commandOrDefinition)=> \'(\' BEGIN commandOrDefinition+ \')\'
| command
;
syntaxDefinition
: \'(\' DEFINE_SYNTAX keyword transformerSpec \')\'
;
definition
: \'(\' DEFINE ( variable expression \')\'
| \'(\' variable defFormals \')\' body \')\'
)
| \'(\' BEGIN definition* \')\'
;
defFormals
: variable* (\'.\' variable)?
;
keyword
: identifier
;
transformerSpec
: \'(\' SYNTAX_RULES \'(\' identifier* \')\' syntaxRule* \')\'
;
syntaxRule
: \'(\' pattern template \')\'
;
pattern
: patternIdentifier
| \'(\' (pattern+ (\'.\' pattern | ELLIPSIS)?)? \')\'
| \'#(\' (pattern+ ELLIPSIS? )? \')\'
| patternDatum
;
patternIdentifier
: syntacticKeyword
| VARIABLE
;
patternDatum
: STRING
| CHARACTER
| bool
| number
;
template
: patternIdentifier
| \'(\' (templateElement+ (\'.\' templateElement)?)? \')\'
| \'#(\' templateElement* \')\'
| templateDatum
;
templateElement
: template ELLIPSIS?
;
templateDatum
: patternDatum
;
command
: expression
;
identifier
: syntacticKeyword
| variable
;
syntacticKeyword
: expressionKeyword
| ELSE
| ARROW
| DEFINE
| UNQUOTE
| UNQUOTE_SPLICING
;
expressionKeyword
: QUOTE
| LAMBDA
| IF
| SET
| BEGIN
| COND
| AND
| OR
| CASE
| LET
| LETSTAR
| LETREC
| DO
| DELAY
| QUASIQUOTE
;
expression
: (variable)=> variable
| (literal)=> literal
| (lambdaExpression)=> lambdaExpression
| (conditional)=> conditional
| (assignment)=> assignment
| (derivedExpression)=> derivedExpression
| (procedureCall)=> procedureCall
| (macroUse)=> macroUse
| macroBlock
;
variable
: VARIABLE
| ELLIPSIS
;
literal
: quotation
| selfEvaluating
;
quotation
: \'\\\'\' datum
| \'(\' QUOTE datum \')\'
;
selfEvaluating
: bool
| number
| CHARACTER
| STRING
;
lambdaExpression
: \'(\' LAMBDA formals body \')\'
;
formals
: \'(\' (variable+ (\'.\' variable)?)? \')\'
| variable
;
conditional
: \'(\' IF test consequent alternate? \')\'
;
test
: expression
;
consequent
: expression
;
alternate
: expression
;
assignment
: \'(\' SET variable expression \')\'
;
derivedExpression
: quasiquotation
| \'(\' ( COND ( \'(\' ELSE sequence \')\'
| condClause+ (\'(\' ELSE sequence \')\')?
)
| CASE expression ( \'(\' ELSE sequence \')\'
| caseClause+ (\'(\' ELSE sequence \')\')?
)
| AND test*
| OR test*
| LET variable? \'(\' bindingSpec* \')\' body
| LETSTAR \'(\' bindingSpec* \')\' body
| LETREC \'(\' bindingSpec* \')\' body
| BEGIN sequence
| DO \'(\' iterationSpec* \')\' \'(\' test doResult? \')\' command*
| DELAY expression
)
\')\'
;
condClause
: \'(\' test (sequence | ARROW recipient)? \')\'
;
recipient
: expression
;
caseClause
: \'(\' \'(\' datum* \')\' sequence \')\'
;
bindingSpec
: \'(\' variable expression \')\'
;
iterationSpec
: \'(\' variable init step? \')\'
;
init
: expression
;
step
: expression
;
doResult
: sequence
;
procedureCall
: \'(\' operator operand* \')\'
;
operator
: expression
;
operand
: expression
;
macroUse
: \'(\' keyword datum* \')\'
;
macroBlock
: \'(\' (LET_SYNTAX | LETREC_SYNTAX) \'(\' syntaxSpec* \')\' body \')\'
;
syntaxSpec
: \'(\' keyword transformerSpec \')\'
;
body
: ((definition)=> definition)* sequence
;
//sequence
// : ((command)=> command)* expression
// ;
sequence
: expression+
;
datum
: simpleDatum
| compoundDatum
;
simpleDatum
: bool
| number
| CHARACTER
| STRING
| identifier
;
compoundDatum
: list
| vector
;
list
: \'(\' (datum+ (\'.\' datum)?)? \')\'
| abbreviation
;
abbreviation
: abbrevPrefix datum
;
abbrevPrefix
: \'\\\'\' | \'`\' | \',@\' | \',\'
;
vector
: \'#(\' datum* \')\'
;
number
: NUM_2
| NUM_8
| NUM_10
| NUM_16
;
bool
: TRUE
| FALSE
;
quasiquotation
: quasiquotationD[1]
;
quasiquotationD[int d]
: \'`\' qqTemplate[d]
| \'(\' QUASIQUOTE qqTemplate[d] \')\'
;
qqTemplate[int d]
: (expression)=> expression
| (\'(\' UNQUOTE)=> unquotation[d]
| simpleDatum
| vectorQQTemplate[d]
| listQQTemplate[d]
;
vectorQQTemplate[int d]
: \'#(\' qqTemplateOrSplice[d]* \')\'
;
listQQTemplate[int d]
: \'\\\'\' qqTemplate[d]
| (\'(\' QUASIQUOTE)=> quasiquotationD[d+1]
| \'(\' (qqTemplateOrSplice[d]+ (\'.\' qqTemplate[d])?)? \')\'
;
unquotation[int d]
: \',\' qqTemplate[d-1]
| \'(\' UNQUOTE qqTemplate[d-1] \')\'
;
qqTemplateOrSplice[int d]
: (\'(\' UNQUOTE_SPLICING)=> splicingUnquotation[d]
| qqTemplate[d]
;
splicingUnquotation[int d]
: \',@\' qqTemplate[d-1]
| \'(\' UNQUOTE_SPLICING qqTemplate[d-1] \')\'
;
// macro keywords
LET_SYNTAX : \'let-syntax\';
LETREC_SYNTAX : \'letrec-syntax\';
SYNTAX_RULES : \'syntax-rules\';
DEFINE_SYNTAX : \'define-syntax\';
// syntactic keywords
ELSE : \'else\';
ARROW : \'=>\';
DEFINE : \'define\';
UNQUOTE_SPLICING : \'unquote-splicing\';
UNQUOTE : \'unquote\';
// expression keywords
QUOTE : \'quote\';
LAMBDA : \'lambda\';
IF : \'if\';
SET : \'set!\';
BEGIN : \'begin\';
COND : \'cond\';
AND : \'and\';
OR : \'or\';
CASE : \'case\';
LET : \'let\';
LETSTAR : \'let*\';
LETREC : \'letrec\';
DO : \'do\';
DELAY : \'delay\';
QUASIQUOTE : \'quasiquote\';
NUM_2 : PREFIX_2 COMPLEX_2;
NUM_8 : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;
ELLIPSIS : \'...\';
VARIABLE
: INITIAL SUBSEQUENT*
| PECULIAR_IDENTIFIER
;
STRING : \'\"\' STRING_ELEMENT* \'\"\';
CHARACTER : \'#\\\\\' (~(\' \' | \'\\n\') | CHARACTER_NAME);
TRUE : \'#\' (\'t\' | \'T\');
FALSE : \'#\' (\'f\' | \'F\');
// to ignore
SPACE : (\' \' | \'\\t\' | \'\\r\' | \'\\n\') {$channel=HIDDEN;};
COMMENT : \';\' ~(\'\\r\' | \'\\n\')* {$channel=HIDDEN;};
// fragments
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : \'a\'..\'z\' | \'A\'..\'Z\';
fragment SPECIAL_INITIAL : \'!\' | \'$\' | \'%\' | \'&\' | \'*\' | \'/\' | \':\' | \'<\' | \'=\' | \'>\' | \'?\' | \'^\' | \'_\' | \'~\';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : \'0\'..\'9\';
fragment SPECIAL_SUBSEQUENT : \'.\' | \'+\' | \'-\' | \'@\';
fragment PECULIAR_IDENTIFIER : \'+\' | \'-\';
fragment STRING_ELEMENT : ~(\'\"\' | \'\\\\\') | \'\\\\\' (\'\"\' | \'\\\\\');
fragment CHARACTER_NAME : \'space\' | \'newline\';
fragment COMPLEX_2
: REAL_2 (\'@\' REAL_2)?
| REAL_2? SIGN UREAL_2? (\'i\' | \'I\')
;
fragment COMPLEX_8
: REAL_8 (\'@\' REAL_8)?
| REAL_8? SIGN UREAL_8? (\'i\' | \'I\')
;
fragment COMPLEX_10
: REAL_10 (\'@\' REAL_10)?
| REAL_10? SIGN UREAL_10? (\'i\' | \'I\')
;
fragment COMPLEX_16
: REAL_16 (\'@\' REAL_16)?
| REAL_16? SIGN UREAL_16? (\'i\' | \'I\')
;
fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 (\'/\' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 (\'/\' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 (\'/\' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 (\'/\' UINTEGER_16)?;
fragment DECIMAL_10
: UINTEGER_10 SUFFIX
| \'.\' DIGIT+ \'#\'* SUFFIX?
| DIGIT+ \'.\' DIGIT* \'#\'* SUFFIX?
| DIGIT+ \'#\'+ \'.\' \'#\'* SUFFIX?
;
fragment UINTEGER_2 : DIGIT_2+ \'#\'*;
fragment UINTEGER_8 : DIGIT_8+ \'#\'*;
fragment UINTEGER_10 : DIGIT+ \'#\'*;
fragment UINTEGER_16 : DIGIT_16+ \'#\'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : \'e\' | \'s\' | \'f\' | \'d\' | \'l\' | \'E\' | \'S\' | \'F\' | \'D\' | \'L\';
fragment SIGN : \'+\' | \'-\';
fragment EXACTNESS : \'#\' (\'i\' | \'e\' | \'I\' | \'E\');
fragment RADIX_2 : \'#\' (\'b\' | \'B\');
fragment RADIX_8 : \'#\' (\'o\' | \'O\');
fragment RADIX_10 : \'#\' (\'d\' | \'D\');
fragment RADIX_16 : \'#\' (\'x\' | \'X\');
fragment DIGIT_2 : \'0\' | \'1\';
fragment DIGIT_8 : \'0\'..\'7\';
fragment DIGIT_16 : DIGIT | \'a\'..\'f\' | \'A\'..\'F\';
可以使用以下类进行测试:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source =
\"(define sum-iter \\n\" +
\" (lambda(n acc i) \\n\" +
\" (if (> i n) \\n\" +
\" acc \\n\" +
\" (sum-iter n (+ acc i) (+ i 1))))) \";
R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
parser.parse();
}
}
并生成一个词法分析器和解析器,编译所有Java源文件并运行主类,请执行以下操作:
bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g
bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ javac -cp antlr-3.3.jar * .java
bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ java -cp。:antlr-3.3.jar主要
bart @ hades:〜/编程/ ANTLR / Demos / R5RS $
控制台上没有任何输出的事实意味着解析器(和词法分析器)在提供的源中未发现任何错误。
请注意,我没有单元测试,只测试了Main
类中的单个Scheme资源。如果您在ANTLR语法中发现错误,不胜感激,希望我能解决该语法。在适当的时候,我可能会将语法提交给官方的ANTLR Wiki。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。