微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

语法没有将 '123 和 ] 分开,尽管为它设置了规则

如何解决语法没有将 '123 和 ] 分开,尽管为它设置了规则

我是 antlr 的新手。我正在尝试解析一些查询,如 [network-traffic:src_port = '123] 和 [network-traffic:src_port =] 和 [network-traffic:src_port = ] 和......我的语法如下:

grammar StixPattern;

pattern
  : observationExpressions EOF
  ;

observationExpressions
  : <assoc=left> observationExpressions FOLLOWEDBY observationExpressions #observationExpressionsFollowedBY
  | observationExpressionor                                               #observationExpressionor_
  ;

observationExpressionor
  : <assoc=left> observationExpressionor OR observationExpressionor     #observationExpressionored
  | observationExpressionAnd                                            #observationExpressionAnd_
  ;

observationExpressionAnd
  : <assoc=left> observationExpressionAnd AND observationExpressionAnd  #observationExpressionAnded
  | observationExpression                                               #observationExpression_
  ;

observationExpression
  : LBRACK comparisonExpression RBRACK        # observationExpressionSimple
  | LPAREN observationExpressions RPAREN      # observationExpressionCompound
  | observationExpression startStopQualifier  # observationExpressionStartStop
  | observationExpression withinQualifier     # observationExpressionWithin
  | observationExpression repeatedQualifier   # observationExpressionRepeated
  ;

comparisonExpression
  : <assoc=left> comparisonExpression OR comparisonExpression         #comparisonExpressionored
  | comparisonExpressionAnd                                           #comparisonExpressionAnd_
  ;

comparisonExpressionAnd
  : <assoc=left> comparisonExpressionAnd AND comparisonExpressionAnd  #comparisonExpressionAnded
  | propTest                                                          #comparisonExpressionAndpropTest
  ;

propTest
  : objectPath NOT? (EQ|NEQ) primitiveLiteral       # propTestEqual
  | objectPath NOT? (GT|LT|GE|LE) orderableLiteral  # propTestOrder
  | objectPath NOT? IN setLiteral                   # propTestSet
  | objectPath NOT? LIKE StringLiteral              # propTestLike
  | objectPath NOT? MATCHES StringLiteral           # propTestRegex
  | objectPath NOT? ISSUBSET StringLiteral          # propTestIsSubset
  | objectPath NOT? ISSUPERSET StringLiteral        # propTestIsSuperset
  | LPAREN comparisonExpression RPAREN              # propTestParen
  | objectPath NOT? (EQ|NEQ) objectPathThl    # propTestThlEqual
  ;

startStopQualifier
  : START TimestampLiteral STOP TimestampLiteral
  ;

withinQualifier
  : WITHIN (IntPosLiteral|FloatPosLiteral) SECONDS
  ;

repeatedQualifier
  : REPEATS IntPosLiteral TIMES
  ;

objectPath
  : objectType COLON firstPathComponent objectPathComponent?
  ;

objectPathThl
  : varThlType DOT firstPathComponent objectPathComponent?
  ;

objectType
  : IdentifierWithoutHyphen
  | IdentifierWithHyphen
  ;

varThlType
  : IdentifierWithoutHyphen
  | IdentifierWithHyphen
  ;

firstPathComponent
  : IdentifierWithoutHyphen
  | StringLiteral
  ;

objectPathComponent
  : <assoc=left> objectPathComponent objectPathComponent  # pathStep
  | '.' (IdentifierWithoutHyphen | StringLiteral)         # keypathstep
  | LBRACK (IntPosLiteral|IntNegLiteral|ASTERISK) RBRACK  # indexPathStep
  ;

setLiteral
  : LPAREN RPAREN
  | LPAREN primitiveLiteral (COMMA primitiveLiteral)* RPAREN
  ;

primitiveLiteral
  : orderableLiteral  
  | BoolLiteral  
  | edgeCases  
  ;

edgeCases
  : QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK 
  | RBRACK
  ;


orderableLiteral
  : IntPosLiteral
  | IntNegLiteral
  | FloatPosLiteral
  | FloatNegLiteral
  | StringLiteral
  | BinaryLiteral
  | HexLiteral
  | TimestampLiteral
  ;



IntNegLiteral :
  '-' ('0' | [1-9] [0-9]*)
  ;

IntNoSign :
  ('0' | [1-9] [0-9]*)
  ;

IntPosLiteral :
  '+'? ('0' | [1-9] [0-9]*)
  ;

FloatNegLiteral :
  '-' [0-9]* '.' [0-9]+
  ;

FloatPosLiteral :
  '+'? [0-9]* '.' [0-9]+
  ;

HexLiteral :
  'h' QUOTE TwoHexDigits* QUOTE
  ;

BinaryLiteral :
  'b' QUOTE
  ( Base64Char Base64Char Base64Char Base64Char )*
  ( (Base64Char Base64Char Base64Char Base64Char )
  | (Base64Char Base64Char Base64Char ) '='
  | (Base64Char Base64Char ) '=='
  )
  QUOTE
  ;

StringLiteral :
  QUOTE ( ~['\\] | '\\\'' | '\\\\' )* QUOTE
  ;

BoolLiteral :
  TRUE | FALSE
  ;

TimestampLiteral :
  't' QUOTE
  [0-9] [0-9] [0-9] [0-9] HYPHEN
  ( ('0' [1-9]) | ('1' [012]) ) HYPHEN
  ( ('0' [1-9]) | ([12] [0-9]) | ('3' [01]) )
  'T'
  ( ([01] [0-9]) | ('2' [0-3]) ) COLON
  [0-5] [0-9] COLON
  ([0-5] [0-9] | '60')
  (DOT [0-9]+)?
  'Z'
  QUOTE
  ;

//////////////////////////////////////////////
// Keywords

AND:  'AND' ;
OR:  'OR' ;
NOT:  'NOT' ;
FOLLOWEDBY: 'FOLLOWEDBY';
LIKE:  'LIKE' ;
MATCHES:  'MATCHES' ;
ISSUPERSET:  'ISSUPERSET' ;
ISSUBSET: 'ISSUBSET' ;
LAST:  'LAST' ;
IN:  'IN' ;
START:  'START' ;
STOP:  'STOP' ;
SECONDS:  'SECONDS' ;
TRUE:  'true' ;
FALSE:  'false' ;
WITHIN:  'WITHIN' ;
REPEATS:  'REPEATS' ;
TIMES:  'TIMES' ;

// After keywords,so the lexer doesn't tokenize them as identifiers.
// Object types may have unquoted hyphens,but property names
// (in object paths) cannot.
IdentifierWithoutHyphen :
  [a-zA-Z_] [a-zA-Z0-9_]*
  ;

IdentifierWithHyphen :
  [a-zA-Z_] [a-zA-Z0-9_-]*
  ;

EQ        :   '=' | '==';
NEQ       :   '!=' | '<>';
LT        :   '<';
LE        :   '<=';
GT        :   '>';
GE        :   '>=';

QUOTE     : '\'';
COLON     : ':' ;
DOT       : '.' ;
COMMA     : ',' ;
RPAREN    : ')' ;
LPAREN    : '(' ;
RBRACK    : ']' ;
LBRACK    : '[' ;
PLUS      : '+' ;
HYPHEN    : MINUS ;
MINUS     : '-' ;
POWER_OP  : '^' ;
DIVIDE    : '/' ;
ASTERISK  : '*';
EQRBRAC : ']';

fragment HexDigit: [A-Fa-f0-9];
fragment TwoHexDigits: HexDigit HexDigit;
fragment Base64Char: [A-Za-z0-9+/];

// Whitespace and comments
//
WS  :  [ \t\r\n\u000B\u000C\u0085\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+ -> skip
    ;

COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

// Catch-all to prevent lexer from silently eating unusable characters.
InvalidCharacter
    : .
    ;

现在,当我输入 [network-traffic:src_port = '123] 时,我希望 antlr 将查询解析为 '123 和 ]

但是语法返回 '123] 并且无法将 '123 和 ] 分开 有什么遗漏吗?

解决方法

语法没有将 '123 和 ] 分开,尽管规则是为它设置的

这不是真的。引号和 123 单独的标记。正如您的 previous ANTLR 问题中所展示/建议的:首先将所有令牌打印到您的控制台以查看正在创建哪些令牌。在尝试调试 ANTLR 语法时,这应该始终是您做的第一件事。它将为您节省大量时间和头痛。

事实 [network-traffic:src_port = '123] 没有正确解析,是因为 ](RBRACK) 被替代 observationExpressionSimple 消耗:

observationExpression
  : LBRACK comparisonExpression RBRACK        # observationExpressionSimple
  | LPAREN observationExpressions RPAREN      # observationExpressionCompound
  | observationExpression startStopQualifier  # observationExpressionStartStop
  | observationExpression withinQualifier     # observationExpressionWithin
  | observationExpression repeatedQualifier   # observationExpressionRepeated
  ;

因为 RBRACK 已被解析器规则使用,所以 edgeCases 规则也不能使用此 RBRACK 令牌。

要解决此问题,请更改您的规则:

edgeCases
  : QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK
  | RBRACK
  ;

进入这个:

edgeCases
  : QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign)
  ;

现在 [network-traffic:src_port = '123] 将被正确解析:

enter image description here

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。