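The following grammar works when given this input:
cd/someotherpath/someotherpath/path
This input should be parsed as an identifier (cd) and an object path (someotherpath/someotherpath/path), separated by '/'.
It took me hours to find a grammar rule that works. The grammar contains a commented-out identified_path rule that does not work. The problem with that rule is that a standalone identifier at the beginning is accepted even when it is followed by characters that cannot be parsed, and this is what confuses me. When the commented-out rule is used, ANTLR starts parsing, sees cd, recognizes it as an identifier, then sees '/', cannot match it, and leaves the rule without trying the other alternatives! Even if I force a syntactic predicate on the identifier alone, such as (identifier)=>identifier, ANTLR accepts this incomplete match and does not look at the other alternatives.
After breaking the rule up as shown in the grammar, it works as expected, but I don't understand why the first version (commented out below) does not work. It is just the inlined version of the working one. Here is the grammar: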
grammar RecursionTests;
@members{
public boolean isValidAlphanumericAtIdentifier(String pVal){
if(pVal.toUpperCase().startsWith("A")){
//the second character must be a letter //
char[] tempCharArr = pVal.substring(1,2).toCharArray();
char secondChar = tempCharArr[0];
if( secondChar < 'A' || secondChar > 'z')//is it a letter?
return false;
//the second char is a letter, and this is alphanumeric,
//so there must be a third at least, but that third must be valid also
if(pVal.toUpperCase().startsWith("AT") && pVal.substring(2,3).toUpperCase().equals("T"))
return false; //att is not allowed
//passed all tests, it is valid
return true;
}
return false;
}
public boolean isValidNonAtIdentifier(String pVal){
if(pVal.length() > 1){
return !(pVal.substring(1,2).toUpperCase().equals("T"));//second char should not be T
}
else
return !(pVal.toUpperCase().equals("A"));//ok if it is not A
}
}
rul : identified_path
;
//Identifier = {LetterMinusA}{IdCharMinusT}?{IdChar}* | 'a''t'?(({letter}|'_')*|{LetterMinusT}{Alphanumeric}*)
/*
identified_path
: identifier
| (identifier forward_slash object_path)=> identifier forward_slash object_path
| identifier predicate
| (identifier predicate forward_slash object_path)=>identifier predicate forward_slash object_path
;
*/
identified_path
: identifier_or_id_based_path
| identifier_or_id_predicate_path
;
identifier_or_id_based_path
: identifier
| (identifier forward_slash object_path)=>(identifier forward_slash object_path)
;
identifier_or_id_predicate_path
: identifier predicate
| (identifier predicate forward_slash object_path)=>identifier predicate forward_slash object_path
;
object_path : path_part (forward_slash path_part)*
;
forward_slash
: {input.LT(1).getText().equals("/")}? Uri_String_Chars
;
path_part : identifier (predicate)?
;
predicate : node_predicate
;
node_predicate : square_bracket_open node_predicate_expr square_bracket_close
//node_predicate : square_bracket_open identifier square_bracket_close
;
square_bracket_open
: {input.LT(1).getText().equals("[")}? Non_Uri_String_RegEX_Chars
;
square_bracket_close
: {input.LT(1).getText().equals("]")}? Non_Uri_String_RegEX_Chars
;
node_predicate_expr
: (node_predicate_comparable ((And | Or) node_predicate_comparable)*)=>node_predicate_comparable ((And | Or) node_predicate_comparable)*
;
node_predicate_comparable : (predicate_operand comparable_operator predicate_operand)=> predicate_operand comparable_operator predicate_operand
| Node_id
| (Node_id char_comma string_r)=> Node_id char_comma string_r // node_id_r and name/value = <String> shortcut
| (Node_id char_comma parameter)=> Node_id char_comma parameter // node_id_r and name/value = <Parameter> shortcut
| (node_predicate_reg_ex)=> node_predicate_reg_ex // /items[{/at0001.* /}], /items[at0001 and name/value matches {//}
| (archetype_id)=>archetype_id
| (archetype_id char_comma string_r)=> archetype_id char_comma string_r // node_id_r and name/value = <String> shortcut
| (archetype_id char_comma parameter)=> archetype_id char_comma parameter // node_id_r and name/value = <Parameter> shortcut
;
predicate_operand : //identifier
//| identifier PathItem
object_path
| operand
;
operand : string_r | Integer_r | date_r | parameter | Boolean_r
;
string_r
: (Quotation_Mark string_char* Quotation_Mark)
| Quote Quote string_char* Quote Quote
;
parameter
: char_dollar_sign Letter id_char*
;
archetype_id
: Letter char_hypen Letter char_hypen archetype_id_letter_underscore_literal Dot (id_char|char_hypen) Dot alphanumeric
;
archetype_id_letter_underscore_literal
: Letter
| Letter_or_underscore
;
comparable_operator
: char_equals | op_not_equals | char_greater | op_greater_or_eq | char_smaller | op_smaller_or_eq //Uri_String_Chars
;
char_equals
: {input.LT(1).getText().equals("=")}? Uri_String_Chars
;
op_not_equals
: {input.LT(1).getText().equals("!") && input.LT(2).getText().equals("=") }? (Uri_String_Chars Uri_String_Chars)
;
char_greater
: {input.LT(1).getText().equals(">")}? Special_Chars
;
op_greater_or_eq
: {input.LT(1).getText().equals(">") && input.LT(2).getText().equals("=") }? (Special_Chars Uri_String_Chars)
;
char_smaller
: {input.LT(1).getText().equals("<")}? Special_Chars
;
op_smaller_or_eq
: {input.LT(1).getText().equals("<") && input.LT(2).getText().equals("=") }? (Special_Chars Uri_String_Chars)
;
date_r
: Quote Quote Single_Digit Single_Digit Single_Digit Single_Digit char_hypen Single_Digit Single_Digit char_hypen Single_Digit Single_Digit Single_Digit Single_Digit
;
node_predicate_reg_ex : reg_ex_pattern
| predicate_operand Op_matches reg_ex_pattern
;
reg_ex_pattern
: start_reg_ex_pattern reg_ex_char+ end_reg_ex_pattern
;
start_reg_ex_pattern
: { input.LT(1).getText().equals("{") &&
input.LT(2).getText().equals("/")
}? (Non_Uri_String_RegEX_Chars Non_Uri_String_RegEX_Chars)
;
end_reg_ex_pattern
: { input.LT(1).getText().equals("/") &&
input.LT(2).getText().equals("}")
}? (Non_Uri_String_RegEX_Chars Non_Uri_String_RegEX_Chars)
;
reg_ex_char
: alphanumeric | Uri_String_Chars | Non_Uri_String_RegEX_Chars
;
letter_minus_a
: {input.LT(1).getText().contains("a") == false && input.LT(1).getText().contains("A") == false}? Single_letter
;
letter_minus_t
: {input.LT(1).getText().contains("t") == false && input.LT(1).getText().contains("T") == false}? Single_letter
;
id_char_minus_t
: {input.LT(1).getText().contains("t") == false && input.LT(1).getText().contains("T") == false}? Single_Id_Char
;
id_char
: Id_char
| Letter_or_underscore //may hit this since it is more specific than Id_char
;
alphanumeric //alternatives to alphanumeric will show up since they are more specific than alphanumeric, but may fit
: Alphanumeric
| Single_letter
| Letter
;
string_char
: String_char
;
char_low_case_a
: {input.LT(1).getText().equals("a")}? Single_letter
;
char_low_case_t
: {input.LT(1).getText().equals("t")}? Single_letter
;
char_comma
: {input.LT(1).getText().equals(",")}? Special_Chars
;
char_dollar_sign
: {input.LT(1).getText().equals("$")}? Uri_String_Chars
;
char_hypen
: {input.LT(1).getText().equals("-")}? Uri_String_Chars
;
letter_or_underscore
: Letter_or_underscore
;
//Identifier = {LetterMinusA}{IdCharMinusT}?{IdChar}* | 'a''t'?(({letter}|'_')*|{LetterMinusT}{Alphanumeric}*)
identifier
: {!(input.LT(1).getText().toUpperCase().startsWith("A")) }?=>non_at_identifier
| {input.LT(1).getText().toUpperCase().startsWith("A")}?=>at_identifier
;
non_at_identifier
: {isValidNonAtIdentifier(input.LT(1).getText())}?non_at_identifier_literal
;
at_identifier
: at_identifier_literal
;
at_identifier_literal
: Single_letter //if it is only one letter, it must be a|A
| Letter //if more than one letter, again it must start with a|A
| Letter_or_underscore
| {isValidAlphanumericAtIdentifier(input.LT(1).getText())}?Alphanumeric //if the second char is t, the third must be a non-T letter
;
non_at_identifier_literal
: Id_char
| Alphanumeric
| Letter
| Letter_or_underscore
| Single_letter
;
Node_id
: At_code ( Digit+ (Dot Digit+)*)
;
At_code : 'at'
;
And : 'and'
;
Or : 'or'
;
Dot : '.'
;
Op_matches
: 'matches'
;
Boolean_r
: 'true'| 'false'
;
Quote : '\''
;
Single_Digit
: Digit
;
Integer_r
: Digit+
;
Float_r
: Digit+ '.' Digit+
;
Single_letter
: Letter_lowercase | Letter_uppercase
;
Letter : (Letter_lowercase | Letter_uppercase)+
;
Alphanumeric
: (Letter_lowercase | Letter_uppercase | Digit)+
;
Special_Chars
: (Special_Char_list)+
;
String_char
: (Special_Char_list | Letter_lowercase | Letter_uppercase | Digit)+
;
Single_Id_Char
: Letter_lowercase | Letter_uppercase | Underscore | Digit
;
Letter_or_underscore
: (Letter | Underscore)+
;
Id_char
: (Letter| Digit | Underscore)+
;
//Identifier = {LetterMinusA}{IdCharMinusT}?{IdChar}* | 'a''t'?(({letter}|'_')*|{LetterMinusT}{Alphanumeric}*)
Uri_String_Chars
: '_' | '-' | '/' | ':' | '.' | '?' | '&' | '%' | '$' | '#' | '@' | '!' | '+' | '=' | '*'
;
Non_Uri_String_RegEX_Chars//used for regex, alongside Uri_String_Chars
: '|' | '(' | ')' | '\\' | '^' | '{' | '}' | '[' | ']'
;
Quotation_Mark
: '"'
;
fragment Special_Char_list
: //' '|
','
| ';' | '<' | '>'
| '`'
| '~'
;
/*
AND : 'and'
;
OR : 'or'
;
AT : 'at'
;
MATCHES : 'matches'
;
*/
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment Letter_uppercase
: 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
;
fragment Letter_lowercase
: 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
;
fragment Underscore
: '_'
;
fragment Digit
: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
;
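A side note on the @members helpers above, separate from the parsing problem: isValidAlphanumericAtIdentifier calls substring(1,2) and substring(2,3) without checking the length first, so a one-character argument such as "a" would throw StringIndexOutOfBoundsException, and the range test 'A'..'z' also lets through the six ASCII characters that sit between 'Z' and 'a'. Whether the lexer can ever hand such text to the helper has not been checked here. The following is a minimal sketch of a more defensive variant. The class name AtIdentifierCheck, the isAsciiLetter helper, and the decision to reject inputs that are too short are assumptions, not part of the original grammar.

```java
final class AtIdentifierCheck {
    // Hypothetical helper: true only for the ASCII letters A-Z and a-z.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Defensive sketch of isValidAlphanumericAtIdentifier from the grammar.
    static boolean isValidAlphanumericAtIdentifier(String pVal) {
        if (pVal == null || pVal.length() < 2) {
            return false; // assumption: too short to have a second character
        }
        if (Character.toUpperCase(pVal.charAt(0)) != 'A') {
            return false; // must start with a or A
        }
        char second = pVal.charAt(1);
        if (!isAsciiLetter(second)) {
            return false; // the second character must be a letter
        }
        if (Character.toUpperCase(second) == 'T') {
            if (pVal.length() < 3) {
                return false; // assumption: "at" alone is rejected instead of throwing
            }
            if (Character.toUpperCase(pVal.charAt(2)) == 'T') {
                return false; // "att..." is not allowed
            }
        }
        return true; // passed all tests
    }
}
```

With this variant, "atx1" and "ab1" are accepted, while "att1", "a1b", "a_1", "at" and "a" are rejected without an exception.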
Answer 0 (score: 1)
I haven't tested this, but try putting the alternatives with predicates at the start of the rule and run it again:
identified_path
: (identifier predicate forward_slash object_path)=>
identifier predicate forward_slash object_path
| (identifier forward_slash object_path)=>
identifier forward_slash object_path
| identifier predicate
| identifier
;
Or:
identified_path
: (identifier predicate forward_slash object_path)=>
identifier predicate forward_slash object_path
| (identifier forward_slash object_path)=>
identifier forward_slash object_path
| identifier predicate?
;
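The parser goes through the alternatives from top to bottom: that is why the alternatives you are forcing (the ones with a predicate in front of them) are usually best placed at the top.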
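The effect of alternative order can be illustrated with a small Java model. This is only a sketch of the top-to-bottom, first-match-wins behaviour described above, under heavy simplification: it does not reproduce ANTLR's actual prediction algorithm. The class OrderedChoiceSketch, the matchFirst helper, and the token names ID and PATH (standing in for identifier and a whole object_path) are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class OrderedChoiceSketch {
    // Simplified token stream for "cd/someotherpath/someotherpath/path".
    static final List<String> TOKENS = Arrays.asList("ID", "/", "PATH");

    // Bare identifier listed first, as in the commented-out rule.
    static final List<List<String>> SHORT_FIRST = Arrays.asList(
            Arrays.asList("ID"),
            Arrays.asList("ID", "/", "PATH"));

    // Longer alternative listed first, as suggested in this answer.
    static final List<List<String>> LONG_FIRST = Arrays.asList(
            Arrays.asList("ID", "/", "PATH"),
            Arrays.asList("ID"));

    // Tries the alternatives in order and returns how many tokens the first
    // one that matches a prefix of the input consumes, or -1 if none matches.
    static int matchFirst(List<List<String>> alternatives, List<String> tokens) {
        for (List<String> alt : alternatives) {
            if (alt.size() <= tokens.size()
                    && tokens.subList(0, alt.size()).equals(alt)) {
                return alt.size(); // first match wins; later alternatives are never tried
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(matchFirst(SHORT_FIRST, TOKENS)); // prints 1: stops after the identifier
        System.out.println(matchFirst(LONG_FIRST, TOKENS));  // prints 3: consumes the whole input
    }
}
```

In this model, listing the bare identifier first leaves "/" and the path unconsumed, which mirrors the symptom in the question, while listing the longer alternative first consumes the whole input.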