I have to write a parser, to be used in a Java application, that will accept:
Each token is separated by any of the following:
<WHITE : ([" ", "\t"])+ >
<COMMA : (",") >
<SEMICOLON : (";") >
<EOL : ("\r" | "\n" | "\r\n") >
Ranges can have optional whitespace around the dash, for example:
1- 2
2 -3
3 - 4
4-5
The test string looks like this:
" 1 2 3 4 5,6,7;8;9,, 10;11;;, ;,;,,;\n\n ;,,; 12,13-13, 14 - 14 15- 15 16 -16 \n17-17\n 18 - 18\n 19 - 19 \n GROUP_1_A;GROUP_1_A GROUP_1_A;GROUP_1_A,GROUP_1_A ,;;\n\n \"GROUP_1_A\" ;; 20"
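I have tried several ways of defining the whitespace around "-", but every attempt ended either in an infinite nested loop that processes the simple test string to the end and then starts again from the beginning, or in a parser that cannot proceed to the next iteration. It would be easy if there were a way to check (visit) the next token without consuming it.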
SKIP: {
< QUOTATION : ( ["\""] ) > |
< APOSTROPHE : ( ["'"] ) >
}
TOKEN: {
< NAME : ( ["a"-"z", "A"-"Z"])+ (["a"-"z", "A"-"Z", "_", "0"-"9"] )* > |
< NUM : ( ["0"-"9"] ){1,5} > |
< WHITE : ( [" ", "\t"] ) > |
< EOL : ( "\n" | "\r" | "\r\n" ) > |
< COMMA : ( [","] ) > |
< SEMICOLON : ( [";"] ) >
}
Map<String, List<String>> parse() : {
Map<String, List<String>> result = new HashMap<String, List<String>>();
List<String> single = new ArrayList<String>();
List<String> range = new ArrayList<String>();
List<String> named = new ArrayList<String>();
result.put(SINGLE, single);
result.put(RANGE, range);
result.put(NAMED, named);
Token name = null;
Token first = null;
Token last = null;
}
{
(<WHITE>)*
(
(name = <NAME> |
first = <NUM>
(LOOKAHEAD(2) (<WHITE>)* "-" (<WHITE>)* last = <NUM>)?
)
((LOOKAHEAD(2) <EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+ | <EOF>)
{
if (name != null) {
named.add(name.image);
} else if (first != null && last == null) {
single.add(first.image);
} else if (first != null && last != null) {
String s = first.image + " - " + last.image;
range.add(s);
} else {
System.err.println("Parser error found");
}
name = null;
first = null;
last = null;
}
)+
{
return result;
}
}
Here is the parser's output:
Call: parse
Consumed token: <<WHITE>: " " at line 1 column 1>
Consumed token: <<WHITE>: " " at line 1 column 2>
Consumed token: <<NUM>: "1" at line 1 column 3>
Visited token: <<WHITE>: " " at line 1 column 4>; Expected token: <<WHITE>>
Visited token: <<NUM>: "2" at line 1 column 5>; Expected token: <<WHITE>>
Visited token: <<NUM>: "2" at line 1 column 5>; Expected token: <"-">
Visited token: <<WHITE>: " " at line 1 column 4>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 4>
Consumed token: <<NUM>: "2" at line 1 column 5>
Visited token: <<WHITE>: " " at line 1 column 6>; Expected token: <<WHITE>>
Visited token: <<NUM>: "3" at line 1 column 7>; Expected token: <<WHITE>>
Visited token: <<NUM>: "3" at line 1 column 7>; Expected token: <"-">
Visited token: <<WHITE>: " " at line 1 column 6>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 6>
Consumed token: <<NUM>: "3" at line 1 column 7>
Visited token: <<WHITE>: " " at line 1 column 8>; Expected token: <<WHITE>>
Visited token: <<NUM>: "4" at line 1 column 9>; Expected token: <<WHITE>>
Visited token: <<NUM>: "4" at line 1 column 9>; Expected token: <"-">
Visited token: <<WHITE>: " " at line 1 column 8>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 8>
Consumed token: <<NUM>: "4" at line 1 column 9>
Visited token: <<WHITE>: " " at line 1 column 10>; Expected token: <<WHITE>>
Visited token: <<NUM>: "5" at line 1 column 11>; Expected token: <<WHITE>>
Visited token: <<NUM>: "5" at line 1 column 11>; Expected token: <"-">
Visited token: <<WHITE>: " " at line 1 column 10>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 10>
Consumed token: <<NUM>: "5" at line 1 column 11>
Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <<WHITE>>
Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <"-">
Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 12>
Consumed token: <<NUM>: "6" at line 1 column 13>
Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <<WHITE>>
Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <"-">
Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 14>
Consumed token: <<NUM>: "7" at line 1 column 15>
Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <<WHITE>>
Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <"-">
Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 16>
Consumed token: <<NUM>: "8" at line 1 column 17>
Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <<WHITE>>
Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <"-">
Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 18>
Consumed token: <<NUM>: "9" at line 1 column 19>
Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <<WHITE>>
Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <"-">
Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 20>
Visited token: <<COMMA>: "," at line 1 column 21>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 21>
Visited token: <<WHITE>: " " at line 1 column 22>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 22>
Visited token: <<WHITE>: " " at line 1 column 23>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 23>
Consumed token: <<NUM>: "10" at line 1 column 24>
Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <<WHITE>>
Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <"-">
Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 26>
Consumed token: <<NUM>: "11" at line 1 column 27>
Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <<WHITE>>
Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <"-">
Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 29>
Visited token: <<SEMICOLON>: ";" at line 1 column 30>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 30>
Visited token: <<COMMA>: "," at line 1 column 31>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 31>
Visited token: <<WHITE>: " " at line 1 column 32>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 32>
Visited token: <<WHITE>: " " at line 1 column 33>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 1 column 33>
Visited token: <<SEMICOLON>: ";" at line 1 column 34>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 34>
Visited token: <<COMMA>: "," at line 1 column 35>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 35>
Visited token: <<SEMICOLON>: ";" at line 1 column 36>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 36>
Visited token: <<COMMA>: "," at line 1 column 37>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 37>
Visited token: <<COMMA>: "," at line 1 column 38>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 1 column 38>
Visited token: <<SEMICOLON>: ";" at line 1 column 39>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 1 column 39>
Visited token: <<EOL>: "\n" at line 1 column 40>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 1 column 40>
Visited token: <<EOL>: "\n" at line 2 column 1>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 2 column 1>
Visited token: <<WHITE>: " " at line 3 column 1>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 1>
Visited token: <<WHITE>: " " at line 3 column 2>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 2>
Visited token: <<WHITE>: " " at line 3 column 3>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 3>
Visited token: <<SEMICOLON>: ";" at line 3 column 4>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 3 column 4>
Visited token: <<COMMA>: "," at line 3 column 5>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 3 column 5>
Visited token: <<COMMA>: "," at line 3 column 6>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 3 column 6>
Visited token: <<SEMICOLON>: ";" at line 3 column 7>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 3 column 7>
Visited token: <<WHITE>: " " at line 3 column 8>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 8>
Visited token: <<WHITE>: " " at line 3 column 9>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 9>
Consumed token: <<NUM>: "12" at line 3 column 10>
Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <<WHITE>>
Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <"-">
Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 3 column 12>
Consumed token: <<NUM>: "13" at line 3 column 13>
Visited token: <"-" at line 3 column 15>; Expected token: <<WHITE>>
Visited token: <"-" at line 3 column 15>; Expected token: <"-">
Visited token: <<NUM>: "13" at line 3 column 16>; Expected token: <<WHITE>>
Visited token: <<NUM>: "13" at line 3 column 16>; Expected token: <<NUM>>
Consumed token: <"-" at line 3 column 15>
Consumed token: <<NUM>: "13" at line 3 column 16>
Visited token: <<COMMA>: "," at line 3 column 18>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 3 column 18>
Visited token: <<WHITE>: " " at line 3 column 19>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 19>
Visited token: <<WHITE>: " " at line 3 column 20>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 20>
Consumed token: <<NUM>: "14" at line 3 column 21>
Visited token: <<WHITE>: " " at line 3 column 23>; Expected token: <<WHITE>>
Visited token: <<WHITE>: " " at line 3 column 24>; Expected token: <<WHITE>>
Consumed token: <<WHITE>: " " at line 3 column 23>
Consumed token: <<WHITE>: " " at line 3 column 24>
Consumed token: <"-" at line 3 column 25>
Consumed token: <<WHITE>: " " at line 3 column 26>
Consumed token: <<WHITE>: " " at line 3 column 27>
Consumed token: <<WHITE>: " " at line 3 column 28>
Consumed token: <<WHITE>: " " at line 3 column 29>
Consumed token: <<NUM>: "14" at line 3 column 30>
Visited token: <<WHITE>: " " at line 3 column 32>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 32>
Consumed token: <<NUM>: "15" at line 3 column 33>
Visited token: <"-" at line 3 column 35>; Expected token: <<WHITE>>
Visited token: <"-" at line 3 column 35>; Expected token: <"-">
Visited token: <<WHITE>: " " at line 3 column 36>; Expected token: <<WHITE>>
Consumed token: <"-" at line 3 column 35>
Consumed token: <<WHITE>: " " at line 3 column 36>
Consumed token: <<NUM>: "15" at line 3 column 37>
Visited token: <<WHITE>: " " at line 3 column 39>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 39>
Consumed token: <<NUM>: "16" at line 3 column 40>
Visited token: <<WHITE>: " " at line 3 column 42>; Expected token: <<WHITE>>
Visited token: <"-" at line 3 column 43>; Expected token: <<WHITE>>
Visited token: <"-" at line 3 column 43>; Expected token: <"-">
Consumed token: <<WHITE>: " " at line 3 column 42>
Consumed token: <"-" at line 3 column 43>
Consumed token: <<NUM>: "16" at line 3 column 44>
Visited token: <<WHITE>: " " at line 3 column 46>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 3 column 46>
Visited token: <<EOL>: "\n" at line 3 column 47>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 3 column 47>
Consumed token: <<NUM>: "17" at line 4 column 1>
Visited token: <"-" at line 4 column 3>; Expected token: <<WHITE>>
Visited token: <"-" at line 4 column 3>; Expected token: <"-">
Visited token: <<NUM>: "17" at line 4 column 4>; Expected token: <<WHITE>>
Visited token: <<NUM>: "17" at line 4 column 4>; Expected token: <<NUM>>
Consumed token: <"-" at line 4 column 3>
Consumed token: <<NUM>: "17" at line 4 column 4>
Visited token: <<EOL>: "\n" at line 4 column 6>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 4 column 6>
Visited token: <<WHITE>: " " at line 5 column 1>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 5 column 1>
Consumed token: <<NUM>: "18" at line 5 column 2>
Visited token: <<WHITE>: " " at line 5 column 4>; Expected token: <<WHITE>>
Visited token: <"-" at line 5 column 5>; Expected token: <<WHITE>>
Visited token: <"-" at line 5 column 5>; Expected token: <"-">
Consumed token: <<WHITE>: " " at line 5 column 4>
Consumed token: <"-" at line 5 column 5>
Consumed token: <<WHITE>: " " at line 5 column 6>
Consumed token: <<NUM>: "18" at line 5 column 7>
Visited token: <<EOL>: "\n" at line 5 column 9>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 5 column 9>
Visited token: <<WHITE>: " " at line 6 column 1>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 6 column 1>
Consumed token: <<NUM>: "19" at line 6 column 2>
Visited token: <<WHITE>: " " at line 6 column 4>; Expected token: <<WHITE>>
Visited token: <"-" at line 6 column 5>; Expected token: <<WHITE>>
Visited token: <"-" at line 6 column 5>; Expected token: <"-">
Consumed token: <<WHITE>: " " at line 6 column 4>
Consumed token: <"-" at line 6 column 5>
Consumed token: <<WHITE>: " " at line 6 column 6>
Consumed token: <<NUM>: "19" at line 6 column 7>
Visited token: <<WHITE>: " " at line 6 column 9>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 6 column 9>
Visited token: <<EOL>: "\n" at line 6 column 10>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 6 column 10>
Visited token: <<WHITE>: " " at line 7 column 1>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 7 column 1>
Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 2>
Visited token: <<SEMICOLON>: ";" at line 7 column 20>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 7 column 20>
Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 21>
Visited token: <<WHITE>: " " at line 7 column 39>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 7 column 39>
Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 40>
Visited token: <<SEMICOLON>: ";" at line 7 column 58>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 7 column 58>
Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 59>
Visited token: <<COMMA>: "," at line 7 column 77>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 7 column 77>
Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 78>
Visited token: <<WHITE>: " " at line 7 column 96>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 7 column 96>
Visited token: <<WHITE>: " " at line 7 column 97>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 7 column 97>
Visited token: <<COMMA>: "," at line 7 column 98>; Expected token: <<EOL>>
Consumed token: <<COMMA>: "," at line 7 column 98>
Visited token: <<SEMICOLON>: ";" at line 7 column 99>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 7 column 99>
Visited token: <<SEMICOLON>: ";" at line 7 column 100>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 7 column 100>
Visited token: <<EOL>: "\n" at line 7 column 101>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 7 column 101>
Visited token: <<EOL>: "\n" at line 8 column 1>; Expected token: <<EOL>>
Consumed token: <<EOL>: "\n" at line 8 column 1>
Visited token: <<WHITE>: " " at line 9 column 1>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 1>
Visited token: <<WHITE>: " " at line 9 column 2>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 2>
Visited token: <<WHITE>: " " at line 9 column 3>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 3>
Consumed token: <<NAME>: "GROUP_1_A" at line 9 column 5>
Visited token: <<WHITE>: " " at line 9 column 24>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 24>
Visited token: <<WHITE>: " " at line 9 column 25>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 25>
Visited token: <<SEMICOLON>: ";" at line 9 column 26>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 9 column 26>
Visited token: <<SEMICOLON>: ";" at line 9 column 27>; Expected token: <<EOL>>
Consumed token: <<SEMICOLON>: ";" at line 9 column 27>
Visited token: <<WHITE>: " " at line 9 column 28>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 28>
Visited token: <<WHITE>: " " at line 9 column 29>; Expected token: <<EOL>>
Consumed token: <<WHITE>: " " at line 9 column 29>
Consumed token: <<NUM>: "20" at line 9 column 30>
Visited token: <<WHITE>: " " at line 9 column 32>; Expected token: <<WHITE>>
Visited token: <<WHITE>: " " at line 9 column 33>; Expected token: <<WHITE>>
Consumed token: <<WHITE>: " " at line 9 column 32>
Consumed token: <<WHITE>: " " at line 9 column 33>
Return: parse
parsers.excel.ParseException: Encountered " <NUM> "1 "" at line 9, column 34.
Was expecting one of:
<WHITE> ...
"-" ...
The parser should produce output like this:
single = [1,2,3,4,5,6,7,8,9,10,11,12,20]
range = [13 - 13,14 - 14,15 - 15,16 - 16,17 - 17,18 - 18,19 - 19]
named = [GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A]
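The problem occurs when the parser can't tell whether a whitespace token is the space before a dash or the space separating two integers.
If you know of any way to modify the JavaCC grammar so that it parses the provided string correctly, I'd really appreciate it.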
Answer (score: 2)
Let's take a step back from JavaCC and look at what your grammar actually is:
parse --> ows ( body )+
body --> part sep
part --> <NAME>
part --> <NUM>
part --> <NUM> ows "-" ows <NUM>
sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
sep --> EOF
ows --> (<WHITE>)*
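You should double-check this to make sure that (a) I haven't made any mistakes and (b) this really is the language you want.
I don't like the way you handle EOF: it isn't really a separator. I suggest the following grammar, which is essentially equivalent: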
parse --> ows body
body --> part ( sep body | <EOF> )
part --> <NAME>
part --> <NUM>
part --> <NUM> ows "-" ows <NUM>
sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
ows --> (<WHITE>)*
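First solution: syntactic lookahead
The OP said it would be easy if there were a way to check the next token without consuming it. There is: it's called syntactic lookahead.
The only place where we need lookahead is to distinguish between the second and third productions of part.
Let's combine them: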
part --> <NAME>
part --> <NUM> ( ows "-" ows <NUM> )?
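No fixed amount of lookahead can decide whether to take the optional path in the second production, because ows may consist of any number of WHITE tokens before the "-". So we use syntactic lookahead, like this: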
part --> <NAME>
part --> <NUM> ( LOOKAHEAD( ows "-" ) ows "-" ows <NUM> )?
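In a hand-written parser, syntactic lookahead amounts to a trial scan with a private cursor. A minimal Java sketch, assuming the input has already been split into token kinds; the `Kind` enum (with `DASH` standing for the anonymous "-" token) and `rangeFollows` are hypothetical names, not JavaCC API:

```java
import java.util.List;

// Hypothetical sketch: LOOKAHEAD( ows "-" ) as a trial scan that uses its own
// index and never advances the parser's position.
class SyntacticLookahead {
    enum Kind { NAME, NUM, WHITE, EOL, COMMA, SEMICOLON, DASH, EOF }

    // True if zero or more WHITE tokens followed by a DASH start at `pos`.
    static boolean rangeFollows(List<Kind> tokens, int pos) {
        int i = pos;                                   // private scan cursor
        while (i < tokens.size() && tokens.get(i) == Kind.WHITE) {
            i++;                                       // ows
        }
        return i < tokens.size() && tokens.get(i) == Kind.DASH;  // "-"
    }

    public static void main(String[] args) {
        // "14 - 14": NUM WHITE DASH WHITE NUM; check right after the first NUM
        List<Kind> range = List.of(Kind.NUM, Kind.WHITE, Kind.DASH, Kind.WHITE, Kind.NUM, Kind.EOF);
        // "1 2": NUM WHITE NUM
        List<Kind> singles = List.of(Kind.NUM, Kind.WHITE, Kind.NUM, Kind.EOF);
        System.out.println(rangeFollows(range, 1));    // true
        System.out.println(rangeFollows(singles, 1));  // false
    }
}
```

Because the scan only advances its local index, a failed check leaves the parser's position untouched, which is the property the question asked for.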
Now we're done. Let's translate the productions back into JavaCC:
void parse() : { }
{
ows() body()
}
void body() : { }
{
part() ( sep() body() | <EOF> )
}
void part() : { }
{
<NAME>
|
<NUM>
( LOOKAHEAD( ows() "-")
ows() "-" ows() <NUM>
)?
}
void sep() : {}
{
(<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
}
void ows() : {}
{
(<WHITE>)*
}
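To try the first solution outside JavaCC, here is a minimal hand-written recursive-descent sketch of the same grammar in Java. Assumptions: a regex lexer stands in for the JavaCC TOKEN/SKIP sections (quotes are skipped, and the 1-5 digit limit on NUM is not enforced), the tail recursion in body is written as a loop, and the map keys "single", "range", "named" stand in for the question's SINGLE/RANGE/NAMED constants, whose values are not shown. `LookaheadRangeParser` and its methods are hypothetical names.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical hand-written equivalent of the first solution (syntactic lookahead).
class LookaheadRangeParser {
    enum Kind { NAME, NUM, WHITE, EOL, COMMA, SEMICOLON, DASH, EOF }

    static final class Tok { final Kind kind; final String image; Tok(Kind k, String i) { kind = k; image = i; } }

    // Stand-in for the JavaCC TOKEN/SKIP sections: group 8 (quotes) is skipped,
    // and NUM is not limited to 5 digits here.
    private static final Pattern TOKEN = Pattern.compile(
        "([ \\t])|(\\r\\n|\\r|\\n)|(,)|(;)|(-)|([A-Za-z][A-Za-z_0-9]*)|([0-9]+)|([\"'])");
    private static final Kind[] GROUP_KIND =
        { null, Kind.WHITE, Kind.EOL, Kind.COMMA, Kind.SEMICOLON, Kind.DASH, Kind.NAME, Kind.NUM, null };

    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        for (int at = 0; at < s.length(); at = m.end()) {
            m.region(at, s.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Unexpected character at " + at);
            int g = 1;
            while (m.group(g) == null) g++;              // find the alternative that matched
            if (GROUP_KIND[g] != null) out.add(new Tok(GROUP_KIND[g], m.group(g)));
        }
        out.add(new Tok(Kind.EOF, ""));
        return out;
    }

    private final List<Tok> toks;
    private int pos = 0;
    private final List<String> single = new ArrayList<>(), range = new ArrayList<>(), named = new ArrayList<>();
    private LookaheadRangeParser(List<Tok> toks) { this.toks = toks; }

    static Map<String, List<String>> parse(String input) {
        LookaheadRangeParser p = new LookaheadRangeParser(lex(input));
        p.parseInput();
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("single", p.single); result.put("range", p.range); result.put("named", p.named);
        return result;
    }

    private Kind peek() { return toks.get(pos).kind; }
    private void ows() { while (peek() == Kind.WHITE) pos++; }
    private static boolean isSep(Kind k) {
        return k == Kind.WHITE || k == Kind.EOL || k == Kind.COMMA || k == Kind.SEMICOLON;
    }
    private Tok expect(Kind k) {
        if (peek() != k) throw new IllegalStateException("Expected " + k + " but found " + peek() + " at token " + pos);
        return toks.get(pos++);
    }

    // parse --> ows body;  body --> part ( sep body | <EOF> ), with the tail recursion as a loop
    private void parseInput() {
        ows();
        while (true) {
            part();
            if (peek() == Kind.EOF) return;
            if (!isSep(peek())) throw new IllegalStateException("Expected a separator at token " + pos);
            while (isSep(peek())) pos++;                 // sep --> ( <EOL> | <COMMA> | <SEMICOLON> | <WHITE> )+
        }
    }

    // part --> <NAME> | <NUM> ( LOOKAHEAD( ows "-" ) ows "-" ows <NUM> )?
    private void part() {
        if (peek() == Kind.NAME) { named.add(expect(Kind.NAME).image); return; }
        String first = expect(Kind.NUM).image;
        if (rangeFollows()) {
            ows(); expect(Kind.DASH); ows();
            range.add(first + " - " + expect(Kind.NUM).image);
        } else {
            single.add(first);
        }
    }

    // Syntactic lookahead: scan ows "-" with a local index; pos is never moved.
    private boolean rangeFollows() {
        int i = pos;
        while (toks.get(i).kind == Kind.WHITE) i++;      // the final EOF token stops the scan
        return toks.get(i).kind == Kind.DASH;
    }
}
```

On the test string from the question this should yield the single, range, and named lists the question expects. As in the grammar, a separator immediately before the end of input is rejected, because every sep must be followed by another part.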
Second solution: LL(1)
Can we solve it with an LL(1) grammar? Yes. Let's go back to the original grammar, or rather to the version that takes EOF out of the loop:
parse --> ows body
body --> part (sep body | <EOF>)
part --> <NAME>
part --> <NUM> ( ows "-" ows <NUM> )?
sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
ows --> (<WHITE>)*
Inline part and introduce a nonterminal afternum:
parse --> ows body
body --> <NAME> (sep body | <EOF>)
body --> <NUM> afternum
afternum --> ( ows "-" ows <NUM> )? (sep body | <EOF>)
sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
ows --> (<WHITE>)*
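Now the problem is in afternum.
When we start parsing afternum, there are five possibilities to consider: (i) the next token is "-"; (ii) the next token is EOL, COMMA, or SEMICOLON; (iii) the next token is WHITE; (iv) the next token is EOF; (v) in any other case, we have an error.
In case (ii), this cannot be the last part. In case (iii), the WHITE we just saw could be the first token of a sep, or it could lead up to a hyphen. We make a new nonterminal to handle both possibilities: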
afternum --> "-" ows <NUM> (sep body | <EOF>)
afternum --> nonwssep (sep)? body
afternum --> <WHITE> moreafternum
afternum --> EOF
moreafternum --> ows "-" ows <NUM> (sep body | EOF)
| sep? body
nonwssep --> <EOL> | <COMMA> | <SEMICOLON>
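The five-way case split at the start of afternum can be written as one switch on the next token, which is why a single token of lookahead is enough there. A hypothetical sketch; `AfternumCases`, `Kind`, and `Case` are illustrative names only:

```java
// Hypothetical illustration: the next token alone selects the afternum alternative.
class AfternumCases {
    enum Kind { NAME, NUM, WHITE, EOL, COMMA, SEMICOLON, DASH, EOF }

    enum Case { RANGE, SINGLE_THEN_MORE, WHITE_UNDECIDED, SINGLE_LAST, ERROR }

    static Case classify(Kind next) {
        switch (next) {
            case DASH:                              // (i)   "-" ows <NUM> ...
                return Case.RANGE;
            case EOL: case COMMA: case SEMICOLON:   // (ii)  nonwssep (sep)? body
                return Case.SINGLE_THEN_MORE;
            case WHITE:                             // (iii) start of a sep or of " -": defer to moreafternum
                return Case.WHITE_UNDECIDED;
            case EOF:                               // (iv)  the number was the last part
                return Case.SINGLE_LAST;
            default:                                // (v)   e.g. a NAME right after the number
                return Case.ERROR;
        }
    }

    public static void main(String[] args) {
        for (Kind k : Kind.values()) {
            System.out.println(k + " -> " + classify(k));
        }
    }
}
```

Only case (iii) leaves the decision open, which is exactly the job handed to moreafternum.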
Now the problem is in moreafternum: if the next token is WHITE, either choice is viable.
Let's manipulate moreafternum a bit. The goal is to expose the WHITE token so that we can factor it out.
moreafternum
= By definition
ows "-" ows <NUM> (sep body | EOF) | sep? body
= Expand the ?
ows "-" ows <NUM> (sep body | EOF)
| body
| sep body
= Expand first `ows` and split white from other cases
"-" ows <NUM> (sep body | EOF)
| WHITE ows "-" ows <NUM> (sep body | EOF)
| body
| sep body
= Expand the `sep` in the fourth case
"-" ows <NUM> (sep body | EOF)
| WHITE ows "-" ows <NUM> (sep body | EOF)
| body
| (WHITE | nonwssep) sep? body
= Split the fourth case
"-" ows <NUM> (sep body | EOF)
| WHITE ows "-" ows <NUM> (sep body | EOF)
| body
| WHITE sep? body
| nonwssep sep? body
= Duplicate the fourth choice
"-" ows <NUM> (sep body | EOF)
| WHITE ows "-" ows <NUM> (sep body | EOF)
| WHITE sep? body
| body
| WHITE sep? body
| nonwssep sep? body
= Combine the second and third choices.
"-" ows <NUM> (sep body | EOF)
| WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body )
| body
| WHITE sep? body
| nonwssep sep? body
= combine the third, fourth, and fifth choices
"-" ows <NUM> (sep body | EOF)
| WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
| sep? body
= Definition of moreafternum
"-" ows <NUM> (sep body | EOF)
| WHITE moreafternum
| sep? body
Now we can redefine moreafternum with this recursive version:
moreafternum --> "-" ows <NUM> (sep body | EOF)
| <WHITE> moreafternum
| sep? body
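If we write this production in JavaCC, there is still a choice conflict between the second and third choices when the next token is WHITE. JavaCC will prefer the second choice over the third. If you don't like the warning, you can use LOOKAHEAD to suppress it. Note that this LOOKAHEAD doesn't change the generated Java code; it only suppresses the warning.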
void moreafternum() : {} {
"-" ows() <NUM> (sep() body() | <EOF>)
|
// LOOKAHEAD( <WHITE> ) // Optional lookahead to suppress the warning
<WHITE> moreafternum()
|
( sep() )? body() }
By looking at moreafternum once more, we can go all the way to LL(1).
moreafternum
= From above
"-" ows <NUM> (sep body | EOF)
| WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
| body
| WHITE sep? body
| nonwssep sep? body
= Fourth choice is subsumed by the second.
"-" ows <NUM> (sep body | EOF)
| WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
| body
| nonwssep sep? body
= Combine last two choices
"-" ows <NUM> (sep body | EOF)
| WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
| (nonwssep sep?)? body
= Original definition of moreafternum
"-" ows <NUM> (sep body | EOF)
| WHITE moreafternum
| (nonwssep sep?)? body
Putting it all together:
parse --> ows body
body --> <NAME> (sep body | <EOF>)
body --> <NUM> afternum
afternum --> "-" ows <NUM> (sep body | <EOF>)
afternum --> <WHITE> moreafternum
afternum --> nonwssep (sep)? body
afternum --> EOF
moreafternum --> "-" ows <NUM> (sep body | EOF)
moreafternum --> <WHITE> moreafternum
moreafternum --> ( nonwssep (sep)? )? body
nonwssep --> <EOL> | <COMMA> | <SEMICOLON>
sep --> (nonwssep | <WHITE>)+
ows --> (<WHITE>)*
This is LL(1), so you can translate it into JavaCC without any lookahead.
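For comparison, a minimal sketch of the final LL(1) grammar as a hand-written recursive-descent parser in Java: one method per nonterminal, and every choice inspects only the current token through `peek()`, so nothing is scanned ahead or rewound. The regex lexer (standing in for the JavaCC token section, with quotes skipped and no 5-digit limit on NUM), the `DASH` kind, the map keys "single"/"range"/"named", and all class and method names are assumptions.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical LL(1) recursive-descent parser for the final grammar (no backtracking).
class LL1RangeParser {
    enum Kind { NAME, NUM, WHITE, EOL, COMMA, SEMICOLON, DASH, EOF }

    static final class Tok { final Kind kind; final String image; Tok(Kind k, String i) { kind = k; image = i; } }

    // Stand-in for the JavaCC TOKEN/SKIP sections: group 8 (quotes) is skipped,
    // and NUM is not limited to 5 digits here.
    private static final Pattern TOKEN = Pattern.compile(
        "([ \\t])|(\\r\\n|\\r|\\n)|(,)|(;)|(-)|([A-Za-z][A-Za-z_0-9]*)|([0-9]+)|([\"'])");
    private static final Kind[] GROUP_KIND =
        { null, Kind.WHITE, Kind.EOL, Kind.COMMA, Kind.SEMICOLON, Kind.DASH, Kind.NAME, Kind.NUM, null };

    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        for (int at = 0; at < s.length(); at = m.end()) {
            m.region(at, s.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Unexpected character at " + at);
            int g = 1;
            while (m.group(g) == null) g++;              // find the alternative that matched
            if (GROUP_KIND[g] != null) out.add(new Tok(GROUP_KIND[g], m.group(g)));
        }
        out.add(new Tok(Kind.EOF, ""));
        return out;
    }

    private final List<Tok> toks;
    private int pos = 0;
    private final List<String> single = new ArrayList<>(), range = new ArrayList<>(), named = new ArrayList<>();
    private LL1RangeParser(List<Tok> toks) { this.toks = toks; }

    static Map<String, List<String>> parse(String input) {
        LL1RangeParser p = new LL1RangeParser(lex(input));
        p.ows(); p.body();                               // parse --> ows body
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("single", p.single); result.put("range", p.range); result.put("named", p.named);
        return result;
    }

    private Kind peek() { return toks.get(pos).kind; }
    private boolean atNonWsSep() { return peek() == Kind.EOL || peek() == Kind.COMMA || peek() == Kind.SEMICOLON; }
    private void ows() { while (peek() == Kind.WHITE) pos++; }
    private void optSep() { while (atNonWsSep() || peek() == Kind.WHITE) pos++; }   // (sep)?
    private IllegalStateException error(String what) {
        return new IllegalStateException("Expected " + what + " but found " + peek() + " at token " + pos);
    }
    private String expect(Kind k) {
        if (peek() != k) throw error(k.name());
        return toks.get(pos++).image;
    }

    // body --> <NAME> (sep body | <EOF>)  |  <NUM> afternum
    private void body() {
        if (peek() == Kind.NAME) { named.add(expect(Kind.NAME)); sepBodyOrEof(); }
        else afternum(expect(Kind.NUM));
    }

    // (sep body | <EOF>), where sep --> (nonwssep | <WHITE>)+
    private void sepBodyOrEof() {
        if (peek() == Kind.EOF) return;
        if (!atNonWsSep() && peek() != Kind.WHITE) throw error("a separator or EOF");
        optSep();
        body();
    }

    // "-" ows <NUM> (sep body | <EOF>)
    private void rangeTail(String first) {
        expect(Kind.DASH); ows();
        range.add(first + " - " + expect(Kind.NUM));
        sepBodyOrEof();
    }

    // afternum --> "-" ... | <WHITE> moreafternum | nonwssep (sep)? body | <EOF>
    private void afternum(String first) {
        if (peek() == Kind.DASH) rangeTail(first);
        else if (peek() == Kind.WHITE) { pos++; moreafternum(first); }
        else if (atNonWsSep()) { single.add(first); pos++; optSep(); body(); }
        else if (peek() == Kind.EOF) single.add(first);
        else throw error("'-', a separator or EOF");
    }

    // moreafternum --> "-" ... | <WHITE> moreafternum | ( nonwssep (sep)? )? body
    private void moreafternum(String first) {
        if (peek() == Kind.DASH) rangeTail(first);
        else if (peek() == Kind.WHITE) { pos++; moreafternum(first); }
        else {
            single.add(first);                           // no dash followed: the number was a single
            if (atNonWsSep()) { pos++; optSep(); }
            body();
        }
    }
}
```

Recursion depth grows with the number of parts, mirroring the grammar; for long inputs the tail calls in body and moreafternum could be rewritten as loops without changing which inputs are accepted.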