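I am creating a lexer/parser which should accept strings belonging to an infinite set of languages.
One such string is "a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>".
The set of languages is defined as follows: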
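Base language, L0
Example of a string belonging to L0:
zyx abcba   m xyzvv
There is one space character between zyx and abcba, there are three spaces between abcba and m, and only one between m and xyzvv. No other space characters are present in the string.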
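As a rough illustration of the L0 shape, here is a minimal Java sketch using a regular expression. It assumes the block and separator rules implied by the token definitions in the grammar further down (each block is an odd number of lowercase letters, each separator an odd number of spaces); the class name `L0Check` and the method `isL0` are made up for this example.

```java
import java.util.regex.Pattern;

/** Sketch of an L0 membership test; the block and gap rules are assumptions taken from the grammar's tokens. */
class L0Check {
    // A block: an odd number of lowercase letters (one letter, then zero or more pairs).
    private static final String BLOCK = "[a-z](?:[a-z]{2})*";
    // A separator: an odd number of spaces (one space, then zero or more pairs).
    private static final String GAP = " (?: {2})*";
    // A string: one block, then zero or more (separator, block) repetitions.
    private static final Pattern L0 = Pattern.compile(BLOCK + "(?:" + GAP + BLOCK + ")*");

    static boolean isL0(String s) {
        // matches() requires the whole string to fit, so leading or trailing spaces are rejected.
        return L0.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isL0("zyx abcba   m xyzvv")); // true
        System.out.println(isL0("ab"));                  // false: even-length block
        System.out.println(isL0("a  b"));                // false: two spaces
    }
}
```

The pattern mirrors the `PseudoStart()` production of the grammar, without the trailing end-of-line handling.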
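Language L1
Strings may contain blocks shaped like <2U>. . .</2U>, where . . . stands for any string from L0. Example of a string belonging to L1:
YZ     <2U>abc   zzz</2U> ABBA <2U>kkkkk</2U> KM
Note that five spaces separate YZ from <2U>abc   zzz</2U>, and three spaces separate abc from zzz. Otherwise, single spaces are used as separators. There is no space before YZ and no space after KM.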
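Language L2
Strings may contain blocks shaped like <2L>. . .</2L>, where . . . stands for any string from L1. Example of a string belonging to L2:
abc <2L>AA ZZ <2U>a bcd</2U></2L> z <2L><2U>abcde</2U></2L>
Single spaces are used as separators in the sentence given above, but any other odd number of spaces would also lead to a valid L2 sentence.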
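Language L{2k + 1}, k > 0
Strings may contain blocks shaped like <2U>. . .</2U>, where . . . stands for any string from L{2k}.
Language L{2k + 2}, k > 0
Strings may contain blocks shaped like <2L>. . .</2L>, where . . . stands for any string from L{2k + 1}.
The code for my lexer/parser is as follows: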
PARSER_BEGIN(Assignment)

/** A parser which determines if user's input belongs to any one of the set of acceptable languages. */
public class Assignment {
    public static void main(String[] args) {
        try {
            Assignment parser = new Assignment(System.in);
            parser.Start();
            System.out.println("YES"); // If the user's input belongs to any of the set of acceptable languages, then print YES.
        } catch (ParseException e) {
            System.out.println("NO"); // If the user's input does not belong to any of the set of acceptable languages, then print NO.
        }
    }
}

PARSER_END(Assignment)

/** A token which matches any lowercase letter from the English alphabet. */
TOKEN :
{
    < #L_CASE_LETTER: ["a"-"z"] >
}

/** A token which matches any uppercase letter from the English alphabet. */
TOKEN :
{
    < #U_CASE_LETTER: ["A"-"Z"] >
}

/** A token which matches an odd number of lowercase letters from the English alphabet. */
TOKEN :
{
    < ODD_L_CASE_LETTER: <L_CASE_LETTER>(<L_CASE_LETTER><L_CASE_LETTER>)* >
}

/** A token which matches an even number of uppercase letters from the English alphabet. */
TOKEN :
{
    < EVEN_U_CASE_LETTERS: (<U_CASE_LETTER><U_CASE_LETTER>)+ >
}

/** A token which matches the string "<2U>". */
TOKEN :
{
    < OPEN_UPPER: "<2U>" >
}

/** A token which matches the string "</2U>". */
TOKEN :
{
    < CLOSE_UPPER: "</2U>" >
}

/** A token which matches the string "<2L>". */
TOKEN :
{
    < OPEN_LOWER: "<2L>" >
}

/** A token which matches the string "</2L>". */
TOKEN :
{
    < CLOSE_LOWER: "</2L>" >
}

/** A token which matches an odd number of white spaces. */
TOKEN :
{
    < ODD_WHITE_SPACE: " "(" "" ")* >
}

/** A token which matches an EOL character. */
TOKEN :
{
    < EOL: "\n" | "\r" | "\r\n" >
}

/** This production matches strings which belong to the base language L^0. */
void Start() :
{}
{
    LOOKAHEAD(3)
    <ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> <ODD_L_CASE_LETTER>)* <EOL> <EOF>
|
    NextLanguage()
|
    LOOKAHEAD(3)
    NextLanguageTwo()
|
    EvenLanguage()
}

/** This production matches strings which belong to language L^1. */
void NextLanguage() :
{}
{
    (<OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())* <EOL> <EOF>
|
    (<EVEN_U_CASE_LETTERS>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())* <EOL> <EOF>
}

/** This production matches either an even number of uppercase letters, or a string from L^0, encased within the tags <2U> and </2U>. */
void UpperOrPseudoStart() :
{}
{
    <EVEN_U_CASE_LETTERS>
|
    <OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>
}

/** This production matches strings from L^0, in a similar way to Start(); however, the strings that it matches do not have EOL or EOF characters after them. */
void PseudoStart() :
{}
{
    <ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> <ODD_L_CASE_LETTER>)*
}

/** This production matches strings which belong to language L^2. */
void NextLanguageTwo() :
{}
{
    (<ODD_L_CASE_LETTER>)+ (<ODD_WHITE_SPACE> LowerOrPseudoNextLanguage())* <EOL> <EOF>
|
    (<OPEN_LOWER> PseudoNextLanguage() <CLOSE_LOWER>)+ (<ODD_WHITE_SPACE> LowerOrPseudoNextLanguage())* <EOL> <EOF>
}

/** This production matches either an odd number of lowercase letters, or a string from L^1, encased within the tags <2L> and </2L>. */
void LowerOrPseudoNextLanguage() :
{}
{
    <ODD_L_CASE_LETTER>
|
    <OPEN_LOWER> PseudoNextLanguage() <CLOSE_LOWER>
}

/** This production matches strings from L^1, in a similar way to NextLanguage(); however, the strings that it matches do not have EOL or EOF characters after them. */
void PseudoNextLanguage() :
{}
{
    (<OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())*
|
    (<EVEN_U_CASE_LETTERS>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())*
}

/** This production matches strings which belong to any of the languages L^{2k + 2}, where k > 0 (the infinite set of even languages). */
void EvenLanguage() :
{}
{
    (<ODD_L_CASE_LETTER>)+ (<ODD_WHITE_SPACE> EvenLanguageAuxiliary())* <EOL> <EOF>
|
    (CommonPattern())+ (<ODD_WHITE_SPACE> EvenLanguageAuxiliary())* <EOL> <EOF>
}

/** This production is an auxiliary production that helps when parsing strings from any of the even set of languages. */
void EvenLanguageAuxiliary() :
{}
{
    CommonPattern()
|
    <ODD_L_CASE_LETTER>
}

/** This production matches a <2L>...</2L> block holding an even number of uppercase letters and a <2U>...</2U> block, which in turn holds an odd number of lowercase letters followed by one or more nested CommonPattern() blocks. */
void CommonPattern() :
{}
{
    <OPEN_LOWER> <EVEN_U_CASE_LETTERS> <ODD_WHITE_SPACE> <OPEN_UPPER> <ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> CommonPattern())+ <CLOSE_UPPER> <CLOSE_LOWER>
}

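Several times now, I have entered the string "a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"; however, each time NO is printed on the terminal.
I have looked through my code carefully several times, checking the order in which I think the input string should be parsed; however, I cannot find any error in my logic or any reason why the string is not being accepted.
Could I have some suggestions as to why it is not being accepted?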
Answer 0 (score: 1)
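The following steps helped to solve the problem.
Build and run the parser with tracing enabled:
javacc -debug_parser Assignment.jj
javac Assignment*.java
java Assignment
Then enter the string: "a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"
The resulting trace shows the parser entering the NextLanguageTwo() production rather than the desired EvenLanguage() production. The trace for NextLanguageTwo() shows that it matches the first eight tokens of the input string, so a three-token lookahead commits to NextLanguageTwo() before the mismatch at the ninth token becomes visible.
Modify the Start() production by changing the lookahead value (above the call to NextLanguageTwo()) from 3 to 9.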
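For reference, the change described above could look like the following in the grammar file. This is only a sketch of that one edit to the question's Start() production; nothing else in the grammar is altered, and it does not establish that every valid input is then accepted.

```javacc
void Start() :
{}
{
    LOOKAHEAD(3)
    <ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> <ODD_L_CASE_LETTER>)* <EOL> <EOF>
|
    NextLanguage()
|
    // Was LOOKAHEAD(3): nine tokens are needed before NextLanguageTwo() is ruled out for the test string.
    LOOKAHEAD(9)
    NextLanguageTwo()
|
    EvenLanguage()
}
```

A fixed depth is tied to how far this particular input matches NextLanguageTwo() before diverging, so longer inputs may still be routed to the wrong alternative.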
Answer 1 (score: 0)
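Is any of your input accepted? I copied your code onto my machine and found that for any correct input (as far as I can tell from your language definition), it always outputs 'NO'.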
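One way to check which inputs ought to be accepted, independently of the JavaCC grammar, is a small hand-written recognizer used as a reference. The sketch below is not the asker's method. It assumes the rules suggested by the grammar's own comments: even-numbered languages use blocks of an odd number of lowercase letters or `<2L>...</2L>` around a string from the next lower language, odd-numbered languages use blocks of an even number of uppercase letters or `<2U>...</2U>`, and blocks are separated by an odd number of spaces. The class name `FamilyRecognizer` and its methods are hypothetical.

```java
/** Reference recognizer for the union of all languages; the block rules are assumptions (see above). */
class FamilyRecognizer {
    private final String s;
    private int pos = 0;

    private FamilyRecognizer(String s) { this.s = s; }

    /** True if the input parses as an even-level string (L0, L2, ...) or an odd-level one (L1, L3, ...). */
    static boolean accepts(String input) {
        for (boolean even : new boolean[] {true, false}) {
            FamilyRecognizer r = new FamilyRecognizer(input);
            if (r.sequence(even) && r.pos == input.length()) return true;
        }
        return false;
    }

    /** sequence := block (odd-number-of-spaces block)* */
    private boolean sequence(boolean even) {
        if (!block(even)) return false;
        while (true) {
            int spaces = 0;
            while (pos < s.length() && s.charAt(pos) == ' ') { pos++; spaces++; }
            if (spaces == 0) return true;                         // no separator: the sequence ends here
            if (spaces % 2 == 0 || !block(even)) return false;    // even gap or dangling spaces
        }
    }

    /** block := tagged string of the other parity, or a letter run of the right case and length */
    private boolean block(boolean even) {
        String open = even ? "<2L>" : "<2U>";
        String close = even ? "</2L>" : "</2U>";
        if (s.startsWith(open, pos)) {
            pos += open.length();
            if (!sequence(!even) || !s.startsWith(close, pos)) return false;
            pos += close.length();
            return true;
        }
        char lo = even ? 'a' : 'A';
        char hi = even ? 'z' : 'Z';
        int start = pos;
        while (pos < s.length() && s.charAt(pos) >= lo && s.charAt(pos) <= hi) pos++;
        int n = pos - start;
        return even ? n % 2 == 1 : (n > 0 && n % 2 == 0);
    }

    public static void main(String[] args) {
        System.out.println(accepts("a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"));
        System.out.println(accepts("a  b"));
    }
}
```

Unlike the grammar, this sketch expects the input without a trailing end-of-line character, so a line read from the terminal would need its line terminator removed first.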