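Grammar parser error at end of string

Asked: 2018-05-09 20:44:05

Tags: regex parsing grammar

I am writing a grammar to compile .abc files. These are text files in which each line of text is a musical voice (an instrument playing some notes). In my grammar, I take advantage of the line-by-line structure of the text and parse one line at a time. A simplified version of the grammar looks like this.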

// Body

// spaces and tabs have explicit meaning in the body, don't automatically ignore them

abc_body ::= abc_line+;
abc_line ::= element+ end_of_line (lyric end_of_line)?  | middle_of_body_field | comment;
element ::= note_element | rest_element | tuplet_element | barline | nth_repeat | space_or_tab; 

// notes
note_element ::= note | chord;

note ::= pitch note_length?;
pitch ::= accidental? basenote octave?;
octave ::= "'"+ | ","+;
note_length ::= (digit+)? ("/" (digit+)?)?;
note_length_strict ::= digit+ "/" digit+;

// "^" is sharp, "_" is flat, and "=" is natural
accidental ::= "^" | "^^" | "_" | "__" | "=";

basenote ::= "C" | "D" | "E" | "F" | "G" | "A" | "B" | "c" | "d" | "e" | "f" | "g" | "a" | "b";

// rests
rest_element ::= "z" note_length?;

// tuplets
tuplet_element ::= tuplet_spec note_element+;
tuplet_spec ::= "(" digit ;

// chords
chord ::= "[" note+ "]";

barline ::= "|" | "||" | "[|" | "|]" | ":|" | "|:";
nth_repeat ::= "[1" | "[2";

// A voice field might reappear in the middle of a piece
// to indicate the change of a voice
middle_of_body_field ::= field_voice;

lyric ::= "w:" lyrical_element*;
lyrical_element ::= " "+ | "-" | "_" | "*" | "~" | backslash_hyphen | "|" | lyric_text;
// lyric_text should be defined appropriately
lyric_text ::= [.]*;

backslash_hyphen ::= "\\" "-";
//backslash immediately followed by hyphen

// General

comment ::= space_or_tab* "%" comment_text newline;
//comment_text should be defined appropriately
comment_text ::= [.]*;

end_of_line ::= newline | comment;

digit ::= [0-9];
newline ::= "\n" | "\r" "\n"?;
space_or_tab ::= " " | "\t";

text ::= .*;

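But I have a problem with this approach. For any valid .abc file, I get an error on the last line of the file: the parser tries to match end_of_line but hits the end of the string instead. In other words, the grammar requires a newline after the last line. Any suggestions on how to solve this?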
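
The failure can be reproduced in miniature with a regular expression standing in for the grammar. This is only a sketch: `LINE_CONTENT`, `NEWLINE_RE`, and `parses_terminated` are hypothetical names, and the line content is reduced to "anything except a line break" instead of the real `element+` rule.

```python
import re

# Hypothetical stand-ins, not part of the original grammar:
# LINE_CONTENT approximates element+, NEWLINE_RE mirrors the
# newline rule ("\n" | "\r" "\n"?).
LINE_CONTENT = r"[^\r\n]+"
NEWLINE_RE = r"(?:\r\n?|\n)"

# abc_body ::= abc_line+, where every abc_line must end in a newline.
TERMINATED_BODY = re.compile(rf"(?:{LINE_CONTENT}{NEWLINE_RE})+")


def parses_terminated(text: str) -> bool:
    """Return True if the whole text matches the newline-terminated body."""
    return TERMINATED_BODY.fullmatch(text) is not None


if __name__ == "__main__":
    print(parses_terminated("C D E|\nF G A|\n"))  # trailing newline present
    print(parses_terminated("C D E|\nF G A|"))    # last line unterminated
```

The second input is rejected for the same reason the real parser fails: the last line has content but nothing that can satisfy the mandatory line terminator.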

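1 Answer:

Answer 0 (score: 0)

One approach is to refactor the grammar (that is, without changing the language it accepts) so that the final newline is factored out of abc_line. In other words, abc_line can be written as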

abc_line ::= abc_line_content newline

Then change:

abc_body ::= abc_line+

to:

abc_body ::= abc_line_content (newline abc_line_content)*

(Append newline? at the end if necessary, i.e. if some files do have a newline at the end of the last line.)
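
As a rough illustration of the separator-style rule, the sketch below expresses it as a regular expression. The names `CONTENT`, `NEWLINE_SEP`, and `parses_separated` are hypothetical, and the line content is simplified to "anything except a line break", so lyric lines and comments (which carry their own newline in the original grammar) are not modelled here.

```python
import re

# Hypothetical stand-ins, not part of the original grammar:
# CONTENT approximates abc_line_content, NEWLINE_SEP mirrors the
# newline rule ("\n" | "\r" "\n"?).
CONTENT = r"[^\r\n]+"
NEWLINE_SEP = r"(?:\r\n?|\n)"

# abc_body ::= abc_line_content (newline abc_line_content)* newline?
SEPARATED_BODY = re.compile(
    rf"{CONTENT}(?:{NEWLINE_SEP}{CONTENT})*{NEWLINE_SEP}?"
)


def parses_separated(text: str) -> bool:
    """Return True if the whole text matches the newline-separated body."""
    return SEPARATED_BODY.fullmatch(text) is not None


if __name__ == "__main__":
    print(parses_separated("C D E|\nF G A|"))    # no trailing newline
    print(parses_separated("C D E|\nF G A|\n"))  # optional trailing newline
```

Treating the newline as a separator between lines, with one optional newline at the very end, lets both inputs match, while the language of well-formed lines stays the same.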