I have the following input:
@Book{press,
author = "Press, W. and Teutolsky, S. and Vetterling, W. and Flannery B.",
title = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
year = 2007,
publisher = "Cambridge University Press"
}
I have to write a grammar for the Parse::RecDescent parser generator. The output should be converted to an XML structure like this:
<book>
<keyword>press</keyword>
<author>Press, W.+Teutolsky, S.+Vetterling, W.+Flannery B.</author>
<title>Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing</title>
<year>2007</year>
<publisher>Cambridge University Press</publisher>
</book>
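Extra and duplicate fields should be reported as errors (with a proper message that includes the line number, and with no further parsing). I tried starting with something like this: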
use Parse::RecDescent;
open(my $in, "<", "parsing.txt") or die "Can't open parsing.txt: $!";
my $text;
while (<$in>) {
$text .= $_;
}
print $text;
my $grammar = q {
beginning: "\@Book\{" keyword fields "\}" { print "<book>\n",$item[2],$item[3],"</book>"; }
keyword: /[a-zA-Z]+/ "," { return " <keyword>".$item[1]."</keyword>\n"; }
fields: one "," two "," tree "," four { return $item[1].$item[3].$item[5].$item[7]; }
one: "author" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
return " <author>",$item[4],"</author>\n"; }
two: "title" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
return " <title>",$item[4],"</title>\n"; }
three: "year" "=" /[0-2][0-9][0-9][0-9]/ { return " <year>",$item[3],"</year>\n"; }
four: "publisher" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
{ $item[4] =~ s/\sand\s/\+/g;
return " <publisher>",$item[4],"</publisher>\n"; }
};
my $parser = new Parse::RecDescent($grammar) or die ("Bad grammar!");
defined $parser->beginning($text) or die ("Bad text!");
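But I don't even know whether this is the right approach. Please help.
One more small question. The tags in the input may appear in any order, but each tag may appear only once. Do I have to write subrules for every permutation of (author, title, year, publisher)? I came up with this: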
fields: field "," field "," field "," field
field: one | two | three | four
But that obviously does not prevent duplicate tags.
Answer 0 (score: 9)
First, you have a typo: tree instead of three.
I ran your program with a few extra lines added:
use strict;
use warnings; # you should always have strict and warnings on
$::RD_HINT = 1; # Parse::RecDescent hints
$::RD_TRACE = 1; # Parse::RecDescent trace
and got this debug output:
1|beginning |>>Matched terminal<< (return value: |
| |[@Book{]) |
1|beginning | |"press,\n author = "Press,
| | |W. and Teutolsky, S. and
| | |Vetterling, W. and Flannery
| | |B.",\n title = "Numerical
| | |{R}ecipes in {C}: The {A}rt
| | |of {S}cientific
| | |{C}omputing",\n year =
| | |2007,\n publisher =
| | |"Cambridge University
| | |Press"\n}\n"
1|beginning |Trying subrule: [keyword] |
2| keyword |Trying rule: [keyword] |
2| keyword |Trying production: [/[a-zA-Z]+/ ','] |
2| keyword |Trying terminal: [/[a-zA-Z]+/] |
2| keyword |>>Matched terminal<< (return value: |
| |[press]) |
2| keyword | |",\n author = "Press, W. and
| | |Teutolsky, S. and
| | |Vetterling, W. and Flannery
| | |B.",\n title = "Numerical
| | |{R}ecipes in {C}: The {A}rt
| | |of {S}cientific
| | |{C}omputing",\n year =
| | |2007,\n publisher =
| | |"Cambridge University
| | |Press"\n}\n"
2| keyword |Trying terminal: [','] |
2| keyword |>>Matched terminal<< (return value: |
| |[,]) |
2| keyword | |"\n author = "Press, W. and
| | |Teutolsky, S. and
| | |Vetterling, W. and Flannery
| | |B.",\n title = "Numerical
| | |{R}ecipes in {C}: The {A}rt
| | |of {S}cientific
| | |{C}omputing",\n year =
| | |2007,\n publisher =
| | |"Cambridge University
| | |Press"\n}\n"
2| keyword |Trying action |
1|beginning |>>Matched subrule: [keyword]<< (return|
| |value: [ <keyword>press</keyword> ]|
1|beginning | |"press,\n author = "Press,
| | |W. and Teutolsky, S. and
| | |Vetterling, W. and Flannery
| | |B.",\n title = "Numerical
| | |{R}ecipes in {C}: The {A}rt
| | |of {S}cientific
| | |{C}omputing",\n year =
| | |2007,\n publisher =
| | |"Cambridge University
| | |Press"\n}\n"
1|beginning |Trying subrule: [fields] |
2| fields |Trying rule: [fields] |
2| fields |Trying production: [one ',' two ',' |
| |three ',' four] |
2| fields |Trying subrule: [one] |
3| one |Trying rule: [one] |
3| one |Trying production: ['author' '=' '\"' |
| |/[a-zA-Z\s\.\,{}\:]+/ '\"'] |
3| one |Trying terminal: ['author'] |
3| one |<<Didn't match terminal>> |
3| one |<<Didn't match rule>> |
2| fields |<<Didn't match subrule: [one]>> |
2| fields |<<Didn't match rule>> |
1|beginning |<<Didn't match subrule: [fields]>> |
1|beginning |<<Didn't match rule>> |
Bad text! at parser.pl line 32, <$in> line 6.
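This shows that the parser got stuck in the subrule one, and that the text "press," was put back into the input stream. This happens because you used return instead of $return =, as the Parse::RecDescent manual says you should. A plain return leaves the generated rule subroutine early, before the consumed text is committed.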
Also, once you assign to the $return variable, you can no longer return a list; you have to concatenate the strings together yourself.
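The list-versus-string point can be seen in plain Perl, outside the grammar. The following is only a minimal sketch: $value stands in for $item[4], and the variable names $as_list, $joined and $interpolated are illustrative, not part of the original program.

```perl
use strict;
use warnings;

my $value = "Press, W.";   # stand-in for $item[4]

# Assignment binds tighter than the comma operator, so only the first
# string is stored; the remaining operands are evaluated and discarded.
my $as_list;
{
    no warnings 'void';    # silence "Useless use ... in void context"
    $as_list = " <author>", $value, "</author>\n";
}

# Explicit concatenation or interpolation stores the whole element.
my $joined       = " <author>" . $value . "</author>\n";
my $interpolated = " <author>$value</author>\n";

print "comma form stored: [$as_list]\n";
print "concatenated:      $joined";
print "interpolated:      $interpolated";
```

The comma form keeps only the opening tag, which is why the rewritten actions build a single string before assigning it to $return.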
Here is the final result:
use strict;
use warnings;
use Parse::RecDescent;
open(my $in, "<", "parsing.txt") or die "Can't open parsing.txt: $!";
my $text;
while (<$in>) {
$text .= $_;
}
print $text;
my $grammar = q {
beginning: "\@Book\{" keyword fields /\s*\}\s*/ { print "<book>\n",$item[2],$item[3],"</book>"; }
keyword: /[a-zA-Z]+/ "," { $return = " <keyword>$item[1]</keyword>\n"; }
fields: one /,\s*/ two /,\s*/ three /,\s*/ four { $return = $item[1].$item[3].$item[5].$item[7]; }
one: "author" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
$return = " <author>$item[4]</author>\n"; }
two: "title" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\"" { $item[4] =~ s/\sand\s/\+/g;
$return = " <title>$item[4]</title>\n"; }
three: "year" "=" /[0-2][0-9][0-9][0-9]/ { $return = " <year>$item[3]</year>\n"; }
four: "publisher" "=" "\"" /[a-zA-Z\s\.\,\{\}\:]+/ "\""
{ $item[4] =~ s/\sand\s/\+/g;
$return = " <publisher>$item[4]</publisher>\n"; }
};
my $parser = Parse::RecDescent->new($grammar) or die ("Bad grammar!");
defined $parser->beginning($text) or die ("Bad text!");
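On the follow-up question about field order: one way to avoid writing every permutation might be to accept any field in any order and validate the collected list afterwards. The sketch below is plain Perl and is not wired into the grammar above. The helper check_fields and its [name, line] pair format are hypothetical, and it assumes the line number for each field is captured separately during parsing (Parse::RecDescent exposes $thisline inside actions for that purpose).

```perl
use strict;
use warnings;

# Hypothetical helper: takes [name, line] pairs in the order they were
# parsed and dies on the first unknown or repeated field.
sub check_fields {
    my @fields  = @_;
    my %allowed = map { $_ => 1 } qw(author title year publisher);
    my %seen;
    for my $field (@fields) {
        my ($name, $line) = @$field;
        die "Unknown field '$name' at line $line\n"
            unless $allowed{$name};
        die "Duplicate field '$name' at line $line (first seen at line $seen{$name})\n"
            if exists $seen{$name};
        $seen{$name} = $line;
    }
    return 1;
}

# Any order is accepted ...
check_fields(['year', 5], ['author', 3], ['title', 4], ['publisher', 6]);

# ... but a repeated field stops processing and reports its line number.
my $ok = eval { check_fields(['author', 3], ['title', 4], ['author', 5]); 1 };
print "Error: $@" unless $ok;
```

Because the check dies on the first problem, nothing after the offending field is processed, which matches the requirement that parsing should not continue after an error.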