The input text file contains the following:
....
ponies B-pro
were I-pro
used I-pro
A O
report O
of O
indirect B-cd
were O
. O
...
The output XML file should look like this:
<sen>
<base id="pro">
<w id="1">ponies</w>
<w id="2">were</w>
<w id="3">used</w>
</base>A report of
<base id="cd">indirect</base> were
</sen>
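I want to create the XML file by reading the text file. "B-" marks the beginning of my tag, "I-" means the word is inside the tag, and "O" means the word is outside the base tag, so it appears only as plain text inside the sen tag.
I tried the following code: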
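#!/usr/local/bin/perl
use strict;
use warnings;

open(my $f, '<', 'input.txt')  or die "Can't open input.txt: $!";
open(my $o, '>', 'output.xml') or die "Can't open output.xml: $!";

# id for <w> elements; the first I- word gets 2 (the B- word would be 1)
my $c = 2;

# Read one "word label" line and return the markup for that word,
# or undef at end of file or at the sentence-final "."
sub read_line {
    my $fh = shift;
    if ($fh and my $line = <$fh>) {
        chomp($line);
        my ($word, $group) = split(/\s+/, $line);   # tabs or spaces
        if ($word eq '.') {
            return;
        }
        if ($group ne 'O') {
            my @b = split(/-/, $group);              # "B-pro" -> ("B", "pro")
            if ($b[0] eq 'B') {
                # opens and closes the element on this one word
                my $e = "<base id=\"";
                $e .= $b[1] . "\">";                 # was: " . $b[1] . "\">" (syntax error)
                $e .= $word . "</base>";
                return $e;
            }
            if ($b[0] eq 'I') {
                my $w = "<w id=\"";
                $w .= $c . "\">";
                $w .= $word . "</w>";
                $c++;
                return $w;
            }
        }
        else {
            $c = 2;    # reset the counter outside a chunk
            return $word;
        }
    }
    return;
}

# Join the markup for all words into one <sen> element
sub get_text {
    my $txt = "";
    my $r = read_line($f);
    while (defined $r) {
        if ($r =~ m/[[:punct:]]/) {
            chop($txt);
            $txt .= " " . $r . " ";
        }
        else {
            $txt .= $r . " ";
        }
        $r = read_line($f);
    }
    chop($txt);
    return "<sen>" . $txt . ".</sen>";
}

print $o get_text(), "\n";
close($o);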
Instead, I get this output:
<sen>
<base id="pro"> ponies </base>
<w id="2">were</w>
<w id="3">were</w>
A report of
<base id="cd">indirect</base> were
</sen>
I really need help.
Thanks.
Answer 1 (score: 1)
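Writing XML by hand will only get you into trouble. Use a module from CPAN.
In your case, I would first put the data into a suitable Perl data structure (perhaps a hash containing some arrays, or something similar) and then use a module (e.g. XML::Simple as a starting point) to write it to a file.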
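A minimal sketch of that first step, assuming one "word label" pair per line separated by whitespace and that the sentence ends at a "." line. The helper name parse_iob and the shape it returns (plain strings for "O" words, hash refs with an id and a word list for each B-/I- run) are illustrative choices, not part of any CPAN module. It stops short of writing XML, so any output module can consume the result.

```perl
use strict;
use warnings;

# Hypothetical helper: read "word label" lines (IOB labels such as B-pro,
# I-pro, O) and group them into a list. An "O" word becomes a plain string;
# a B-/I- run becomes a hash ref { id => 'pro', words => [...] }.
# Reading stops at end of input or at a line whose word is ".".
sub parse_iob {
    my ($fh) = @_;
    my @items;
    my $chunk;    # chunk currently being filled, if any
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        my ($word, $label) = split ' ', $line;  # tabs or spaces
        $label = 'O' unless defined $label;
        last if $word eq '.';
        if ($label =~ /^B-(.+)$/) {
            # B- always opens a new chunk
            $chunk = { id => $1, words => [$word] };
            push @items, $chunk;
        }
        elsif ($label =~ /^I-/ and $chunk) {
            # I- extends the chunk opened by the last B-
            push @{ $chunk->{words} }, $word;
        }
        else {
            # O (or an I- with no open chunk) is plain text and ends any chunk
            undef $chunk;
            push @items, $word;
        }
    }
    return \@items;
}

# Usage with the sample data from the question, read from an in-memory file
my $input = join '', map { "$_\n" }
    "ponies\tB-pro", "were\tI-pro", "used\tI-pro",
    "A\tO", "report\tO", "of\tO",
    "indirect\tB-cd", "were\tO", ".\tO";
open(my $fh, '<', \$input) or die "Can't open in-memory input: $!";
my $items = parse_iob($fh);
close($fh);

# $items now holds:
#   [ { id => 'pro', words => ['ponies', 'were', 'used'] },
#     'A', 'report', 'of',
#     { id => 'cd', words => ['indirect'] },
#     'were' ]
```

With the data grouped this way, deciding where a base element opens and closes no longer depends on string concatenation order, which is where the hand-written version goes wrong.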
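Answer 2 (score: 1)
As Javs said, you want to use a module rather than doing it by hand. For your purposes, since you have mixed content, I recommend XML::LibXML. Here is an example I tested showing that you can indeed mix content the way you want: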
use strict;
use warnings;
use XML::LibXML;

# Build <html><body><a href="...">Google</a>Inline Text</body></html>
my $doc = XML::LibXML::Document->new();
my $root = $doc->createElement('html');
$doc->setDocumentElement($root);
my $body = $doc->createElement('body');
$root->appendChild($body);
# An element child of <body>...
my $link = $doc->createElement('a');
$link->setAttribute('href', 'http://google.com');
$link->appendText('Google');
$body->appendChild($link);
# ...followed by a text node in the same parent: mixed content
$body->appendText('Inline Text');
print $doc->toString;
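Applied to the question's data, the same calls might look like the following sketch. It assumes the input has already been grouped into a list (plain strings for "O" words, hash refs with an id and a word list for each B-/I- chunk); that structure is a hypothetical choice, hard-coded here so the example runs on its own. Following the desired output in the question, a chunk with several words gets numbered w children, while a one-word chunk gets its word as plain text. Spacing between words is approximated by a leading space on each "O" word.

```perl
use strict;
use warnings;
use XML::LibXML;

# The sentence from the question, already grouped by IOB label.
# Plain strings are "O" words; hash refs are B-/I- chunks.
# (Illustrative structure, not something XML::LibXML requires.)
my @items = (
    { id => 'pro', words => [qw(ponies were used)] },
    qw(A report of),
    { id => 'cd', words => ['indirect'] },
    'were',
);

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $sen = $doc->createElement('sen');
$doc->setDocumentElement($sen);

for my $item (@items) {
    if (ref $item) {
        my $base = $doc->createElement('base');
        $base->setAttribute('id', $item->{id});
        my @words = @{ $item->{words} };
        if (@words == 1) {
            # one-word chunk: the word is the element's text
            $base->appendText($words[0]);
        }
        else {
            # multi-word chunk: one numbered <w> per word, starting at 1
            my $n = 1;
            for my $word (@words) {
                my $w = $doc->createElement('w');
                $w->setAttribute('id', $n++);
                $w->appendText($word);
                $base->appendChild($w);
            }
        }
        $sen->appendChild($base);
    }
    else {
        # "O" word: a text node directly inside <sen> (mixed content)
        $sen->appendText(" $item");
    }
}

my $xml = $doc->toString;
print $xml;
```

To write the file the question asks for instead of printing, $doc->toFile('output.xml') saves the document to disk. Because the library handles escaping and closing tags, a chunk can never be left open or closed early.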