How do I convert text to XML using Perl?

Time: 2010-12-06 23:04:53

Tags: perl

The input text file contains the following:

....    
    ponies B-pro        
    were I-pro        
    used I-pro    
    A O        
    report O        
    of O    
    indirect B-cd        
    were O
    . O    
...

The desired output XML file:

<sen> 
 <base id="pro">
  <w id="1">ponies</w>
  <w id="2">were</w>
  <w id="3">used</w>
 </base>A report of 
 <base id="cd">indirect</base> were 
</sen>

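I want to create the XML file by reading the text file. A "B-" prefix marks the beginning of one of my tags, "I-" marks a word that belongs inside that tag, and "O" marks a word outside any base tag, which means it exists only directly inside the sen tag.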

I tried the following code:

#!/usr/local/bin/perl -w    
open(my $f, "input.txt") or die "Can't";    
open(my $o, ">output.xml") or die "Can't";    
my $c;   

sub read_line {
    my $fh = shift;
    if ($fh and my $line = <$fh>) {
        chomp($line);
        my @words = split(/\t/, $line);
        my $word  = $words[0];
        my $group = $words[1];
        if ($word eq ".") {
            return;    # a lone "." ends the sentence
        }
        else {
            if ($group ne 'O') {
                my @b = split(/\-/, $group);
                if ($b[0] eq 'B') {    # first word of a group
                    my $e = "<e id=\"";
                    $e .= $b[1] . "\">";
                    $e .= $word . "</e>";
                    return $e;
                }
                if ($b[0] eq 'I') {    # word continuing a group
                    my $w = "<w id=\"";
                    $w .= $c . "\">";
                    $w .= $word . "</w>";
                    $c++;
                    return $w;
                }
            }
            else {                     # word outside any group
                $c = 2;
                return $word;
            }
        }
    }
    return;
}

sub get_text(){
    my $txt = "";
    my $r = read_line($f);
    while ($r) {
        if ($r =~ m/[[:punct:]]/) {
            chop($txt);
            $txt .= " " . $r . " ";
        }
        else {
            $txt .= $r . " ";
        }
        $r = read_line($f);
    }
    chop($txt);
    return "<sen>" . $txt . ".</sen>";
}

Instead, I get this output:

<sen> 
 <base id="pro"> ponies </base>
  <w id="2">were</w>
  <w id="3">were</w>
 A report of 
 <base id="cd">indirect</base> were 
</sen>

I really need help.

Thanks

2 Answers:

Answer 0: (score: 1)

Writing XML by hand will only get you into trouble. Use a module from CPAN.

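In your case, I would first put the data into a suitable Perl data structure (perhaps a hash containing some arrays, or something similar) and then use a module (XML::Simple, for a start) to write it out to the file.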
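The first step described above, getting the tagged lines into a Perl data structure, might look roughly like the sketch below. It is only an illustration under a few assumptions: each line is "word, whitespace, tag" (the question's code splits on tabs, this splits on any whitespace), the name parse_chunks and the hash keys id and words are made up for the example, and a "." is kept as an ordinary O word rather than ending the sentence. It uses core Perl only and leaves the XML output to whichever module you pick.

```perl
use strict;
use warnings;

# Group "word tag" lines into chunks. A B-/I- group becomes
# { id => 'pro', words => [...] }; a run of O words becomes
# { id => undef, words => [...] }. (Hypothetical layout for this sketch.)
sub parse_chunks {
    my @lines = @_;
    my @chunks;
    for my $line (@lines) {
        my ($word, $tag) = split ' ', $line;
        next unless defined $tag;    # skip blank or malformed lines
        if ($tag =~ /^B-(.+)$/) {
            # B- opens a new group named after the suffix
            push @chunks, { id => $1, words => [$word] };
        }
        elsif ($tag =~ /^I-/ && @chunks && defined $chunks[-1]{id}) {
            # I- continues the group that is currently open
            push @{ $chunks[-1]{words} }, $word;
        }
        else {
            # O word (or a stray I- with no open group): plain text
            push @chunks, { id => undef, words => [] }
                unless @chunks && !defined $chunks[-1]{id};
            push @{ $chunks[-1]{words} }, $word;
        }
    }
    return @chunks;
}

my @chunks = parse_chunks(
    "ponies\tB-pro", "were\tI-pro", "used\tI-pro",
    "A\tO", "report\tO", "of\tO",
    "indirect\tB-cd", "were\tO", ".\tO",
);

for my $chunk (@chunks) {
    my $label = defined $chunk->{id} ? $chunk->{id} : 'text';
    print "$label: @{ $chunk->{words} }\n";
}
```

Once the groups are separated like this, deciding which words get wrapped in which element becomes a simple loop over @chunks instead of state scattered across global variables.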

Answer 1: (score: 1)

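As Javs said, you want to use a module rather than doing this by hand. For your purposes, since you have mixed content, I recommend XML::LibXML. Here is an example I tested, showing that you can indeed mix content the way you want: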

use XML::LibXML;

my $doc = XML::LibXML::Document->new();

my $root = $doc->createElement('html');
$doc->setDocumentElement($root);
my $body = $doc->createElement('body');
$root->appendChild($body);

my $link = $doc->createElement('a');
$link->setAttribute('href', 'http://google.com');
$link->appendText('Google');
$body->appendChild($link);

$body->appendText('Inline Text');

print $doc->toString;
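Applied to the format in the question, the same calls could be used along these lines. Treat it as a rough sketch rather than a finished solution: the function name build_sentence is made up for the example, lines are assumed to be "word, whitespace, tag", a "." is appended as ordinary text, and every word of a group is wrapped in a w element, including single-word groups such as "indirect", which the desired output leaves unwrapped.

```perl
use strict;
use warnings;
use XML::LibXML;

# Build a <sen> document from "word tag" lines (hypothetical helper).
sub build_sentence {
    my @lines = @_;
    my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $sen = $doc->createElement('sen');
    $doc->setDocumentElement($sen);

    my ($base, $count);    # currently open <base> element and its word counter
    for my $line (@lines) {
        my ($word, $tag) = split ' ', $line;
        next unless defined $tag;    # skip blank or malformed lines
        if ($tag =~ /^B-(.+)$/) {
            # B- opens a new <base id="..."> under <sen>
            $base = $doc->createElement('base');
            $base->setAttribute('id', $1);
            $sen->appendChild($base);
            $count = 0;
        }
        elsif ($tag !~ /^I-/ || !$base) {
            # O word (or a stray I-): close the group, add plain text
            undef $base;
            $sen->appendText("$word ");
            next;
        }
        # B- and I- words both end up as numbered <w> children of <base>
        my $w = $doc->createElement('w');
        $w->setAttribute('id', ++$count);
        $w->appendText($word);
        $base->appendChild($w);
    }
    return $doc;
}

my $doc = build_sentence(
    "ponies\tB-pro", "were\tI-pro", "used\tI-pro",
    "A\tO", "report\tO", "of\tO",
    "indirect\tB-cd", "were\tO", ".\tO",
);
print $doc->toString(1), "\n";
```

Because the module does the escaping and closes every element for you, the words inside a group cannot end up outside their base element the way they do when the tags are concatenated as strings.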