I have a Perl script that reads a simple .csv file like the one below:
"header1","header2","header3","header4"
"12","12-JUL-2012","Active","Processed"
"13","11-JUL-2012","In Process","Pending"
"32","10-JUL-2012","Active","Processed"
"24","08-JUL-2012","Active","Processed"
.....
The goal is to convert this .csv into an .xml file like the following:
<ORDERS>
<LIST_G_ROWS>
<G_ROWS>
<header1>12</header1>
<header2>12-JUL-2012</header2>
<header3>Active</header3>
<header4>Processed</header4>
</G_ROWS>
<G_ROWS>
<header1>13</header1>
<header2>11-JUL-2012</header2>
<header3>In Process</header3>
<header4>Pending</header4>
</G_ROWS>
....
....
</LIST_G_ROWS>
</ORDERS>
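I know that XML::CSV on CPAN would make my life easier, but I want to use the already installed XML::LibXML to create the XML rather than install XML::CSV. I can read the CSV and create the XML file described above without any problem, but the elements in the XML come out in random order, as shown below. I need the order of the elements (child nodes) to match the column order of the .csv file, as shown above, and I am not sure how to fix this. I am using a hash, and calling sort() on the hash did not completely solve the problem.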
<ORDERS>
<LIST_G_ROWS>
<G_ROWS>
<header3>Active</header3>
<header1>12</header1>
<header4>Processed</header4>
<header2>12-JUL-2012</header2>
</G_ROWS>
......
and so on. Below is a snippet of my Perl code:
use XML::LibXML;
use strict;
my $outcsv="/path/to/data.csv";
my $xmlFile="/path/to/data.xml";
my $headers = 0;
my (@keys, @vals);   # declared here so the script compiles under "use strict"
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement("ORDERS");
my $list = $doc->createElement("LIST_G_ROWS");
$root->appendChild($list);
open(IN,"$outcsv") || die "can not open $outcsv: $!\n";
while(<IN>){
chomp($_);
if ($headers == 0)
{
$_ =~ s/^\"//g; #remove starting (")
$_ =~ s/\"$//g; #remove trailing (")
@keys = split(/\",\"/,$_); #split per ","
s{^\s+|\s+$}{}g foreach @keys; #remove leading and trailing spaces from each field
$headers = 1;
}
else{
$_ =~ s/^\"//g; #remove starting (")
$_ =~ s/\"$//g; #remove trailing (")
@vals = split(/\",\"/,$_); #split per ","
s{^\s+|\s+$}{}g foreach @vals; #remove leading and trailing spaces from each field
my %tags = map {$keys[$_] => $vals[$_]} (0..@keys-1);
my $row = $doc->createElement("G_ROWS");
$list->appendChild($row);
for my $name (keys %tags) {
my $tag = $doc->createElement($name);
my $value = $tags{$name};
$tag->appendTextNode($value);
$row->appendChild($tag);
}
}
}
close(IN);
$doc->setDocumentElement($root);
open(OUT,">$xmlFile") || die "can not open $xmlFile: $!\n";
print OUT $doc->toString();
close(OUT);
Answer 0 (score: 1)
You can forget about the %tags hash entirely. Instead, loop over the indices of @keys:
for my $i (0 .. @keys - 1) {
my $key = $keys[$i];
my $value = $vals[$i];   # @vals is the array name used in the question
my $tag = $doc->createElement($key);
$tag->appendTextNode($value);
$row->appendChild($tag);
}
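This way the order of your keys is preserved. A Perl hash does not remember insertion order, so keys %tags returns the names in an arbitrary order.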
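To see the difference in isolation, here is a minimal sketch in core Perl that does not need XML::LibXML. The sample values come from the question; the names @ordered and @by_name are made up for illustration, and the tags are built as plain strings without XML escaping, which real code should leave to createElement and appendTextNode. The second variant assumes you still want %tags for lookups by name: keep the hash, but take the order from @keys instead of from keys %tags.

```perl
use strict;
use warnings;

# Sample header and data rows (illustrative values taken from the question).
my @keys = ('header1', 'header2', 'header3', 'header4');
my @vals = ('12', '12-JUL-2012', 'Active', 'Processed');

# Option 1: walk both arrays by index; the order follows the CSV columns.
my @ordered = map { "<$keys[$_]>$vals[$_]</$keys[$_]>" } 0 .. $#keys;

# Option 2: keep the hash for lookups, but take the order from @keys.
my %tags;
@tags{@keys} = @vals;    # hash slice: pairs each key with its value
my @by_name = map { "<$_>$tags{$_}</$_>" } @keys;

print "$_\n" for @ordered;
```

Both lists come out in column order (header1 through header4), because the order is driven by the array and never by the hash.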
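Answer 1 (score: 1)
Your program is far more involved than it needs to be. For convenience and reliability you should use Text::CSV to parse the CSV file.
The following program does what you need.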
use strict;
use warnings;
use Text::CSV;
use XML::LibXML;
open my $csv_fh, '<', '/path/to/data.csv' or die $!;
my $csv = Text::CSV->new;
my $headers = $csv->getline($csv_fh);
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $orders = $doc->createElement('ORDERS');
$doc->setDocumentElement($orders);
my $list = $orders->appendChild($doc->createElement('LIST_G_ROWS'));
while ( my $data = $csv->getline($csv_fh) ) {
my $rows = $list->appendChild($doc->createElement('G_ROWS'));
for my $i (0 .. $#$data) {
$rows->appendTextChild($headers->[$i], $data->[$i]);
}
}
$doc->toFile('/path/to/data.xml', 1);   # second argument 1 = indented output
Output
<?xml version="1.0" encoding="UTF-8"?>
<ORDERS>
<LIST_G_ROWS>
<G_ROWS>
<header1>12</header1>
<header2>12-JUL-2012</header2>
<header3>Active</header3>
<header4>Processed</header4>
</G_ROWS>
<G_ROWS>
<header1>13</header1>
<header2>11-JUL-2012</header2>
<header3>In Process</header3>
<header4>Pending</header4>
</G_ROWS>
<G_ROWS>
<header1>32</header1>
<header2>10-JUL-2012</header2>
<header3>Active</header3>
<header4>Processed</header4>
</G_ROWS>
<G_ROWS>
<header1>24</header1>
<header2>08-JUL-2012</header2>
<header3>Active</header3>
<header4>Processed</header4>
</G_ROWS>
</LIST_G_ROWS>
</ORDERS>
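Update
If you do not need the more exotic options that Text::CSV provides, its functionality is fairly simple to reproduce once the options are fixed. This alternative provides a subroutine csv_getline to replace the Text::CSV method getline. It works in much the same way as the module.
The output of this program is identical to the above.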
use strict;
use warnings;
use XML::LibXML;
open my $csv_fh, '<', '/path/to/data.csv' or die $!;
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $orders = $doc->createElement('ORDERS');
$doc->setDocumentElement($orders);
my $list = $orders->appendChild($doc->createElement('LIST_G_ROWS'));
my $headers = csv_getline($csv_fh);
while ( my $data = csv_getline($csv_fh) ) {
my $rows = $list->appendChild($doc->createElement('G_ROWS'));
for my $i (0 .. $#$data) {
$rows->appendTextChild($headers->[$i], $data->[$i]);
}
}
$doc->toFile('/path/to/data.xml', 1);   # second argument 1 = indented output
sub csv_getline {
my $fh = shift;
defined (my $line = <$fh>) or return;
$line =~ s/\s*\z/,/;
[ map { /"(.*)"/ ? $1 : $_ } $line =~ /( " [^"]* " | [^,]* ) , /gx ];
}
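A quick way to check what csv_getline accepts is to feed it an in-memory filehandle. The sketch below repeats the subroutine from the program above so that it runs on its own; the sample text is invented for illustration. Note that this simple parser assumes each record sits on a single line and does not unescape doubled quotes (""), so for anything beyond simple input Text::CSV is probably the safer choice.

```perl
use strict;
use warnings;

# Same subroutine as in the program above, repeated so the example is self-contained.
sub csv_getline {
    my $fh = shift;
    defined( my $line = <$fh> ) or return;
    $line =~ s/\s*\z/,/;    # replace trailing whitespace with a final comma
    [ map { /"(.*)"/ ? $1 : $_ } $line =~ /( " [^"]* " | [^,]* ) , /gx ];
}

# Invented sample input: a header row, a quoted row with an embedded comma,
# and an unquoted row.
my $text = <<'CSV';
"header1","header2"
"In Process","a,b"
13,Pending
CSV

open my $fh, '<', \$text or die $!;
my @rows;
while ( my $row = csv_getline($fh) ) {
    push @rows, $row;
}
close $fh;

# Print each parsed row with its fields separated by " | ".
print join( ' | ', @$_ ), "\n" for @rows;
```

Each call returns an array reference of fields, and the call at end of file returns an empty list, which is what ends the while loop.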
Answer 2 (score: -2)
XML::LibXML seems like overkill here. Just use XML::Simple: build the right hash describing that XML structure, then dump it to an XML file with XMLout.