Input file names: AB_12_2_1324_01.xml and AB_12_2_1324_02.xml
From the input files I need to pick up data such as:
<Filename="AB_12_2_1324_01">
<Filename="AB_12_2_1324_02">
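The output file is named AB_12_2_1114.xml. The data picked up from the input files must be appended to it, and the output also gets a new tag, namely
<ID>1</ID>
followed by the other data.
The problems I am stuck on: the ID tag is not incremented, the data from the next file is not copied, and the output is printed on a new line with unwanted whitespace.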
CODE:
foreach my $f (@xml) {
#print F7 $f."\n";
open( FH, "$path1\/$f" );
my $data = join( "", <FH> );
if ( $data
=~ m/<Document id="(((\w+)_(\d+)\_(\d+))\_(\d+)\_(\d+))">/s )
{
my $name = $2;
unless ( open FF, '>>' . "$path1\/$file" ) { } #output file
print FF "<ID>\d<\/ID>\n";
print FF "<Name>(.*?)<\/Name>\n"
; #tag taken for example from where data is getting copied.
}
}
Output:
<ID>1</ID>
A
B
<ID>1</ID>
A
B
The output should be:
<ID>1</ID>
AB
<ID>2</ID>
CD
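Updated code (the code I am actually using):
Yes, the source files are XML. If you don't mind, could someone help me with this code and explain what it does?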
#!/usr/bin/perl
use strict;
use Cwd;
use File::Basename;
use File::Copy;
use File::Find;
use Time::Piece;
my $path1=$ARGV[0];
print "Enter the weekday:"; my $week=<STDIN>;
print "Enter the date:"; my $dd=<STDIN>;
print "Enter the month in short:"; my $mmss=<STDIN>;
print "Enter the year:"; my $yy=<STDIN>;
print "Enter the time in HH:MM:SS:"; my $tt=<STDIN>;
my $pubdate=$week.",".$dd.$mmss.$yy."\t".$tt."\n"; #<pubDate>Sat, 25 Oct 2014 12:20:00 +0000</pubDate> here after Sat the data gets printed with unwanted space and also in newline.
opendir(INP1, "$path1\/");
my @xml = grep(/(\.xml|xMl|xmL|Xml|XmL|XML|XMl|xML)$/,readdir(INP1));
close INP1;
foreach my $f(@xml)
{
open(FH, "$path1\/$f");
my $data = join("", <FH>);
my $xml_list=$data;
my $outfile;
my $Title;
if($xml_list=~m/<Document id="(((\w+)_(\d+)\_(\d+))\_(\d+)\_(\d+))" Num="\d+">/s) #EF_13_2_0314_01
{
my $outfile=$2;
my $digit=$7;
if($digit=~m/(\d+)/s)
{
$digit=~s/^(0)//sg;
my $dig=$digit+2; #i incremented the digit by 2 when the filename is "AB_12_2_1324_01"
if($xml_list=~m/<Field Num="1" Label="Title">(.*?)<\/Field>/s)
{
my $Title=$1;
my $dates1 = localtime->strftime('%m%Y');
my $file = $outfile."_".$dates1.".xml"; #output filename
unless(open FF, '>>'."$path1\/$file"){}
print FF "<item>\n";
print FF "<title>$Title<\/title>\n";
print FF "<pubDate>$pubdate+0000<\/pubDate>\n"; #here before $pubdate and after unwanted space gets added as well as the data gets printed in new line.
print FF "<wp:post_id>$dig<\/wp:post_id>\n"; #here for first file $dig=3 and so on i.e if filename is AB_12_2_1324_01" post id should be 3 and so on.
print FF "<\/item>\n";
#print FF "<\/channel>\n";print FF "<\/rss>\n"; #this part should get printed in output at the end of the file when the entire xml file is being read and appended (say after file AB_12_2_1324_10.xml").
close FF;
}
}
}
}
Answer 0 (score: 0)
Could you please try this:
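use strict;
use warnings;
use Cwd;

my $dir  = getcwd();
my $file = 'AB_12_2_1114.xml';    # output file name, taken from the question
opendir(my $dh, $dir) || die "Couldn't read dir: $!";
# Case-insensitive match on .xml; skip the output file so it is not re-read.
my @xml = sort grep { /\.xml$/i && $_ ne $file } readdir($dh);
closedir($dh);

# Open the output file once, in append mode, before the loop.
open(my $out, '>>', "$dir/$file") || die "Couldn't open $file: $!\n";
my $i = 1;    # running ID, incremented once per matching input file
foreach my $f (@xml)
{
    open(my $fh, '<', "$dir/$f") || die "Couldn't open $f: $!\n";
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    if ($data =~ m/<Document[^>]*id="(((\w+)_(\d+)_(\d+))_(\d+)_(\d+))">/)    # only if the Document id has this fixed format
    {
        my $name = $2;    # e.g. "AB_12_2" for id "AB_12_2_1324_01"
        print $out "<ID>$i</ID>\n";
        print $out "<Name>$name</Name>\n";    # tag taken as an example of where the data is copied from
        $i++;
    }
}
close($out);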
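
The snippet above does not touch the two other points raised in the updated code: the stray newlines and spaces around `$pubdate`, and the post id derived from the file name. Each `<STDIN>` read keeps its trailing newline, so the values need `chomp` before they are joined. Below is a minimal sketch of both pieces, not a drop-in replacement. The helper names `build_pubdate` and `post_id_from_name` are hypothetical (they do not appear in the original script), and the `+ 2` offset is simply copied from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an RSS-style pubDate from the five prompted values.
sub build_pubdate {
    my ($week, $dd, $mon, $yy, $tt) = @_;
    # Remove the trailing newline left by <STDIN>, plus any stray
    # leading or trailing whitespace, from each value.
    for ($week, $dd, $mon, $yy, $tt) {
        chomp;
        s/^\s+|\s+$//g;
    }
    return "$week, $dd $mon $yy $tt +0000";
}

# Hypothetical helper: derive the post id from a name such as "AB_12_2_1324_01".
sub post_id_from_name {
    my ($name) = @_;
    # Capture the trailing digits; give up if the name has no such suffix.
    my ($suffix) = $name =~ /_(\d+)$/ or return;
    # "01" is used as the number 1, so the offset from the question gives 3.
    return $suffix + 2;
}

# Values as they would arrive from <STDIN>, newline included.
print "<pubDate>",
    build_pubdate("Sat\n", "25\n", "Oct\n", "2014\n", "12:20:00\n"),
    "</pubDate>\n";
print "<wp:post_id>", post_id_from_name("AB_12_2_1324_01"), "</wp:post_id>\n";
```

The closing `</channel>` and `</rss>` lines mentioned in the commented-out `print` can be written once after the `foreach` loop ends, instead of inside it.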