How to loop over multiple files with a regex in Perl?

Date: 2014-11-26 12:21:10

Tags: regex perl

Input file names: AB_12_2_1324_01.xml & AB_12_2_1324_02.xml. From each input file I need to pick up data such as:

<Filename="AB_12_2_1324_01">
<Filename="AB_12_2_1324_02">

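The output file is named AB_12_2_1114.xml. The data from the input files must be appended to this output, and the output also gets a new tag, <ID>1</ID>, followed by the other data.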

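Where I am stuck: the ID tag is not incremented, and the data from the next file is not copied; instead it is printed on a new line with unwanted spaces.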

CODE:

foreach my $f (@xml) {

    #print F7 $f."\n";
    open( FH, "$path1\/$f" );
    my $data = join( "", <FH> );
    if ( $data
        =~ m/<Document id="(((\w+)_(\d+)\_(\d+))\_(\d+)\_(\d+))">/s )
    {
        my $name = $2;
        unless ( open FF, '>>' . "$path1\/$file" ) { }    #output file
        print FF "<ID>\d<\/ID>\n";
        print FF "<Name>(.*?)<\/Name>\n"
            ;    #tag taken for example from where data is getting copied.
    }
}
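In the code above, `\d` and `(.*?)` sit inside a double-quoted `print` string, where Perl does not treat them as a pattern, so no number or captured text is substituted. The ID has to come from a counter variable incremented once per matched file, and the tag content has to come from a capture group (`$1`). A minimal sketch, assuming each input file holds one `<Name>...</Name>` element whose content is split over indented lines (a guess at where the unwanted spaces and line breaks come from); `build_records` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Build one output record per input document; $id is a running counter.
sub build_records {
    my @docs = @_;    # contents of each input file, already read into strings
    my $id   = 0;
    my $out  = '';
    for my $data (@docs) {
        # /s lets (.*?) match across line breaks inside the tag
        next unless $data =~ m{<Name>(.*?)</Name>}s;
        my $name = $1;
        $name =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        $name =~ s/\s*\n\s*//g;     # assumption: join lines, dropping indentation
        $id++;                      # increment once per matched file
        $out .= "<ID>$id</ID>\n$name\n\n";
    }
    return $out;
}

print build_records("<Name>\n  A\n  B\n</Name>", "<Name>\n  C\n  D\n</Name>");
```

With the two sample strings this prints the `<ID>1</ID>` / `AB` and `<ID>2</ID>` / `CD` records from the expected output. If real names contain spaces within a line, the second substitution keeps them; only the whitespace around line breaks is removed.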

Output:

<ID>1<\/ID>

  A
  B
<ID>1<\/ID>

  A
  B

The output should be:

<ID>1</ID>
AB

<ID>2</ID>
CD

Updated code (the code I am using):

Yes, the source files are XML. If you don't mind, could someone help me with this code and explain it?

#!/usr/bin/perl
use strict;
use Cwd;
use File::Basename;
use File::Copy;
use File::Find;
use Time::Piece;

my $path1=$ARGV[0];

print "Enter the weekday:";   my $week=<STDIN>;
print "Enter the date:";     my $dd=<STDIN>;
print "Enter the month in short:";  my $mmss=<STDIN>;
print "Enter the year:";   my $yy=<STDIN>;
print "Enter the time in HH:MM:SS:";  my $tt=<STDIN>;

my $pubdate=$week.",".$dd.$mmss.$yy."\t".$tt."\n";  #<pubDate>Sat, 25 Oct 2014 12:20:00        +0000</pubDate>: here everything after "Sat" is printed with unwanted spaces and on a new line.

opendir(INP1, "$path1\/");
my @xml = grep(/(\.xml|xMl|xmL|Xml|XmL|XML|XMl|xML)$/,readdir(INP1));   
close INP1;

foreach my $f(@xml)
{
open(FH, "$path1\/$f");
my $data = join("", <FH>);
my $xml_list=$data;
my $outfile;
my $Title;

if($xml_list=~m/<Document id="(((\w+)_(\d+)\_(\d+))\_(\d+)\_(\d+))" Num="\d+">/s)   #EF_13_2_0314_01
{
my $outfile=$2;
my $digit=$7;
if($digit=~m/(\d+)/s)
{
$digit=~s/^(0)//sg;
my $dig=$digit+2; # I increment the digit by 2, e.g. for the filename "AB_12_2_1324_01"

if($xml_list=~m/<Field Num="1" Label="Title">(.*?)<\/Field>/s)
{
my $Title=$1;
my $dates1 = localtime->strftime('%m%Y');
my $file = $outfile."_".$dates1.".xml";  #output filename

unless(open FF, '>>'."$path1\/$file"){}
print FF "<item>\n";
print FF "<title>$Title<\/title>\n";
print FF "<pubDate>$pubdate+0000<\/pubDate>\n"; # here unwanted spaces get added before and after $pubdate, and the data is printed on a new line.
print FF "<wp:post_id>$dig<\/wp:post_id>\n"; # for the first file $dig=3, and so on: if the filename is "AB_12_2_1324_01", the post id should be 3.
print FF "<\/item>\n";
#print FF "<\/channel>\n";print FF "<\/rss>\n"; # this part should be printed only once, at the end of the output file, after every input XML file has been read and appended (say, after AB_12_2_1324_10.xml).
close FF;
}
}
}
}
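The three problems noted in the comments of the updated code (the stray newline and spaces in `<pubDate>`, the post id derived from the file number, and the closing `</channel></rss>` tags) could be handled roughly as below. This is a sketch, not a drop-in replacement: the helper names `ask`, `pub_date`, `post_id`, `item_xml` and `close_feeds` are hypothetical, and the `+0000` offset is copied from the question's example. The key point is that `<STDIN>` returns each line with its trailing `"\n"`, so every value needs `chomp` before it is concatenated, and the `"\t"` in the original concatenation is what produces the wide gap.

```perl
use strict;
use warnings;

# Prompt, read one line from STDIN and strip the trailing newline.
# Without chomp, the "\n" the user typed ends up inside <pubDate>.
sub ask {
    my ($prompt) = @_;
    local $| = 1;                 # show the prompt before waiting for input
    print $prompt;
    my $answer = <STDIN>;
    return '' unless defined $answer;
    chomp $answer;
    $answer =~ s/^\s+|\s+$//g;    # drop stray spaces around the typed value
    return $answer;
}

# "Sat, 25 Oct 2014 12:20:00 +0000": single spaces, no tab, no newline.
sub pub_date {
    my ($week, $dd, $mon, $yy, $time) = @_;
    return "$week, $dd $mon $yy $time +0000";
}

# Post id = last number of the Document id + 2, so "..._01" gives 3.
# Numeric addition already ignores the leading zero; no s/^0// needed.
sub post_id {
    my ($doc_id) = @_;
    my ($last) = $doc_id =~ /_(\d+)$/ or return;
    return $last + 2;
}

# One <item> block, built as a single string.
sub item_xml {
    my ($title, $pubdate, $id) = @_;
    return "<item>\n"
         . "<title>$title</title>\n"
         . "<pubDate>$pubdate</pubDate>\n"
         . "<wp:post_id>$id</wp:post_id>\n"
         . "</item>\n";
}

# Append the closing tags once per output file, after the foreach loop
# has processed every input file.
sub close_feeds {
    my ($dir, @files) = @_;
    for my $file (@files) {
        open(my $fh, '>>', "$dir/$file") or die "Can't open $dir/$file: $!";
        print {$fh} "</channel>\n</rss>\n";
        close $fh;
    }
}
```

In the question's script, the five prompts would run once before the `foreach` (for example `my $pubdate = pub_date(ask("Enter the weekday:"), ask("Enter the date:"), ...)`), each item would be written with `item_xml($Title, $pubdate, $digit + 2)`, each `$file` would be remembered in a hash such as `%seen`, and `close_feeds($path1, keys %seen)` would run once after the loop.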

1 answer:

Answer 0 (score: 0)

Could you try this:

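use strict;
use warnings;
use Cwd qw(getcwd);

my $path1 = defined $ARGV[0] ? $ARGV[0] : getcwd();   # input directory, as in the question
my $file  = 'AB_12_2_1114.xml';   # output file name from the question (adjust as needed)

opendir(my $dh, $path1) || die "Couldn't read dir $path1: $!";
my @xml = sort { $a cmp $b } grep { /\.xml$/i && $_ ne $file } readdir($dh);   # skip the output file itself
closedir($dh);

# Open the output once. A test like unless("$path1/$file") checks a
# non-empty string, which is always true, so the open would never run.
open(my $out, '>>', "$path1/$file") || die "Couldn't open $path1/$file: $!\n";
my $i = 1;
foreach my $f (@xml)
{
    open(my $in, '<', "$path1/$f") || die "Couldn't open $path1/$f: $!\n";
    my $data = do { local $/; <$in> };   # slurp the whole file
    close($in);
    if ($data =~ m/<Document[^>]*\bid="(((\w+)_(\d+)_(\d+))_(\d+)_(\d+))"/)   # if the Document id has the fixed format
    {
        my $name = $2;
        print $out "\n<ID>$i</ID>\n";
        print $out "<Name>$name</Name>\n";   # tag taken as an example of the copied data
        $i++;   # count only files that actually matched
    }
}
close($out);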
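Since an explanation was asked for: both the question's code and this answer pick values out of the `<Document id=...>` match by group number (`$2` for the name or output-file prefix, `$7` for the per-file number). Groups are numbered by the position of their opening parenthesis. Because `\w+` also matches underscores and digits, the engine backtracks until exactly four `_number` parts remain after it, so `$3` ends up as just the leading letters. A small sketch; the sample element is made up, shaped like the id `EF_13_2_0314_01` in the question's comment, and `id_parts` is a hypothetical helper:

```perl
use strict;
use warnings;

# Return the pieces of the Document id that the question's code uses:
#   $1 = whole id, $2 = first three parts, $3 = leading word, $7 = last number.
sub id_parts {
    my ($data) = @_;
    return unless $data =~ m/<Document[^>]*\bid="(((\w+)_(\d+)_(\d+))_(\d+)_(\d+))"/;
    return (full => $1, prefix => $2, head => $3, seq => $7);
}

# Hypothetical sample element, shaped like the id in the question's comment.
my %p = id_parts('<Document id="EF_13_2_0314_01" Num="1">');
print "$_ = $p{$_}\n" for qw(full prefix head seq);
# full = EF_13_2_0314_01
# prefix = EF_13_2
# head = EF
# seq = 01
```

The `[^>]*` before `id` allows other attributes to precede it, and not requiring `">` right after the closing quote lets the `Num="..."` attribute from the updated question follow it.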