How do I split a file using Perl?

Time: 2013-08-27 06:01:31

Tags: text-processing perl

Each div should be split out into its own file.

INPUT.TXT

[[div]]
line 1
line 2
...
[[/div]]

[[div]]
line 3
line 4
line 5
...
[[/div]]

[[div]]
line 6
line 7
...
[[/div]]

FILENAME.TXT

fm.html
chap01.html
bm.html

Required output

fm.html

<html>
<body>
line 1
line 2
...
</body>
</html>

chap01.html

<html>
<body>
line 3
line 4
line 5
...
</body>
</html>

bm.html

<html>
<body>
line 6
line 7
...
</body>
</html>

This is the code I have tried so far, but it writes only the last div into every file. I also need to add meta... I need a solution.

#!/usr/bin/perl
open(REDA,"filename.txt");
@namef=<REDA>;
open(RED,"input.txt");
open(WRITX,">input1.txt");
while(<RED>)
   {
    chomp($_);
    $_="$_"."<cr>";
    print WRITX $_;
   }
close(RED);
close(WRITX);
open(REDQ,"input1.txt");
open(WRITQ,">input2.txt");
while(<REDQ>)
   {
                $_=~s/\[\[div\]\]<cr>/\n\[\[div\]\]/gi;
    print WRITQ $_;
   }
close(REDQ);
close(WRITQ);
open(REDE,"input2.txt");
while(<REDE>)
   {
   foreach $namef (@namef)
    {
         chomp($namef);
         $namef=~s/\.[a-z]+//gi;
        open(WRIT1,">$namef.html");
            if(/\[\[div\]\]/i)
            {
                chomp($_);
                $_=~s/<cr>/\n/gi;
                print WRIT1 $_;
            }
         }
    }
close(REDA);
close(REDE);
close(REDX);
close(WRIT1);
system ("del input1.txt");
system ("del input2.txt");

4 answers:

Answer 0: (score: 1)

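If you are sure the [[div]] sections are separated by blank lines, you can slurp the file in Perl's paragraph mode, which splits it into chunks delimited by one or more blank lines. The following code (tested) does what you need. Run this command in a terminal whose current directory contains the files in question: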

perl -n00 -e '
    BEGIN{ #Executed before input.txt is read
        open $f,"<","filename.txt";
        @names = split /\n+/,<$f> #Split is needed because we changed the input record separator
    }

    # The following is executed for each "paragraph" (div section)
    s!\[\[div\]\]\n!<html>\n<body>\n!; # substitute <html>\n<body>\n for [[div]]
    s!\[\[/div\]\]\n!</body>\n</html>!; # substitute </body>\n</html> for [[/div]]
    $content{shift @names}=$_; #Add the modified content to hash keyed by file name

    END{ #This is executed after the whole of input.txt has been read
        for(keys %content){ #For each file we want to create
            open $of,">",$_;
            print $of $content{$_}
        }
    }
' input.txt

Update

If you want to use the code above as a Perl script, you can do the following:

#!/usr/bin/env perl

use strict;
use warnings;

open my $f,'<','filename.txt' or die "Failed to open filename.txt: $!\n";
my @names;
chomp(@names=<$f>);

open my $if,'<','input.txt' or die "Failed to open input.txt: $!\n";
my %content;
while(my $paragraph=do{local $/="";<$if>}){
    $paragraph=~ s!\[\[div\]\]\n!<html>\n<body>\n!;
    $paragraph=~ s!\[\[/div\]\]\n!</body>\n</html>!;
    $content{shift @names}=$paragraph;
}

for(keys %content){
    open my $of,'>',$_ or die "Failed to open $_ : $!\n";
    print $of $content{$_}
}

Save the above as (for example) split_file.pl, make it executable with chmod +x split_file.pl, and then run it as ./split_file.pl.
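To see what paragraph mode does in isolation, here is a minimal sketch that reads sample data from an in-memory string rather than from input.txt. The variable names ($sample, @paragraphs) and the sample text are assumptions made for illustration; only the `$/ = ""` setting comes from the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data held in a string so the sketch needs no files on disk.
# Note the run of blank lines between the two divs.
my $sample = "[[div]]\nline 1\nline 2\n[[/div]]\n\n\n[[div]]\nline 3\n[[/div]]\n";

# Open a read handle on the string (an in-memory file).
open my $in, '<', \$sample or die "Cannot open in-memory file: $!\n";

my @paragraphs;
{
    # Paragraph mode: a record ends at a run of one or more blank lines.
    local $/ = "";
    @paragraphs = <$in>;
}
close $in;

printf "Read %d paragraphs\n", scalar @paragraphs;
```

Because `$/` is localized inside the bare block, ordinary line-by-line reading is restored as soon as the block ends, so a later read of filename.txt is not affected.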

Answer 1: (score: 1)

You can do it like this:

#!/usr/bin/env perl
use strict;
use warnings;

my @file_names;
## Read the list of file names
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (<$fh>) {
    chomp; #remove new line character from the end of the line
    push @file_names,$_;
}

my $counter=0;
my ($file_name,$fn);
## Read the input file
open($fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
while (<$fh>) {
    ## If this is an opening DIV, open the next output file,
    ## and set $counter to 1.
    if (/\[\[div\]\]/) {
        $counter=1;
        $file_name=shift(@file_names);
        open($fn, '>', $file_name) or die "Cannot open $file_name: $!\n";
    }
    ## If this is a closing DIV, print the line and set $counter back to 0
    if (/\[\[\/div\]\]/) {
        $counter=0;
        print $fn $_;
        close($fn);
    }
    ## Print into the corresponding file handle if $counter is 1
    print $fn $_ if $counter==1;
}

Save the script as foo.pl and run it as follows:

perl foo.pl filename.txt input.txt
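The $counter flag above tracks whether the current line is inside a div. As an alternative, Perl's range operator in scalar context (the "flip-flop") keeps that state for you. The following is only a sketch of that swapped-in technique on in-memory sample data, not the method used in this answer; @lines and @inside are names made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines standing in for the contents of input.txt.
my @lines = ('[[div]]', 'line 1', 'line 2', '[[/div]]', '',
             '[[div]]', 'line 3', '[[/div]]');

my @inside;
for my $line (@lines) {
    # In scalar context ".." is the flip-flop: it becomes true when the
    # left test matches and stays true until the right test matches.
    # Its value is a sequence number; the last one has "E0" appended.
    my $seq = ($line =~ /\[\[div\]\]/) .. ($line =~ /\[\[\/div\]\]/);

    next unless $seq;                          # outside any div
    next if $seq == 1 or $seq =~ /E0\z/;       # skip the marker lines
    push @inside, $line;                       # body line of a div
}

print "$_\n" for @inside;
```

Skipping sequence number 1 and the value ending in "E0" drops the [[div]] and [[/div]] marker lines themselves, which is where the HTML wrapper lines could be written instead.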

Answer 2: (score: 0)

In Perl, you can loop over the contents of filename.txt like this:

#!/usr/bin/perl

# somescript.pl

open (my $fh, "<", "filename.txt") or die "Cannot open filename.txt: $!\n";
my @files = <$fh>;
close ($fh);

foreach my $file (@files) {
    print "$file";
}

Put the above in a file named somescript.pl, make it executable with chmod +x somescript.pl, and run it:

$ ./somescript.pl 
fm.html
chap01.html
bm.html

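You can see that it now reads the file filename.txt and prints each line to the screen. I'll leave the rest for you to try. If you get stuck, ask for help.

I would read input.txt using the same approach used for filename.txt.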
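One detail to watch when taking this further: each element read this way still ends with a newline, which matters once the values are used as file names. A minimal sketch, assuming the list is held in an in-memory string ($list, @raw and @files are illustrative names):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for the contents of filename.txt.
my $list = "fm.html\nchap01.html\nbm.html\n";

open my $fh, '<', \$list or die "Cannot open in-memory file: $!\n";
my @raw = <$fh>;            # every element keeps its trailing "\n"
close $fh;

# chomp on a list assignment strips the newline from each copied element.
chomp(my @files = @raw);

print "[$_]\n" for @files;  # brackets make stray whitespace visible
```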

Answer 3: (score: 0)

Written in more idiomatic Perl, you might end up with something like this:

#!/usr/bin/perl

use strict;
use warnings;

# First argument is the name of the file that contains
# the filenames.
open my $fn, '<', shift or die $!;
chomp(my @files = <$fn>);

# Variable to contain the current open filehandle
my $curr_fh;
while (<>) {
  # Skip blank lines
  next unless /\S/;

  # If it's the opening of a div...
  if (/\[\[div]]/) {
    # Open the next file...
    open $curr_fh, '>', shift @files or die $!;
    # Print the opening html...
    print $curr_fh "<html>\n<body>\n";
    # ... and skip the rest of the loop
    next;
  }

  # If it's the end of a div
  if (/\[\[\/div]]/) {
    # Print the closing html...
    print $curr_fh "</body>\n</html>\n";
    # Close the current file...
    close $curr_fh;
    # Unset the variable so we can reuse it...
    undef $curr_fh;
    # and skip the rest of the loop
    next;
  }

  # Otherwise, just print the record to the currently open file
  print $curr_fh $_;
}

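Call it with two arguments: the name of the file containing the file names (filename.txt), followed by the name of the file containing the data (input.txt).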
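The script assumes that the name list has at least as many entries as there are divs and that every line sits inside a div. A rough sketch of making those assumptions explicit follows. The helper split_divs is hypothetical, it works on a string instead of files so it can be tried in isolation, and dying on stray text is a stricter choice than the script above makes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes the input text and a list of names, and
# returns one { name => ..., html => ... } hash reference per div.
sub split_divs {
    my ($text, @names) = @_;
    my (@pages, $current);

    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                 # skip blank lines

        if ($line =~ /\[\[div\]\]/) {              # opening marker
            my $name = shift @names;
            die "More divs than file names\n" unless defined $name;
            $current = { name => $name, html => "<html>\n<body>\n" };
            next;
        }

        if ($line =~ /\[\[\/div\]\]/) {            # closing marker
            die "Closing div without an opening div\n" unless $current;
            $current->{html} .= "</body>\n</html>\n";
            push @pages, $current;
            undef $current;
            next;
        }

        # Assumption: text outside any div is treated as an error.
        die "Text outside any div: $line\n" unless $current;
        $current->{html} .= "$line\n";
    }
    return @pages;
}

my @pages = split_divs(
    "[[div]]\nline 1\n[[/div]]\n\n[[div]]\nline 3\n[[/div]]\n",
    'fm.html', 'chap01.html',
);
print "$_->{name}:\n$_->{html}" for @pages;
```

Writing each page to disk would then be a loop that opens $_->{name} for writing and prints $_->{html}, with the same `or die` check used above.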