Question

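I want a Perl script that extracts data from a text file and saves the result as another text file. Each line of the text file contains the URL of a jpg, such as "http://pics1.riyaj.com/thumbs/000/082/104//small.jpg". I want the script to extract the last 6 digits of each jpg URL (i.e. 082104) into a variable. I then want to insert that variable at various positions in each line of the new text file.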

Input text:

text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text
text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text

Output text:

text php?id=82104 text
text php?id=569315 text

Thanks

Answer 1

What have you tried so far?

Here's a short program that solves the core of the problem; you can add the rest:

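while( <> )    # read each line of the files named on the command line (or STDIN)
    {
    # replace the whole jpg URL with php?id= plus the last two digit groups
    s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    print;
    }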

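This is very close to a command-line program, which uses the -p switch to handle the looping and printing for you (see the perlrun documentation for details):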

perl -pe 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile > outputfile
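If you would rather have a standalone script that opens both files itself, a minimal sketch could look like the following. The sub name rewrite_line and the choice to take the two file names from @ARGV are my assumptions, not something the question specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the same substitution as above to one line
# and return the rewritten line.
sub rewrite_line {
    my ($line) = @_;
    $line =~ s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    return $line;
}

# Assumption: input and output file names are given on the command line.
# Nothing is read or written unless both are present.
if ( @ARGV == 2 ) {
    my ( $file_in, $file_out ) = @ARGV;
    open my $in,  '<', $file_in  or die "Cannot read $file_in: $!";
    open my $out, '>', $file_out or die "Cannot write $file_out: $!";

    while ( my $line = <$in> ) {
        print {$out} rewrite_line($line);
    }

    close $in;
    close $out or die "Cannot close $file_out: $!";
}
```

Note that this keeps the leading zero, so the first sample line comes out as php?id=082104.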

Answer 2

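I wasn't sure whether to answer according to what you described ("the last 6 digits") or to simply assume that every line fits the pattern you showed, so I decided to answer both ways.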

Here is an approach that can handle lines more varied than your examples.

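use strict;
use warnings;
use FileHandle;

# Assumption: the input and output file names come from the command line.
my ( $file_in, $file_out ) = @ARGV;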

my $jpeg_RE = qr{
    (.*?)           # Anything, watching out for patterns ahead
    \s+             # At least one space
    (?> http:// )   # Once we match "http://" we're onto the next section
    \S*?            # Any non-space, watching out for what follows
    ( (?: \d+ / )*  # At least one digit, followed by a slash, any number of times
      \d+           # another group of digits
    )               # end group
    \D*?            # Any number of non-digits looking ahead
    \.jpg           # literal string '.jpg'
    \s+             # At least one space
   (.*)             # The rest of the line
}x;

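my $infile  = FileHandle->new( $file_in,  'r' ) or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( $file_out, 'w' ) or die "Cannot write $file_out: $!";

while ( my $line = <$infile> ) {
    my ( $pre_text, $digits, $post_text ) = ( $line =~ m/$jpeg_RE/ );
    unless ( defined $digits ) {    # no URL on this line: copy it unchanged
        $outfile->print( $line );
        next;
    }
    $digits =~ s/\D//g;             # drop the slashes, keeping only the digits
    $outfile->print( "$pre_text php?id=", substr( $digits, -6 ), " $post_text\n" );
}
$infile->close();
$outfile->close();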
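To make the "last 6 digits" reading concrete, here is a small sketch that works on a single URL rather than a whole line. The helper name last_six_digits and the example.com URL in the test call are made up for illustration; the idea is only that the digit groups before the file name need not be three digits each.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last six digits of the slash-separated
# digit groups that come just before the file name, or undef if the URL
# does not fit or has fewer than six digits there.
sub last_six_digits {
    my ($url) = @_;

    # Capture e.g. "000/082/104" from ".../000/082/104/small.jpg".
    # One or more slashes are allowed before the file name.
    my ($groups) = $url =~ m{ ( (?: \d+ / )* \d+ ) /+ [^/\d]* \.jpg }x
        or return;

    ( my $digits = $groups ) =~ tr/0-9//cd;    # delete everything but digits
    return length($digits) >= 6 ? substr( $digits, -6 ) : undef;
}

# Irregular grouping: the digits are 123456789, so the last six are 456789.
print last_six_digits("http://example.com/img/12/3456/78/9/small.jpg"), "\n";
```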

However, if the input is as regular as what you've shown, it's much easier:

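use strict;
use warnings;
use FileHandle;

# Assumption: the input and output file names come from the command line.
my ( $file_in, $file_out ) = @ARGV;
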
my $jpeg_RE = qr{
    (?> \Qhttp://pics1.riyaj.com/thumbs/\E ) 
    \d{3}
    /
    ( \d{3} )
    / 
    ( \d{3} )
    \S*?
    \.jpg
}x;

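my $infile  = FileHandle->new( $file_in,  'r' ) or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( $file_out, 'w' ) or die "Cannot write $file_out: $!";

while ( my $line = <$infile> ) {
    # replace every matching URL on the line; other lines pass through unchanged
    $line =~ s/$jpeg_RE/php?id=$1$2/g;
    $outfile->print( $line );
}
$infile->close();
$outfile->close();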
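One detail I can't tell from the question: your sample output shows php?id=82104, while the substitution above produces php?id=082104. If dropping the leading zero is intended, a sketch like this would do it by treating the captured digits as a number. The helper name id_without_leading_zeros is mine, and the regex is the same fixed pattern written on one line.

```perl
use strict;
use warnings;

# Hypothetical helper: join the two captured groups and drop leading
# zeros by converting the result to a number.
sub id_without_leading_zeros {
    my ( $first, $second ) = @_;
    return 0 + "$first$second";
}

my $line = "text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n";

# The /e flag evaluates the replacement as Perl code for each match.
$line =~ s{http://pics1\.riyaj\.com/thumbs/\d{3}/(\d{3})/(\d{3})\S*?\.jpg}
          {'php?id=' . id_without_leading_zeros( $1, $2 )}ge;

print $line;    # text php?id=82104 text
```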

How do I extract numeric data from a text file?
