Question

I have a file named search.txt that contains multiple search patterns.

Sample search.txt (more than 300 entries in total):

A28
A32
A3C
A46
A50
A5A
898
8A2
8AC
8B6
8C0

Sample files in the folder I want to search (more than 5000 in total):

 1_0_1_4AB_3_56_300000_0_0_0.png
 1_0_1_5A0_20_56_300000_0_0_0.png
 1_0_1_A28_22_56_300000_0_0_0.png
 1_0_1_A32_22_56_300000_0_0_0.png
 1_0_1_A96_23_56_300000_0_0_0.png
 1_0_1_898_21_56_300000_0_0_0.png

I need to check the fourth string of every .png (strings are separated by "_") against all entries in search.txt. I previously used a Perl script like this:

match4th.pl

#!/usr/bin/perl -w
use strict;
my $pat = qr/$ARGV[0]/;
while (<STDIN>) {
    my (undef, undef, undef, $fourth) = split /_/;
    print if defined($fourth) && $fourth =~ $pat;
}

I would then use something like this to run the script and move the matching files to a new location:

cd /png_folder
find . -name '*.png' | perl match4th.pl '/tmp/search.txt' | xargs mv -t /tmp/results

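The part I'm unsure about is how to tell the find command to use all the entries in /tmp/search.txt rather than writing each pattern into the find command. I would also prefer to copy the files rather than move them.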

Answer 1

You can use the search.txt file directly as a pattern list for grep:

find . -name '*.png' | grep -f search.txt | xargs ...

Or, if you want the patterns to be stricter, you can do this:

find . -name '*.png' | grep -f <(sed 's/^/[0-9]_[0-9]_[0-9]_/' search.txt)

Stricter still:

find . -name '*.png' | grep -f <(sed 's?^?/[0-9]_[0-9]_[0-9]_?' search.txt)

And the strictest version:

find . -name '*.png' | grep -f <(sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?' search.txt)

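In the last line, the whole line from search.txt is matched (.*), and in the replacement we prefix it with the pattern /[0-9]_[0-9]_[0-9]_, followed by the matched string (&), and then a _. For example, if you had the letter A as a pattern in search.txt, this would generate the pattern /[0-9]_[0-9]_[0-9]_A_ for that line, which makes your files match correctly at that position.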
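To see what the last sed expression generates, it can be run on a couple of entries on their own. This is a minimal sketch using two of the sample entries from the question; the variable name generated is only for illustration:

```shell
# Feed two sample entries through the same sed expression used above
# and capture the grep patterns it generates.
generated=$(printf '%s\n' A28 898 | sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?')
printf '%s\n' "$generated"
```

This prints /[0-9]_[0-9]_[0-9]_A28_ and /[0-9]_[0-9]_[0-9]_898_, one per line.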

If the output looks good, you can pipe it to xargs to copy the matched files, like this:

xargs
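A complete pipeline could look roughly like the following sketch. It assumes GNU cp (for -t) and GNU xargs (for -r), and file names without whitespace or quotes, as in the question's examples. The function name copy_matches and the temporary pattern file are illustrative and not part of the original answer:

```shell
# copy_matches PATTERN_FILE SOURCE_DIR TARGET_DIR
# Copies every .png under SOURCE_DIR whose fourth "_"-separated field
# matches an entry of PATTERN_FILE into TARGET_DIR.
# Assumption: GNU cp (-t) and GNU xargs (-r) are available.
copy_matches() {
    patterns=$(mktemp) || return 1
    # Build the strict patterns once, e.g. A28 -> /[0-9]_[0-9]_[0-9]_A28_
    sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?' "$1" > "$patterns"
    # -r keeps xargs from running cp when nothing matched
    find "$2" -name '*.png' | grep -f "$patterns" | xargs -r cp -t "$3"
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

With the paths from the question this would be called as copy_matches /tmp/search.txt /png_folder /tmp/results. The pattern file is written to a temporary file rather than passed through <( ... ) so that the sketch also runs in shells without process substitution.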

Answer 2

The most efficient solution should be:

use strict;
use warnings;
use File::Basename; # needed because no_chdir makes File::Find pass full path names
use File::Find;
use File::Copy;     # copy and move will work as shell's cp and mv

my ( $fn, $dir, $target ) = @ARGV; # script arguments

# check parameters
( stat($dir)    && -d _ ) or die "Not a dir $dir";
( stat($target) && -d _ ) or die "Not a dir $target";

# construct regexp for matching files
# use quotemeta to sanitize data read from $fn file 
my $re = join '|', map quotemeta, do {
    # open file
    open( my $fh, '<', $fn ) or die "$fn: $!";
    my @p = <$fh>;            # read all patterns
    close($fh);
    chomp @p;                 # remove end of line from patterns
    @p;                       # return of do statement
};
$re = qr/$re/;                # precompile regexp
# Perl compiles this into a trie for up to ten thousand patterns, so a match should be O(1) in the number of patterns

sub wanted {
    my $fourth;
    lstat($_)                 # stat the entry once; later tests reuse it via _
        && (
           -d _               # directory: nothing to do, find() descends on its own
        || -f _               # otherwise it must be a plain file
        && /\.png$/           # whose name in $_ ends in .png
        # split the base name on '_' into at most five pieces; take the fourth (index 3)
        && defined( $fourth = ( split '_', basename($_), 5 )[3] ) # skip names with fewer fields
        && $fourth =~ /^$re$/ # fourth field must match one of the patterns exactly
        && do { move( $_, $target ) or die "$_: $!" } # then move it with File::Copy::move
        );                    # use copy instead of move to copy the file
}

# do not change directory so $target can be relative and move will still work well
find( { wanted => \&wanted, no_chdir => 1 }, $dir );

Usage

perl find_and_move.pl /tmp/search.txt . /tmp/results

Answer 3

You are using my $pat = qr/$ARGV[0]/;, but $ARGV[0] is /tmp/search.txt. You need to actually read that file.

#!/usr/bin/perl -w
use strict;

my $re = do {
   my $qfn = shift(@ARGV);
   open(my $fh, '<', $qfn) or die $!;
   chomp( my @pats = <$fh> );
   my $pat = join '|', map quotemeta, @pats;
   qr/^(?:$pat)\z/   # group the alternation so ^ and \z apply to every pattern
};

while (<>) {
    my $tag = (split /_/)[3];
    next if !defined($tag);
    print if $tag =~ $re;   # test the fourth field, not the whole line
}
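
For comparison, the same exact-match test on the fourth field can be sketched with awk instead of Perl. This is a different tool from the one the answer uses and is shown only as an illustration. It assumes a non-empty pattern file and no underscore in any directory name of the path; the function name match_fourth is made up for this example:

```shell
# match_fourth PATTERN_FILE
# Reads file names on stdin and prints those whose fourth "_"-separated
# field is exactly equal to a line of PATTERN_FILE.
# Assumption: PATTERN_FILE is not empty (the NR == FNR test relies on it).
match_fourth() {
    awk -F_ '
        NR == FNR { wanted[$0] = 1; next }   # first input: load the patterns
        $4 in wanted                         # second input (stdin): filter names
    ' "$1" -
}
```

It slots into the pipeline in the same place as the Perl script, for example find . -name '*.png' | match_fourth /tmp/search.txt | xargs cp -t /tmp/results (cp -t being a GNU option).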

Search multiple .png file names using a list of multiple search patterns and copy the results to a new folder
