I have a file named search.txt that contains multiple search patterns.
Sample search.txt (more than 300 entries in total):
A28
A32
A3C
A46
A50
A5A
898
8A2
8AC
8B6
8C0
Sample files in the folder I want to search (more than 5000 in total):
1_0_1_4AB_3_56_300000_0_0_0.png
1_0_1_5A0_20_56_300000_0_0_0.png
1_0_1_A28_22_56_300000_0_0_0.png
1_0_1_A32_22_56_300000_0_0_0.png
1_0_1_A96_23_56_300000_0_0_0.png
1_0_1_898_21_56_300000_0_0_0.png
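I need to check the fourth string of every .png (strings are delimited by "_") against all the entries in search.txt. I have previously used a Perl script like this: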
match4th.pl
#!/usr/bin/perl -w
use strict;
my $pat = qr/$ARGV[0]/;
while (<STDIN>) {
my (undef, undef, undef, $fourth) = split /_/;
print if defined($fourth) && $fourth =~ $pat;
}
I would then use something like this to run the script and move the matching files to a new location:
cd /png_folder
find . -name '*.png' | perl match4th.pl '/tmp/search.txt' | xargs mv -t /tmp/results
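The part I am unsure about is how to tell the find command to use all the entries in /tmp/search.txt instead of writing each pattern into the find command. I would also prefer to copy the files rather than move them.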
Answer 0 (score: 2)
You can use the search.txt file directly as a pattern list for grep:
find . -name '*.png' | grep -f search.txt | xargs ...
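Note that grep -f treats every line of the pattern file as an unanchored regular expression, so an entry can match anywhere in the path and not only in the fourth field. A small sketch of that effect follows; the second file name is hypothetical, made up so that 898 appears outside the fourth field, and the temporary pattern file stands in for search.txt:

```shell
# Stand-in for search.txt with two of the sample entries.
pat_file=$(mktemp)
printf '%s\n' A28 898 > "$pat_file"

# The second name is hypothetical: its fourth field is 4AB,
# but the string 898 occurs later in the name.
loose_matches=$(printf '%s\n' \
  './1_0_1_A28_22_56_300000_0_0_0.png' \
  './1_0_1_4AB_3_56_898000_0_0_0.png' | grep -f "$pat_file")

# Both names are printed, although only the first has a listed fourth field.
printf '%s\n' "$loose_matches"
rm -f "$pat_file"
```

This is the reason for tightening the patterns.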
Or, if you want the pattern to be stricter, you can do this:
find . -name '*.png' | grep -f <(sed 's/^/[0-9]_[0-9]_[0-9]_/' search.txt)
Or stricter still:
find . -name '*.png' | grep -f <(sed 's?^?/[0-9]_[0-9]_[0-9]_?' search.txt)
And strictest of all:
find . -name '*.png' | grep -f <(sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?' search.txt)
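In the last command, the whole line from search.txt is matched (.*), and in the replacement we prefix it with the pattern /[0-9]_[0-9]_[0-9]_, followed by the matched string (&), and then a _. For example, if search.txt contains the letter A as a pattern, that line produces the pattern /[0-9]_[0-9]_[0-9]_A_, which correctly matches only the files that have _A_ in that position.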
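To see what grep actually receives, you can run the sed expression on a couple of entries by itself. The sketch below uses a temporary file in place of the <(...) process substitution so that it also runs in shells without that feature; the second file name is hypothetical, made up to show a near miss:

```shell
# Turn two sample entries into the strict patterns.
strict_file=$(mktemp)
printf '%s\n' A28 898 | sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?' > "$strict_file"
generated=$(cat "$strict_file")
printf '%s\n' "$generated"

# Filter two names with the generated patterns. The second name is
# hypothetical: 898 appears in it, but not as the fourth field.
strict_matches=$(printf '%s\n' \
  './1_0_1_A28_22_56_300000_0_0_0.png' \
  './1_0_1_4AB_3_56_898000_0_0_0.png' | grep -f "$strict_file")
printf '%s\n' "$strict_matches"
rm -f "$strict_file"
```

The leading / in the generated pattern lines up with the ./ that find puts in front of every name, which pins the three single-digit fields to the start of the file name.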
If the output looks good, you can pipe it to xargs to copy the matching files.
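A minimal sketch of such a copy pipeline, run in a throwaway directory so that it does not touch real data. The sandbox layout and the patterns.txt file are assumptions made for illustration; in the question the paths would be /png_folder, /tmp/search.txt and /tmp/results. It also assumes file names without whitespace or quotes, which holds for the samples shown:

```shell
# Throwaway sandbox standing in for the real folders.
work=$(mktemp -d)
mkdir "$work/png_folder" "$work/results"
printf '%s\n' A28 898 > "$work/search.txt"
touch "$work/png_folder/1_0_1_A28_22_56_300000_0_0_0.png" \
      "$work/png_folder/1_0_1_4AB_3_56_300000_0_0_0.png" \
      "$work/png_folder/1_0_1_898_21_56_300000_0_0_0.png"

# Build the strict patterns once (hypothetical helper file patterns.txt).
sed 's?.*?/[0-9]_[0-9]_[0-9]_&_?' "$work/search.txt" > "$work/patterns.txt"

# Filter the names and copy each match; xargs -I runs one cp per file.
( cd "$work/png_folder" &&
  find . -name '*.png' | grep -f "$work/patterns.txt" |
  xargs -I{} cp {} "$work/results" )

copied=$(ls "$work/results")
printf '%s\n' "$copied"

# The originals stay in place because cp was used instead of mv.
source_kept=$( [ -f "$work/png_folder/1_0_1_A28_22_56_300000_0_0_0.png" ] && echo yes || echo no )
rm -rf "$work"
```

With GNU cp, xargs cp -t /tmp/results is a shorter form that copies many files per cp call, in the same way the question uses mv -t.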
Answer 1 (score: 1)
The most efficient solution should be:
use strict;
use warnings;
use File::Basename; # with no_chdir, $_ holds the full path, so basename() is needed
use File::Find;
use File::Copy; # copy and move will work as shell's cp and mv
my ( $fn, $dir, $target ) = @ARGV; # script arguments
# check parameters
( stat($dir) && -d _ ) or die "Not a dir $dir";
( stat($target) && -d _ ) or die "Not a dir $target";
# construct regexp for matching files
# use quotemeta to sanitize data read from $fn file
my $re = join '|', map quotemeta, do {
# open file
open( my $fh, '<', $fn ) or die "$fn: $!";
my @p = <$fh>; # read all patterns
close($fh);
chomp @p; # remove end of line from patterns
@p; # return of do statement
};
$re = qr/$re/; # precompile regexp
# Perl compiles an alternation of literal strings into a trie, so the match cost should not grow with the number of patterns
sub wanted {
my $fourth;
lstat($_) # initialize special _ term
&& (
-d _ # a directory: nothing to do here, File::Find descends into it on its own
|| -f _ # otherwise if is file
&& /\.png$/ # is filename in $_ ending .png
# split by '_' to five pieces max and get fourth part (index 3)
&& defined( $fourth = ( split '_', basename($_), 5 )[3] ) # check if defined
&& $fourth =~ /^$re$/ # match regexp
&& do { move( $_, $target ) or die "$_: $!" } # then move it with File::Copy::move
); # change move to copy if you want to copy the file instead
}
# do not change directory so $target can be relative and move will still work well
find( { wanted => \&wanted, no_chdir => 1 }, $dir );
Usage
perl find_and_move.pl /tmp/search.txt . /tmp/results
Answer 2 (score: 0)
You are using my $pat = qr/$ARGV[0]/;, but $ARGV[0] is /tmp/search.txt. You need to actually read the file.
#!/usr/bin/perl -w
use strict;
my $re = do {
my $qfn = shift(@ARGV);
open(my $fh, '<', $qfn) or die $!;
chomp( my @pats = <$fh> );
my $pat = join '|', map quotemeta, @pats;
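qr/^(?:$pat)\z/ # group the alternation so both anchors apply to every pattern
};
while (<>) {
my $tag = (split /_/)[3];
next if !defined($tag);
print if $tag =~ $re; # test the fourth field, not the whole line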
}