I have a huge list of file paths that is too big for our SCM. I need to shrink it down to the lowest common-level folders. For example, given the following paths:
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
From these, I'd like to end up with:
//folder1/folder2
//folder1/folder3
//folderx/foldery
The folder list will be read from a text file of about 2M lines.
Any help is much appreciated.
Answer 0 (score: 1)
This looks like a good fit for split() and a hash:
use strict;
use warnings;

# Read one path per line from the files named on the command line (or STDIN)
chomp( my @paths = <> );

my %seen;
foreach my $path ( @paths ) {
    $path =~ s|^//||;    # Strip off leading //
    my @elems = split( '/', $path );
    next if @elems < 2;  # Skip paths with fewer than two levels
    $seen{$elems[0]}{$elems[1]}++;
}
foreach my $rootpath ( sort keys %seen ) {
    foreach my $secondpath ( sort keys %{$seen{$rootpath}} ) {
        print "//" . $rootpath . "/" . $secondpath . "\n";
    }
}
If you only want to print the paths that were seen two or more times, insert this line just before the print():
next unless $seen{$rootpath}{$secondpath} > 1;
I haven't tested this, so there may be syntax errors, but the code shows the general idea.
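A self-contained sketch of the same split-and-hash idea, assuming every path starts with // and that you want to keep a fixed number of leading levels (two in the question). The sub name collapse_paths and its depth and min_count parameters are hypothetical; it uses a flat hash keyed on the joined prefix instead of the nested hash, which makes the depth configurable, and it copies each path instead of stripping the // in place.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $depth components of each //-rooted
# path, count how many input paths share each prefix, and return the
# distinct prefixes (sorted) that were seen at least $min_count times.
sub collapse_paths {
    my ( $paths, $depth, $min_count ) = @_;
    $min_count //= 1;
    my %count;
    for my $path (@$paths) {
        ( my $rel = $path ) =~ s{^//}{};         # copy, then strip the leading //
        my @elems = grep { length } split m{/}, $rel;
        next if @elems < $depth;                 # too shallow to collapse; skip it
        $count{ '//' . join( '/', @elems[ 0 .. $depth - 1 ] ) }++;
    }
    return grep { $count{$_} >= $min_count } sort keys %count;
}

my @paths = qw(
    //folder1/folder2/folder2
    //folder1/folder2/folder5
    //folder1/folder3/folder6
    //folderx/foldery/folder9
    //folderx/foldery/folder10
);

print "$_\n" for collapse_paths( \@paths, 2 );       # every distinct prefix
print "--\n";
print "$_\n" for collapse_paths( \@paths, 2, 2 );    # only prefixes seen 2+ times
```

On the sample data the first call prints //folder1/folder2, //folder1/folder3 and //folderx/foldery, and the second drops //folder1/folder3 because only one input path falls under it. To feed it the real file, chomp(my @paths = <$fh>); would work, but for ~2M lines that holds every path in memory; moving the counting into a while (<$fh>) loop keeps only the hash of distinct prefixes.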
Answer 1 (score: 0)
How about:
#!/usr/local/bin/perl
use strict;
use warnings;
use 5.010;
my %out;
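while (<DATA>) {
    chomp;
    # Capture the first two levels; skip lines that don't have them so a
    # stale $1 from an earlier successful match is never reused
    next unless m#^(//[^/]+/[^/]+)#;
    $out{$1} = 1;
}
say for keys %out;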
__DATA__
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
Output (hash key order, so it may differ between runs):
//folderx/foldery
//folder1/folder3
//folder1/folder2
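Both answers keep exactly two levels. If "lowest common-level folder" is instead read as each path's parent directory (an assumption; it gives the same three results on the sample data), a streaming sketch that keeps only the distinct parents in memory, not all ~2M input lines, might look like this. The sub name parent_dirs is hypothetical, and it assumes one //-rooted path per line with no trailing slash.

```perl
use strict;
use warnings;

# Hypothetical helper: read one path per line from $fh, drop the last
# component, and return the distinct parent directories in sorted order.
# Only the hash of distinct parents is kept in memory, not the input lines.
sub parent_dirs {
    my ($fh) = @_;
    my %parent;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ m{^(//.+)/[^/]+$};    # need a level below the parent
        $parent{$1} = 1;
    }
    return sort keys %parent;
}

# Demo on an in-memory filehandle; for the real list, open the text file
# instead, e.g. open( my $fh, '<', $filename ) or die "open: $!";
my $sample = join '', map { "$_\n" } qw(
    //folder1/folder2/folder2
    //folder1/folder2/folder5
    //folder1/folder3/folder6
    //folderx/foldery/folder9
    //folderx/foldery/folder10
);
open( my $fh, '<', \$sample ) or die "cannot open in-memory file: $!";
print "$_\n" for parent_dirs($fh);
close $fh;
```

Unlike the fixed-depth approach, this keeps deeper structure when the input mixes depths: //a/b/c and //a/e collapse to //a/b and //a rather than both to //a/b and //a/e.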