Question

I have a file (node_list.txt) that contains a list of nodes.

nod_1
nod_2
nod_3
nod_4
nod_5

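I have a list of host IP addresses (the count may vary), and I need to split node_list into that same number of parts and then send each of the resulting node files to one host.

host_ip1 host_ip2 host_ip3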

The nodes in the file are divided according to the number of available host_ip entries.

In my example, I should get:

node_list_file_1.txt
nod_1
nod_2

node_list_file_2.txt
nod_3
nod_4

node_list_file_3.txt
nod_5

My code is as follows:

print Dumper(\@list_of_hosts);

my $node_file = "node_list.txt";
open(NODE_FILE, "< $node_file") or die "can't open $node_file: $!";
my $count;
$count += tr/\n/\n/ while sysread(NODE_FILE, $_, 2 ** 16);
print "COUNT:$count\n";

my $res = $count / scalar @list_of_ips;

In $res I calculate how many lines each file should have. But how do I write those lines out to the files?

Answer 1

my $num_buckets = 3;

my @lines = <>;

my $per_bucket = int( @lines / $num_buckets );
my $num_extras =      @lines % $num_buckets;

for my $bucket_num (1..$num_buckets) {   # number files from 1, as in the requested output
   my $num_lines = $per_bucket;
   if ($num_extras) {
      ++$num_lines;
      --$num_extras;
   }

   my $qfn = "node_list_file_${bucket_num}.txt";
   open(my $fh, '>', $qfn)
      or die("Can't create \"$qfn\": $!\n");

   $fh->print(splice(@lines, 0, $num_lines));
}

$per_bucket is the number of nodes that every file receives.
$num_extras is the number of files that receive one extra node.
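
As a quick check of this arithmetic, here is a minimal sketch that only computes how many lines each file would receive, without touching any files. The helper name bucket_sizes is hypothetical and is not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of lines each bucket receives
# when $num_lines lines are divided among $num_buckets buckets.
sub bucket_sizes {
    my ($num_lines, $num_buckets) = @_;

    my $per_bucket = int( $num_lines / $num_buckets );  # base size of every bucket
    my $num_extras = $num_lines % $num_buckets;         # buckets that get one more line

    # The first $num_extras buckets get $per_bucket + 1 lines.
    return map { $per_bucket + ( $_ < $num_extras ? 1 : 0 ) } 0 .. $num_buckets - 1;
}

print join('-', bucket_sizes(5, 3)), "\n";   # prints 2-2-1
print join('-', bucket_sizes(10, 3)), "\n";  # prints 4-3-3
```

With the five nodes and three hosts from the question, this gives the 2-2-1 layout that was asked for.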

Note that the calculation of $num_lines can be shortened to the following (I avoided it for the sake of readability):

my $num_lines = $per_bucket + ( $num_extras-- > 0 );

The above loads the entire file into memory. The following is an alternative solution that does not:

my $num_buckets = 3;

my @fhs;
for my $bucket_num (1..$num_buckets) {
   my $qfn = "node_list_file_${bucket_num}.txt";
   open(my $fh, '>', $qfn)
      or die("Can't create \"$qfn\": $!\n");

   push @fhs, $fh;
}

$fhs[ ( $. - 1 ) % @fhs ]->print($_) while <>;

However, although it performs the requested task, its output is not exactly as specified:

node_list_file_1.txt
--------------------
nod_1
nod_4

node_list_file_2.txt
--------------------
nod_2
nod_5

node_list_file_3.txt
--------------------
nod_3
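
To see why the round-robin version yields this interleaved layout, the sketch below replays the same index calculation on an in-memory list instead of files. The sub name round_robin and the zero-based index $i (standing in for $. - 1) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: deals items out to $num_buckets buckets in turn,
# mirroring the ( $. - 1 ) % @fhs index used for the filehandles.
sub round_robin {
    my ($num_buckets, @items) = @_;

    my @buckets;
    push @buckets, [] for 1 .. $num_buckets;   # one empty bucket per output file

    for my $i (0 .. $#items) {
        push @{ $buckets[ $i % $num_buckets ] }, $items[$i];
    }
    return @buckets;
}

my @buckets = round_robin(3, map { "nod_$_" } 1 .. 5);
print join(' ', @$_), "\n" for @buckets;
# prints:
# nod_1 nod_4
# nod_2 nod_5
# nod_3
```

Each line goes to the next file in turn, so the file sizes are balanced but consecutive nodes end up in different files.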

Answer 2

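This splits the lines so that every file except the last gets the largest equal number of lines, and the last file gets whatever remains. So if 10 lines are split among 3 files, the files get 4-4-2 lines.^†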

use warnings;
use strict;
use feature 'say';
use autodie qw(open);

my @lines = <>;
my $num_files = 3;
my $lines_per_file = int @lines/$num_files;
$lines_per_file += 1  if @lines % $num_files;

my @chunks;
push @chunks, [ splice @lines, 0, $lines_per_file ] while @lines;

my @fhs_out = map { open my $fh, ">fout_$_.txt"; $fh } 1..$num_files;

for my $i (0..$#chunks) { 
    print {$fhs_out[$i]} $_ for @{$chunks[$i]};
};

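Notes

<> reads all lines from the files given on the command line
If the number of lines cannot be divided evenly among the files to write, we need one more line in each file (and the last file receives the remainder)
The array of lines is spliced repeatedly to produce chunks of lines, each of which goes into one file, so the array ends up empty
I open all the needed output files and store the filehandles in an array, so that the chunks of lines can conveniently be written to their files later. This is by no means necessary, since one can iterate over @chunks and, for each chunk of lines, open a file and write to it
When writing to a filehandle that has to be evaluated from an expression, as opposed to a plain scalar, the expression must be put in a block, like { $fhs_out[$i] }. From the documentation for print:

If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead [...]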
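
The rule quoted above can be tried out with in-memory filehandles, so that nothing is written to disk. This is only a sketch; the variable names @buffers and @fhs are made up for the example.

```perl
use strict;
use warnings;
use IO::Handle;   # makes the ->print method available on older perls too

my @buffers = ('', '');   # each filehandle writes into one of these strings
my @fhs;
for my $i (0 .. $#buffers) {
    open(my $fh, '>', \$buffers[$i])
        or die "Can't open in-memory filehandle: $!";
    push @fhs, $fh;
}

# The handle comes from an array element, so it must be wrapped in a block.
print { $fhs[0] } "nod_1\n";
print { $fhs[1] } "nod_2\n";

# The method-call form used in Answer 1 needs no block.
$fhs[0]->print("nod_3\n");

close $_ for @fhs;

print $buffers[0];   # prints nod_1 and nod_3, one per line
print $buffers[1];   # prints nod_2
```

Writing `print $fhs[0] "nod_1\n";` without the braces is a syntax error, which is why the answer wraps the array element in a block.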

For other approaches and more discussion, see this post.

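^† If the lines in this case must instead be distributed 4-3-3, that is, split as evenly as possible, then the code above needs to be modified like this: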

my $lines_per_file = int @lines/$num_files;
my $extra = @lines % $num_files;

my @chunks;
push @chunks,
     [ splice @lines, 0, $lines_per_file + ( $extra-- > 0 ? 1 : 0 ) ] 
         while @lines;

The rest stays the same.

Answer 3

The following code may meet your requirements:

use strict;
use warnings;

use feature 'say';

use Data::Dumper;

my $debug = 1;                          # $debug = 1 -- debug mode

my $node_file = "node_list.txt";        # input filename

my @hosts = qw(host_ip1 host_ip2 host_ip3); # Hosts to distribute between

my $num_hosts = @hosts;                 # Number of hosts to distribute between

open(my $fh, "<", $node_file) 
        or die "can't open $node_file: $!";

my @nodes =  <$fh>;                     # read input lines into @nodes array

chomp @nodes;                           # trim newline from each element @nodes array

close $fh;

print Dumper(\@nodes) if $debug;        # print @nodes content in debug mode

my $count = @nodes;                     # count number nodes in @nodes array

print "COUNT: $count lines in the input file\n";

# Lines per output file, rounded up so that no more than $num_hosts files are needed
my $lines_in_file = int( ($count + $num_hosts - 1) / $num_hosts );

my $lines_out   = $lines_in_file;       # how many line to output per file
my $file_index  = 1;                    # index for output filenames
my $filename    = "node_list_file_${file_index}.txt";

# open OUT file
open(my $out, ">", $filename)
        or die "Couldn't open $filename";

foreach my $node_name (@nodes) {        # process each element of @nodes array
    say $out $node_name;                # store node in OUT file

    $lines_out--;                       # decrease number of left lines for output

    if( $lines_out == 0 && $file_index < $num_hosts ) {  # file is full and more files remain
        close $out;                     # close file

        $lines_out = $lines_in_file;    # reinitialize number of lines for output

        $file_index++;                  # increase index for filename
        $filename = "node_list_file_${file_index}.txt";

        open($out, ">", $filename)      # open new OUT file
            or die "Couldn't open $filename";
    }
}

close $out;                             # close OUT file

Split a file into equal parts based on a count
