How do I find elements that appear in both columns of a tab-delimited file?

Time: 2012-02-28 06:54:25

Tags: perl

I have a tab-delimited file with two columns, A and B.

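I want to count how many times the elements in B are repeated. I could do this in Excel, but because the two columns contain more than 200k elements, it hangs.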

I tried this code, but it only counts the elements within the column itself:

    my %counts = ();
    for (@A) {
        $counts{$_}++;
    }

    foreach my $k (keys %counts) {
        print "$k\t$counts{$k}\n";
    }

1 Answer:

Answer 0: (score: 2)

Try this solution:

    use strict;
    use warnings;

    my %countx;   # how often each value occurs in column A
    my @y;        # every value from column B, in file order

    my $file = 'ab.txt';
    open my $fh, '<', $file or die "Couldn't open $file: $!";
    while (my $line = <$fh>) {
        chomp $line; # remove newline

        # I've avoided using $a and $b because they are special variables in perl
        my ( $x, $y ) = split /\t/, $line;

        $countx{ $x }++;
        push @y, $y;
    }
    close $fh;

    # For each value in column B, print how many times it occurs in column A
    foreach my $y (@y) {
        my $count = $countx{ $y } || 0;
        print "$y\t$count\n";
    }
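
If a value occurs several times in column B, the loop above prints it once per occurrence. Here is a minimal sketch of a variant that reports each distinct B value only once, assuming that is the output you want. The helper name `count_b_in_a` and the sample data are made up for illustration, and an in-memory filehandle stands in for `ab.txt` so the snippet runs on its own:

    use strict;
    use warnings;

    # Hypothetical helper (not part of the original answer): reads
    # tab-separated lines from a filehandle and returns a list of
    # [ value, count ] pairs, one per distinct value in column B, where
    # count is the number of times that value occurs in column A.
    sub count_b_in_a {
        my ($fh) = @_;
        my ( %count_a, %seen_b );
        my @b_order;    # distinct B values, in first-seen order

        while ( my $line = <$fh> ) {
            chomp $line;
            next unless length $line;    # skip empty lines

            my ( $x, $y ) = split /\t/, $line, -1;
            $count_a{$x}++ if defined $x && length $x;

            # Remember each B value only the first time it is seen
            if ( defined $y && length $y && !$seen_b{$y}++ ) {
                push @b_order, $y;
            }
        }
        return map { [ $_, $count_a{$_} || 0 ] } @b_order;
    }

    # Made-up sample data; replace with: open my $fh, '<', 'ab.txt' ...
    my $sample = "apple\tpear\npear\tapple\napple\tkiwi\nplum\tapple\n";
    open my $fh, '<', \$sample or die "Couldn't open in-memory file: $!";
    my @rows = count_b_in_a($fh);
    close $fh;

    print join( "\t", @$_ ), "\n" for @rows;

With this sample, `apple` is listed once with a count of 2 even though it appears three times in column B, and `kiwi` gets a count of 0 because it never occurs in column A.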