Question

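I have two files... The first is a .txt file that contains IDs...

The second file contains text in the first column and IDs in the second column. Is there a way to compare the IDs in the two files, find the matches, and return the corresponding text from the first column?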

AUX    2398432
AUL    245406

So when I parse the two files, the script should match 245406 and return the corresponding text, AUL.

Here is what I have so far:

open FH_TF_IDS, "<$ARGV[0]" or die $!; 
while (<FH_TF_IDS>) {
    chomp; 
    @fields=split("\t",$_);
    $hash{$fields[1]}=$fields[0];
} 
close FH_TF_IDS;

open IDS, "<$ARGV[1]" or die $!;
@ids=<IDS>; 
close IDS; 

foreach $id (@ids){ 
    $hash_count{$hash{$id}}++;
} 

foreach $family (sort (keys %hash_count)) {
    print "$family\t$hash_count{$family}\n";
}

Answer 1

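user1364517,

I think you did a good job of attempting to solve the problem. However, I see two issues.

Add chomp @ids; after close IDS; to remove the \n at the end of each array element.
Change $hash_count{$hash{$id}}++; to $hash_count{$hash{$id}} = $id if $hash{$id};

These small changes will get your program working.

Here is a more 'hacky' (and certainly less idiomatic) solution, just for the fun of it: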
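With both changes applied, the original script might look like the sketch below. To keep it runnable on its own, it reads from in-memory strings instead of $ARGV[0] and $ARGV[1]; the sample ID list (245406 and 999999) is made up for illustration.

```perl
use strict;
use warnings;

# Sample data standing in for the two input files (assumed for illustration).
my $tf_data = "AUX\t2398432\nAUL\t245406\n";
my $id_data = "245406\n999999\n";

# Build the lookup table: ID => text.
my %hash;
open my $tf_fh, "<", \$tf_data or die $!;
while (<$tf_fh>) {
    chomp;
    my @fields = split /\t/, $_;
    $hash{$fields[1]} = $fields[0];
}
close $tf_fh;

open my $ids_fh, "<", \$id_data or die $!;
my @ids = <$ids_fh>;
close $ids_fh;
chomp @ids;    # change 1: strip the trailing newline from every ID

my %hash_count;
foreach my $id (@ids) {
    # change 2: record the ID under its text, but only for known IDs
    $hash_count{$hash{$id}} = $id if $hash{$id};
}

foreach my $family (sort keys %hash_count) {
    print "$family\t$hash_count{$family}\n";
}
```

With this sample data, only the AUL row is printed, because 999999 has no entry in the lookup table.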

use strict;
use warnings;

my %hash;

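# Map ID => text; the loop stops at EOF (or at the first line without a tab).
{open my $file, "<$ARGV[0]" or die $!;
$hash{$2} = $1 while <$file> =~ /(.*)\t(.*)/;}

# Print "text<TAB>ID" for each known ID, sorted by text (s///r needs Perl 5.14+).
{open my $file, "<$ARGV[1]" or die $!;
map{print "$hash{$_}\t$_\n"}sort{$hash{$a} cmp $hash{$b}}
grep{$hash{$_}}map{s/\n\z//r}<$file>;}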

The blocks are used so that each file is closed when my $file goes out of scope.

Hope this helps!

Answer 2

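I understand you're a beginner in this language. A few things can help you debug your program.

At the top of your script, add use Data::Dumper;

Once you do that, you can add statements like print Dumper(\%hash); and print Dumper(\%hash_count); (the hashes are passed by reference). These two statements should let you see the errors in your program.

As a side note, running this through perl -d is also an option, and something you should definitely learn if you're going to continue with the language.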
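As a rough illustration of what this can reveal, the sketch below dumps an ID list that was read without chomp. Setting $Data::Dumper::Useqq makes Dumper print control characters as escapes, so the stray \n on each ID becomes visible. The sample data is assumed.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq    = 1;   # show "\n" as an escape instead of a raw newline
$Data::Dumper::Sortkeys = 1;   # stable key order in the dump

my %hash = ( "245406" => "AUL", "2398432" => "AUX" );

# IDs as they look straight after @ids = <IDS>; (no chomp yet).
my @ids = ( "245406\n", "2398432\n" );

print Dumper( \%hash, \@ids );

# The lookup fails because "245406\n" is not the same key as "245406".
my $found_before = exists $hash{ $ids[0] } ? 1 : 0;
chomp @ids;
my $found_after = exists $hash{ $ids[0] } ? 1 : 0;

print "before chomp: $found_before, after chomp: $found_after\n";
```

The last line prints "before chomp: 0, after chomp: 1", which is the same mismatch the dump makes visible.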

Answer 3

Try this...

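    #!/usr/bin/perl
    use strict;
    use warnings;

    # First argument: one ID per line; second argument: "text ID" per line.
    open my $a_fh, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$a_fh>) {
        my ($id) = split " ", $line;
        next unless defined $id;          # skip blank lines

        # Rescan the second file for every ID (simple, but O(n*m)).
        open my $b_fh, "<", $ARGV[1] or die "Cannot open $ARGV[1]: $!";
        while (my $row = <$b_fh>) {
            my ($text, $row_id) = split " ", $row;
            next unless defined $row_id;
            print "$text\n" if $row_id eq $id;
        }
        close $b_fh;
    }
    close $a_fh;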

Run the following command in a terminal:

    perl test.pl a.txt b.txt

Answer 4

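A few suggestions:

Get a book on Modern Perl. Perl is an old language. The way you program in Perl has changed a lot over the years since it first came out in the 1980s. Unfortunately, too many people learn Perl from websites that date from before Perl 5.0.
Use use strict; and use warnings; in your program. This will catch many of your programming errors.
Don't rely on $_. It's global and can cause problems. for (@people) { looks neat, but for my $person ( @people ) is better.
Use /.../ (a pattern) in split and '...' (a string) in join.
Use variables for file handles. They're easier to pass to subroutines.

Here's your program, rewritten in a more modern style. It does much the same thing as yours, with some error checking added: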
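The split, join, and file handle suggestions can be sketched together as follows. The helper read_pairs and the variable names are hypothetical, and the handle is opened on an in-memory string only so the example runs without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "text<whitespace>ID" rows from any open handle
# and returns a hash reference mapping ID => text.
sub read_pairs {
    my ($fh) = @_;
    my %text_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $text, $id ) = split /\s+/, $line;   # the pattern goes in /.../
        next unless defined $id;
        $text_for{$id} = $text;
    }
    return \%text_for;
}

my $data = "AUX    2398432\nAUL    245406\n";   # sample rows from the question
open my $fh, "<", \$data or die $!;             # lexical handle, here on a string
my $text_for = read_pairs($fh);                 # passed like any other scalar
close $fh;

# The separator given to join is a plain string.
my $summary = join ", ", map { "$_=$text_for->{$_}" } sort keys %$text_for;
print "$summary\n";
```

Because $fh is an ordinary scalar, the same subroutine would work unchanged with a handle opened on a real file.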

use strict;
use warnings;
use feature qw(say);  # Nicer than print.
use autodie;          # Will automatically die on open and close errors

if ( @ARGV < 2 ) {
    die qq(Not enough arguments);
}

my $tf_id_file = shift;   # Use variable names and not `@ARGV` directly
my $id_file    = shift;   # Makes your program easier to understand 

open my $tf_ids_fh, "<", $tf_id_file;

my %hash;                # Not a good name for the variable, but that's what you had.
while ( my $line = <$tf_ids_fh> ) {
    chomp $line;         # Always chomp after a read
    my ( $text, $id ) = split /\s+/, $line;  # Use variable names, not @fields
    if ( not defined $id ) {           # Error checking
        die qq(Missing id field in line $. of tf_ids file);
    }
    $hash{$id} = $text;  # Key by ID so the IDs from the second file can be looked up
}
close $tf_ids_fh;

open my $ids_fh, "<", $id_file;
my @ids = <$ids_fh>;
chomp @ids;
close $ids_fh;

my %totals;
for my $id ( @ids ) {
    next if not exists $hash{$id};              # Skip IDs with no matching text
    if ( not exists $totals{ $hash{$id} } ) {   # Initialize hash before adding to it
        $totals{ $hash{$id} } = 0;
    }
    $totals{ $hash{$id} }++;
}

for my $family ( sort keys %totals ) {
    printf "%10.10s %4d\n", $family, $totals{$family};
}

I use printf, which formats your output better than a plain print.
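To illustrate what that format does: %10.10s right-justifies the string in a field of exactly 10 characters, truncating anything longer, and %4d right-justifies the integer in a field at least 4 characters wide. A small sketch with made-up values:

```perl
use strict;
use warnings;

# sprintf takes the same format as printf but returns the string.
my $short = sprintf "%10.10s %4d", "AUL", 3;
my $long  = sprintf "%10.10s %4d", "AVeryLongFamilyName", 12;

print "[$short]\n";   # prints "[       AUL    3]"
print "[$long]\n";    # prints "[AVeryLongF   12]"
```

Both lines come out 15 characters wide, so the columns line up no matter how long the family name is.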

Compare two text files and return the corresponding value
