Matching one list against another

Time: 2014-02-20 07:24:08

Tags: perl unix

I have a txt file whose data looks like this (TEST):

chr1_10524
chr1_10525
chr1_10562
chr1_8383722
chr1_201327234
chr2_123123

And another txt file whose data looks like this (DATABASE):

chrom chromStart chromEnd name
chr1 67071812 67170812 13_Heterochrom/lo
chr1 201326377 201330777 13_Heterochrom/lo
chr1 8383613 8389213 12_Repressed
chr2 120000 130000 1_Active Promoter

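I would like to get an output file in which each TEST entry is matched against DATABASE (same chromosome, position between chromStart and chromEnd), giving something like this: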

chr1_8383722 12_Repressed
chr1_201327234 13_Heterochrom/lo
chr2_123123 1_Active Promoter

Can this be done in Perl? Thanks!

2 answers:

Answer 0: (score: 1)

Try this:

#!/usr/bin/perl

use warnings;
use strict;

open(my $db, "<", "database.txt") or die "Cannot open < database.txt: $!";
open(my $tst, "<", "test.txt") or die "Cannot open < test.txt: $!";

my @database;

while (<$db>) {
    chomp;
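    # Limit the split to 4 fields so names containing spaces
    # (e.g. "1_Active Promoter") are kept whole.
    my @fields = split ' ', $_, 4;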
    push @database, \@fields;
}

while (my $line = <$tst>) {
    chomp($line);
    my ($chr, $pos) = split /_/, $line;
    # No unique key identifies a database entry, so scan every entry and
    # test whether the position falls inside its range.
    foreach my $entry (@database) {
        if ($chr eq $entry->[0] && $entry->[1] <= $pos && $pos <= $entry->[2]) {
            print "$line $entry->[3]\n";
        }
    }
}
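
The scan above compares every TEST line with every DATABASE row. As a rough sketch of one possible variation (not part of the original answer), the rows could first be grouped by chromosome so that each lookup only visits ranges on the same chromosome. The helper names build_index and lookup are hypothetical, and the sample rows are copied from the question rather than read from files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an index: chromosome => list of [start, end, name] ranges.
# (build_index is a hypothetical helper, not from the original answer.)
sub build_index {
    my (@lines) = @_;
    my %by_chr;
    for my $line (@lines) {
        # Limit to 4 fields so a name such as "1_Active Promoter" stays whole.
        my ($chr, $start, $end, $name) = split ' ', $line, 4;
        # Skip the header row and any malformed line.
        next unless defined $name && $start =~ /^\d+$/ && $end =~ /^\d+$/;
        push @{ $by_chr{$chr} }, [ $start, $end, $name ];
    }
    return \%by_chr;
}

# Return one "key name" string for every range that contains the position.
# (lookup is also a hypothetical helper.)
sub lookup {
    my ($index, $key) = @_;
    my ($chr, $pos) = split /_/, $key;
    return () unless defined $pos && $index->{$chr};
    return map  { "$key $_->[2]" }
           grep { $_->[0] <= $pos && $pos <= $_->[1] }
           @{ $index->{$chr} };
}

# Sample rows copied from the question.
my $index = build_index(
    'chrom chromStart chromEnd name',
    'chr1 67071812 67170812 13_Heterochrom/lo',
    'chr1 201326377 201330777 13_Heterochrom/lo',
    'chr1 8383613 8389213 12_Repressed',
    'chr2 120000 130000 1_Active Promoter',
);

my @keys = qw(chr1_10524 chr1_10525 chr1_10562
              chr1_8383722 chr1_201327234 chr2_123123);
print "$_\n" for map { lookup($index, $_) } @keys;
```

When the rows come from a file instead, each line would need a chomp before being passed to build_index, otherwise the name keeps its trailing newline.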

Answer 1: (score: 0)

Perhaps the following will help:

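use strict;
use warnings;

my %positions;    # chromosome => positions listed in TEST

# First file (TEST): collect the positions for each chromosome.
while (<>) {
    chomp;
    my ( $chr, $pos ) = split /_/;
    push @{ $positions{$chr} }, $pos if defined $pos;
    last if eof;    # stop at the end of the first file
}

# Second file (DATABASE): print each TEST position that falls inside a range.
while (<>) {
    chomp;
    my ( $chr, $start, $end, $name ) = split ' ', $_, 4;
    next unless defined $name && $positions{$chr};    # skips the header row too
    for my $pos ( @{ $positions{$chr} } ) {
        print "${chr}_$pos $name\n" if $start <= $pos && $pos <= $end;
    }
}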

Command-line usage: perl script.pl TEST DATABASE [>outFile]

The last, optional argument (typed without the square brackets) redirects the output to a file.
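
The script relies on `<>` reading the files named on the command line one after another, with a bare `eof` marking the end of the first one. The following is a small sketch of just that mechanism, assuming two temporary stand-in files created with the core File::Temp module; make_file is a hypothetical helper and the sample lines are taken from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the given lines to a temporary file and return its path.
# (make_file is a hypothetical helper used only for this demonstration.)
sub make_file {
    my (@lines) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @lines;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Stand-ins for "perl script.pl TEST DATABASE".
@ARGV = (
    make_file('chr1_10524', 'chr2_123123'),
    make_file('chrom chromStart chromEnd name',
              'chr2 120000 130000 1_Active Promoter'),
);

my (@first, @second);

while (<>) {
    chomp;
    push @first, $_;
    last if eof;    # bare eof: true at the end of the file currently being read
}

while (<>) {        # continues with the next file in @ARGV
    chomp;
    push @second, $_;
}

print "first file:  ", scalar(@first),  " lines\n";
print "second file: ", scalar(@second), " lines\n";
```

Note that `eof` without parentheses tests the current file, whereas `eof()` with empty parentheses is true only at the end of the very last file. If the first file is empty, the first loop will run into the second file, so this idiom assumes TEST has at least one line.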