Question

Please help.

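I have two files (file1 and file2). I want to extract from file2 the columns whose names are listed in file1. These are large files with thousands of columns and rows.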

file1

Id123B
Id124A
Id125A

file2

Code  sex  id123B  id127  id125A

Desired output file:

code sex id123B  id125A

Below is the code I tried, but it fails.

#!/usr/bin/perl
use strict;
use warnings;

open my $IN, "file2" or die $!;

my $header = <$IN>;

my %sampleID = map { /(.*?)\t/; $1 => 1 } <$IN>;

close($IN);

open $IN, "file1" or die $!;
$header = <$IN>;
my @samples = split /\t/, $header;
my @cols = grep { exists $sampleID{$samples[$_]} } 0..$#samples;


while(<$IN>){
    chomp;
    my @line = (split /\t/)[@cols]; 

    print join( "\t", @line ), "\n";
}

Answer 1

Use a hash to map column names to column numbers.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

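# file1: one wanted column name per line.
open my $COLUMNS, '<', shift or die $!;
chomp( my @columns = <$COLUMNS> );

# file2: tab-separated data whose first line is the header.
# Chomp the header, otherwise the last column name keeps its "\n" and never matches.
open my $DATA, '<', shift or die $!;
chomp( my $header_line = <$DATA> );
my @header = split /\t/, $header_line;

# Map each column name to its column number.
my %column_index;
@column_index{ @header } = 0 .. $#header;

# Drop wanted names that do not occur in the header.
@columns = grep exists $column_index{$_}, @columns;

# Print the selected header, then the selected cells of every data row.
say join "\t", @columns;
while (<$DATA>) {
    chomp;
    my @cells = split /\t/, $_, -1;    # -1 keeps trailing empty cells
    say join "\t", @cells[ @column_index{ @columns } ];
}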

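Run it as script.pl file1 file2. Note that file1 must contain the exact column names used in file2 (the hash lookup is case-sensitive), so you will get better results with the following file1: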

Code
sex
id123B
id124A
id125A
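If editing file1 by hand is impractical, the lookup could instead ignore case. The following is a minimal sketch under that assumption, not part of the original answer: the helper select_columns and the sample data row are hypothetical, names are compared with lc on both sides, and the selected header is printed in file2's spelling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the answer above.
# $wanted : array ref of column names (e.g. the lines of file1)
# $lines  : array ref of tab-separated lines from file2, header first
# Returns the selected columns as tab-joined lines, header included.
# Assumption: names are matched case-insensitively via lc().
sub select_columns {
    my ($wanted, $lines) = @_;
    my @rows = @$lines;              # copy so the caller's data is untouched
    return () unless @rows;

    chomp( my $header_line = shift @rows );
    my @header = split /\t/, $header_line;

    # Map lower-cased header name => column number.
    my %column_index;
    @column_index{ map { lc($_) } @header } = 0 .. $#header;

    # Keep only wanted names present in the header, in file1 order.
    my @idx = map  { $column_index{ lc($_) } }
              grep { exists $column_index{ lc($_) } } @$wanted;

    my @out = ( join "\t", @header[@idx] );
    for my $row (@rows) {
        chomp $row;
        my @cells = split /\t/, $row, -1;
        push @out, join "\t", @cells[@idx];
    }
    return @out;
}

# Usage with the question's header; the data row is made up for illustration.
my @wanted = ('code', 'sex', 'Id123B', 'Id124A', 'Id125A');
my @file2  = ("Code\tsex\tid123B\tid127\tid125A\n", "A1\tM\t0.1\t0.2\t0.3\n");
print "$_\n" for select_columns(\@wanted, \@file2);
```

Here Id124A is silently skipped because file2 has no such column. One trade-off of folding case: two header names that differ only in case would collide, and the later one would win.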

