Match and replace different values in the same column of a list

Date: 2014-02-21 17:24:45

Tags: regex perl unix awk

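I want to replace each "0" in the second column of one file with the value from the second column of another file. For example, for chr1 in input 1 the second column is "0", and I want to replace it with "754192" from the second column of input 2. The other "0" values should be replaced the same way, so for chr2 in input 1 the "0" in the second column becomes "83616", read from input 2. Both input files are delimited (columns separated by whitespace). I would really appreciate any Perl or awk suggestions. Thanks.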

Input 1

chr1    0       121347754       0.004130250308662653
chr1    144009053       249250621       0.12551644444465637
chr2    0       90278124        -0.010306187905371189
chr2    95387134        243199373       -0.011985263787209988
chr3    0       91000000        -0.009726814925670624
chr3    93541117        198022430       -0.014836171641945839
chr4    0       49064792        -0.01315629668533802
chr4    52700771        141568601       0.014452865347266197
chr4    141568601       143871023       0.20834201574325562
chr5    0       46113638        -0.013212060555815697
chr5    49560859        68740653        0.004888067487627268
chr5    70744658        180915260       -0.011330894194543362

Input 2

chr1    754192
chr2    83616
chr3    108226
chr4    90883
chr5    40975
chr6    209980
chr7    67820
chr8    193585
chr9    206255
chr10   126070

Output

chr1    754192       121347754       0.004130250308662653
chr1    144009053       249250621       0.12551644444465637
chr2    83616       90278124        -0.010306187905371189
chr2    95387134        243199373       -0.011985263787209988
chr3    108226       91000000        -0.009726814925670624
chr3    93541117        198022430       -0.014836171641945839
chr4    90883       49064792        -0.01315629668533802
chr4    52700771        141568601       0.014452865347266197
chr4    141568601       143871023       0.20834201574325562
chr5    40975       46113638        -0.013212060555815697
chr5    49560859        68740653        0.004888067487627268
chr5    70744658        180915260       -0.011330894194543362

5 answers:

Answer 0 (score: 2)

perl -MFile::Slurp -lape'
  BEGIN { %h = map split, read_file(pop); }
  $F[1] ||= $h{$F[0]};
  $_ = join "\t", @F;
' input1 input2

Output

chr1   754192      121347754       0.004130250308662653
chr1    144009053       249250621       0.12551644444465637
chr2   83616      90278124        -0.010306187905371189
chr2    95387134        243199373       -0.011985263787209988
chr3   108226      91000000        -0.009726814925670624
chr3    93541117        198022430       -0.014836171641945839
chr4   90883      49064792        -0.01315629668533802
chr4    52700771        141568601       0.014452865347266197
chr4    141568601       143871023       0.20834201574325562
chr5   40975      46113638        -0.013212060555815697
chr5    49560859        68740653        0.004888067487627268
chr5    70744658        180915260       -0.011330894194543362

Answer 1 (score: 2)

You can try this awk:

awk  'NR==FNR{ a[$1]=$2; next;} $2==0{ $2=a[$1]; }1' OFS="\t" input2 input1 
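The one-liner above can be unpacked as follows. This is only a sketch under stated assumptions: the three sample rows, the `chrX` key, the `repl` array name and the temporary directory are made up for illustration, and the input is assumed to be tab-separated. Unlike the one-liner, it checks `($1 in repl)` first, so a "0" whose key has no entry in the lookup file is left as it is rather than replaced by an empty string.

```shell
#!/bin/sh
# Illustrative sketch: sample data and file names are assumptions, not the real inputs.
tmp=$(mktemp -d)

# Lookup file (same shape as input2): key, replacement value.
printf 'chr1\t754192\nchr2\t83616\n' > "$tmp/input2"

# Data file (same shape as input1); chrX has no entry in the lookup file.
printf 'chr1\t0\t121347754\t0.0041\nchr1\t144009053\t249250621\t0.1255\nchrX\t0\t500\t0.1\n' > "$tmp/input1"

result=$(awk '
  BEGIN { FS = OFS = "\t" }
  # First file (NR == FNR): remember the replacement value for each key.
  NR == FNR { repl[$1] = $2; next }
  # Second file: replace a zero in column 2 only when the key is known.
  $2 == 0 && ($1 in repl) { $2 = repl[$1] }
  # Print every line of the second file, changed or not.
  { print }
' "$tmp/input2" "$tmp/input1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The lookup file is passed first so that its values are already loaded when the data file is read. Note that `$2 == 0` is a numeric comparison, so a field such as `0.0` would also count as zero.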

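Answer 2 (score: 2)

Here is one way to do it in Perl. The program expects the paths to the two files as command-line arguments (the data file first, then the lookup file).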

use strict;
use warnings;

my ($file1, $file2) = @ARGV;
my $fh;

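# Build a lookup from the second file: column 1 => column 2.
open $fh, '<', $file2 or die qq{Unable to open "$file2" for input: $!};
my %defaults = map {(split)[0,1]} <$fh>;

open $fh, '<', $file1 or die qq{Unable to open "$file1" for input: $!};

while (<$fh>) {
  my @fields = split;
  # A false second field ("0") is replaced by the lookup value for this key.
  $fields[1] ||= $defaults{$fields[0]};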
  print join("\t", @fields), "\n";
}

Output

chr1  754192  121347754 0.004130250308662653
chr1  144009053 249250621 0.12551644444465637
chr2  83616 90278124  -0.010306187905371189
chr2  95387134  243199373 -0.011985263787209988
chr3  108226  91000000  -0.009726814925670624
chr3  93541117  198022430 -0.014836171641945839
chr4  90883 49064792  -0.01315629668533802
chr4  52700771  141568601 0.014452865347266197
chr4  141568601 143871023 0.20834201574325562
chr5  40975 46113638  -0.013212060555815697
chr5  49560859  68740653  0.004888067487627268
chr5  70744658  180915260 -0.011330894194543362

Answer 3 (score: 1)

A slightly more procedural version (without error checking).

use Modern::Perl;
use autodie;

# read input2 into map 
my %input2 = do { 
  open my $input2, '<', "input2";
  local $/ = undef;
  split( ' ', <$input2> );
};

open my $input1, '<', "input1";
while ( <$input1> ) {
  my ($id) = split( ' ' );
  if ( /^\w+\s+0\s/ ) {
    my $replace_with = $input2{$id};
    s/^(\w+\s+)0(\s)/$1$replace_with$2/;
  }

  print;
}

Answer 4 (score: 1)

A one-liner in Perl:

$ perl -MFile::Slurp -lape 'BEGIN {$" = "\t"; %input = map { m/([^\s]+)\s*([^\s]+)/ } read_file("input_2")} $F[1] = $input{$F[0]} unless $F[1]; $_ = "@F"' input_1