How can I merge two files while keeping the non-matching values?

Time: 2016-02-26 19:41:27

Tags: linux bash

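I'm looking for a way to merge several files based on their leading columns. I don't want to drop the unique values, though: they should be kept, so that I end up with one final file in which every row appears once.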

I tried this command:

join <(sort 1) <(sort 2) | tac | awk '{print $1,$3,$4,$5,$2}' | column -t

But it only merges and outputs the rows that match.
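For context, `join` prints only paired lines by default, which is why the command above drops every row that appears in just one file. Below is a minimal sketch of an outer join for two files. The sample files `a` and `b` and the `keyed` helper are made up for illustration and are not from the question; the three key columns are folded into one field because `join` compares a single field.

```shell
#!/bin/bash
export LC_ALL=C                      # same collation for sort and join
tmp=$(mktemp -d)

# Hypothetical sample data: "chr start end value"
printf '%s\n' 'mm1 100 200 1.5' 'mm1 300 400 -2.0' > "$tmp/a"
printf '%s\n' 'mm1 300 400 0.7' 'mm1 500 600 3.1' > "$tmp/b"

# Fold columns 1-3 into a single join field and sort on it.
keyed() { awk '{print $1 ":" $2 ":" $3, $4}' "$1" | sort > "$2"; }
keyed "$tmp/a" "$tmp/a.keyed"
keyed "$tmp/b" "$tmp/b.keyed"

# -a 1 -a 2  also print unpairable lines from either file
# -e 0       fill a field that is missing on one side with 0
# -o ...     fixed layout: join key, value from file 1, value from file 2
outer_join=$(join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$tmp/a.keyed" "$tmp/b.keyed" | tr ':' ' ')
printf '%s\n' "$outer_join"

rm -rf "$tmp"
```

Extending this to a third file would take a second `join` pass over the intermediate result with a longer `-o` list, so it gets unwieldy quickly.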

For example, with this data in file 1:

mm1 36307733 36324029 1.45947622984395
mm1 36530188 36547201 -1.05469327277336
mm1 37874801 37890411 1.1818111527155
mm1 39551296 39577405 1.03024743095568
mm1 40465552 40500854 1.69797988062545

File 2:

mm1 17601901 17630939 -1.02477154457324
mm1 21511933 21513056 -1.01776484266642
mm1 23995939 24005656 -1.29725218483742
mm1 24612407 24612700 -1.5481572503361
mm1 24612775 24613119 -1.69044737891815

File 3:

mm1 21218575 21230167 -1.0792454238353
mm1 23995939 24005656 -1.38350179201041
mm1 24612407 24612700 -1.99368917819954
mm1 24612775 24613119 -1.27503764730879
mm1 36140027 36244720 1.15136681818451

In the output file, a row that is missing from one of the files should get a 0 in that file's column.

output.file

                         File 1             File 2             File 3
mm1  36307733  36324029  1.45947622984395   0                  0
mm1  36530188  36547201  -1.05469327277336  0                  0
mm1  37874801  37890411  1.1818111527155    0                  0
mm1  39551296  39577405  1.03024743095568   0                  0
mm1  40465552  40500854  1.69797988062545   0                  0
mm1  17601901  17630939  0                  -1.02477154457324  0
mm1  21218575  21230167  0                  0                  -1.0792454238353
mm1  21511933  21513056  0                  -1.01776484266642  0
mm1  23995939  24005656  0                  -1.29725218483742  -1.38350179201041
mm1  24612407  24612700  0                  -1.5481572503361   -1.99368917819954
mm1  24612775  24613119  0                  -1.69044737891815  -1.27503764730879
mm1  36140027  36244720  0                  0                  1.15136681818451

2 answers:

Answer 0: (score: 2)

Using GNU bash, cut, GNU grep, GNU sort and column:

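#!/bin/bash

# Collect every distinct key (columns 1-3) across the three files.
cut -d " " -f 1-3 file1 file2 file3 | sort -u | while read -r line; do
  printf '%s' "$line"
  (
    # \K drops the matched key, so only " value" is printed; the ^ and the
    # space after \K keep a key from matching as a prefix of a longer field.
    # If a file has no row for this key, print 0 as a placeholder.
    grep -Po "^$line"'\K .*' file1 || echo " 0 "
    grep -Po "^$line"'\K .*' file2 || echo " 0 "
    grep -Po "^$line"'\K .*' file3 || echo " 0 "
  ) | tr -d '\n'
  echo
done | column -t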

Output:

mm1  17601901  17630939  0                  -1.02477154457324  0
mm1  21218575  21230167  0                  0                  -1.0792454238353
mm1  21511933  21513056  0                  -1.01776484266642  0
mm1  23995939  24005656  0                  -1.29725218483742  -1.38350179201041
mm1  24612407  24612700  0                  -1.5481572503361   -1.99368917819954
mm1  24612775  24613119  0                  -1.69044737891815  -1.27503764730879
mm1  36140027  36244720  0                  0                  1.15136681818451
mm1  36307733  36324029  1.45947622984395   0                  0
mm1  36530188  36547201  -1.05469327277336  0                  0
mm1  37874801  37890411  1.1818111527155    0                  0
mm1  39551296  39577405  1.03024743095568   0                  0
mm1  40465552  40500854  1.69797988062545   0                  0
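To make the `\K` trick concrete, here is a small sketch of what a single `grep -Po` call prints for a key that is present and for one that is absent. It assumes GNU grep built with PCRE support (`-P`); the second input row is made up for illustration, and the pattern is anchored with `^` and a space after `\K`, which is a hardening choice rather than something the idiom requires.

```shell
#!/bin/bash
line='mm1 23995939 24005656'

# Key present: everything before \K is matched but not printed.
present=$(printf '%s\n' "$line -1.29725218483742" | grep -Po "^$line"'\K .*')

# Key absent: grep exits non-zero, so the fallback supplies the 0.
absent=$(printf '%s\n' 'mm1 1 2 3.0' | grep -Po "^$line"'\K .*' || echo ' 0 ')

printf '[%s] [%s]\n' "$present" "$absent"
# prints: [ -1.29725218483742] [ 0 ]
```

The leading space in each piece is what keeps the values apart once `tr -d '\n'` glues the three results onto one line.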

PS: This is really a job for awk.

Answer 1: (score: 1)

awk to the rescue!

Here is an awk solution that is not tied to three files:

$ awk 'FNR==1 {c++}                      # c = index of the file being read
       {k=$1 FS $2 FS $3; keys[k]; a[k,c]=$4}
       END {for (key in keys) {
              printf "%s", key FS
              for (i=1; i<=c; i++)        # 0 where file i has no row for key
                printf "%s", ((key,i) in a ? a[key,i] : 0) FS
              print ""}}' file{1,2,3} |
  sort | column -t

mm1  17601901  17630939  0                  -1.02477154457324  0
mm1  21218575  21230167  0                  0                  -1.0792454238353
mm1  21511933  21513056  0                  -1.01776484266642  0
mm1  23995939  24005656  0                  -1.29725218483742  -1.38350179201041
mm1  24612407  24612700  0                  -1.5481572503361   -1.99368917819954
mm1  24612775  24613119  0                  -1.69044737891815  -1.27503764730879
mm1  36140027  36244720  0                  0                  1.15136681818451
mm1  36307733  36324029  1.45947622984395   0                  0
mm1  36530188  36547201  -1.05469327277336  0                  0
mm1  37874801  37890411  1.1818111527155    0                  0
mm1  39551296  39577405  1.03024743095568   0                  0
mm1  40465552  40500854  1.69797988062545   0                  0

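It doesn't need arrays of arrays (true multidimensional arrays): the a[k,c] subscripts are ordinary strings joined with SUBSEP, so any POSIX awk will do.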
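As a quick illustration of why plain awk arrays are enough here, the sketch below stores one value under a two-part subscript and then takes the key apart again. The sample key and value are made up; `SUBSEP`, `split` and the `(i, j) in array` test are standard awk.

```shell
#!/bin/bash
parts=$(awk 'BEGIN {
  a["mm1 100 200", 2] = "0.7"        # stored under ONE string key

  # Membership test that does not create the element it asks about.
  has1 = (("mm1 100 200", 1) in a) ? "yes" : "no"

  for (k in a) {
    n = split(k, p, SUBSEP)          # recover the two subscript parts
    print n "|" p[1] "|" p[2] "|" a[k] "|" has1
  }
}')
printf '%s\n' "$parts"
# prints: 2|mm1 100 200|2|0.7|no
```

Because the membership test leaves the array untouched, the loop still sees exactly one key.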