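I'm looking for a way to merge several files on their key columns (the first three fields). However, I don't want to drop the unique values: I want them kept, so that I can produce a final file in which every key appears exactly once.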
I tried this command:
join <(sort 1) <(sort 2) | tac | awk '{print $1,$3,$4,$5,$2}' | column -t
But it only merges and outputs the lines that match.
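The reason is that `join` pairs lines on a single field (field 1 by default, which is `mm1` on every line here) and, without options, prints only lines that have a partner. GNU `join` can also print unpairable lines (`-a 1 -a 2`) and fill their missing fields with a placeholder (`-e 0`, which takes effect together with an output format such as the GNU-only `-o auto`). A minimal sketch, assuming GNU coreutils and bash, and assuming the first three columns together form the key: it glues those columns into one field with `_` (an arbitrary separator chosen for this sketch), joins the files pairwise, then turns `_` back into spaces. It runs in a scratch directory on a few rows copied from the question's data.

```shell
#!/bin/bash
set -e -o pipefail
export LC_ALL=C     # same byte-order collation for sort and join
cd "$(mktemp -d)"   # scratch dir so the sample files don't clobber real ones

# A few rows from the question's three input files
printf '%s\n' 'mm1 36307733 36324029 1.45947622984395' > file1
printf '%s\n' 'mm1 17601901 17630939 -1.02477154457324' \
              'mm1 23995939 24005656 -1.29725218483742' > file2
printf '%s\n' 'mm1 21218575 21230167 -1.0792454238353' \
              'mm1 23995939 24005656 -1.38350179201041' > file3

# "f1 f2 f3 value" -> "f1_f2_f3 value", sorted on the key,
# because join compares one field and needs input sorted on it
keyed() { awk '{ print $1 "_" $2 "_" $3, $4 }' "$1" | sort -k1,1; }

# -a1 -a2: keep lines with no partner; -e 0 -o auto: fill the missing
# value columns with 0. Join file1 with file2, then that result with file3.
join -a1 -a2 -e 0 -o auto <(keyed file1) <(keyed file2) |
  join -a1 -a2 -e 0 -o auto - <(keyed file3) |
  tr '_' ' ' > merged.txt

cat merged.txt
```

On real data, drop the sample-data lines and the `cd`, and append `| column -t` for aligned output. Each further file needs one more `join -a1 -a2 -e 0 -o auto - <(keyed fileN)` stage.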
For example, this data from file 1:
mm1 36307733 36324029 1.45947622984395
mm1 36530188 36547201 -1.05469327277336
mm1 37874801 37890411 1.1818111527155
mm1 39551296 39577405 1.03024743095568
mm1 40465552 40500854 1.69797988062545
File 2:
mm1 17601901 17630939 -1.02477154457324
mm1 21511933 21513056 -1.01776484266642
mm1 23995939 24005656 -1.29725218483742
mm1 24612407 24612700 -1.5481572503361
mm1 24612775 24613119 -1.69044737891815
File 3:
mm1 21218575 21230167 -1.0792454238353
mm1 23995939 24005656 -1.38350179201041
mm1 24612407 24612700 -1.99368917819954
mm1 24612775 24613119 -1.27503764730879
mm1 36140027 36244720 1.15136681818451
In the output file, a value that is missing from a given file should be marked with 0:
output.file
                       File 1             File 2             File 3
mm1 36307733 36324029  1.45947622984395   0                  0
mm1 36530188 36547201  -1.05469327277336  0                  0
mm1 37874801 37890411  1.1818111527155    0                  0
mm1 39551296 39577405  1.03024743095568   0                  0
mm1 40465552 40500854  1.69797988062545   0                  0
mm1 17601901 17630939  0                  -1.02477154457324  0
mm1 21218575 21230167  0                  0                  -1.0792454238353
mm1 21511933 21513056  0                  -1.01776484266642  0
mm1 23995939 24005656  0                  -1.29725218483742  -1.38350179201041
mm1 24612407 24612700  0                  -1.5481572503361   -1.99368917819954
mm1 24612775 24613119  0                  -1.69044737891815  -1.27503764730879
mm1 36140027 36244720  0                  0                  1.15136681818451
Answer 0 (score: 2)
Using GNU bash, cut, GNU grep, GNU sort and column:
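#!/bin/bash
# For every distinct key (first three fields), print the key followed by
# the 4th field from each file, or 0 if that file has no such key.
cut -d " " -f 1-3 file1 file2 file3 | sort -u | while read -r line; do
  echo -n "$line"
  for f in file1 file2 file3; do
    # ^ and the trailing space anchor the whole key; \K drops it from the match
    echo -n " $(grep -Po "^$line \K.*" "$f" || echo 0)"
  done
  echo
done | column -t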
Output:
mm1 17601901 17630939 0 -1.02477154457324 0
mm1 21218575 21230167 0 0 -1.0792454238353
mm1 21511933 21513056 0 -1.01776484266642 0
mm1 23995939 24005656 0 -1.29725218483742 -1.38350179201041
mm1 24612407 24612700 0 -1.5481572503361 -1.99368917819954
mm1 24612775 24613119 0 -1.69044737891815 -1.27503764730879
mm1 36140027 36244720 0 0 1.15136681818451
mm1 36307733 36324029 1.45947622984395 0 0
mm1 36530188 36547201 -1.05469327277336 0 0
mm1 37874801 37890411 1.1818111527155 0 0
mm1 39551296 39577405 1.03024743095568 0 0
mm1 40465552 40500854 1.69797988062545 0 0
PS: This is really a job for awk: the loop above runs grep three times for every key.
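If you would rather stay in bash than switch to awk, a minimal sketch (assuming bash 4 or later for `declare -A`) reads each file only once into associative arrays instead of calling grep per key, then prints one row per key with 0 for files that lack it. `seen` and `val` are names chosen for this sketch, and the sample files hold a few rows from the question.

```shell
#!/bin/bash
# Needs bash 4+ for associative arrays (declare -A).
set -e -o pipefail
export LC_ALL=C     # predictable sort order
cd "$(mktemp -d)"   # scratch dir so the sample files don't overwrite real ones

# A few rows from the question's input files
printf '%s\n' 'mm1 36307733 36324029 1.45947622984395' > file1
printf '%s\n' 'mm1 23995939 24005656 -1.29725218483742' > file2
printf '%s\n' 'mm1 23995939 24005656 -1.38350179201041' \
              'mm1 36140027 36244720 1.15136681818451' > file3

files=(file1 file2 file3)
declare -A seen   # key -> 1 for every key found in any file
declare -A val    # "key,file-index" -> value column

# Read each file exactly once
for i in "${!files[@]}"; do
  while read -r c1 c2 c3 v; do
    seen["$c1 $c2 $c3"]=1
    val["$c1 $c2 $c3,$i"]=$v
  done < "${files[$i]}"
done

# One output row per key; 0 when file i has no line with this key
for key in "${!seen[@]}"; do
  row=$key
  for i in "${!files[@]}"; do
    row+=" ${val[$key,$i]:-0}"
  done
  printf '%s\n' "$row"
done | sort > merged.txt
```

`"${!seen[@]}"` yields the keys in no particular order, which is why the rows are sorted at the end; pipe through `column -t` for aligned columns.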
Answer 1 (score: 1)
awk to the rescue!
Here is an awk solution that is not tied to exactly three files:
$ awk 'FNR==1 {c++}
       {k=$1 FS $2 FS $3; keys[k]; a[k,c]=$4}
       END {
         for (key in keys) {
           printf "%s", key FS
           for (i=1; i<=c; i++)
             printf "%s", (a[key,i] ? a[key,i] : 0) FS
           print ""
         }
       }' file{1,2,3} |
  sort | column -t
mm1 17601901 17630939 0 -1.02477154457324 0
mm1 21218575 21230167 0 0 -1.0792454238353
mm1 21511933 21513056 0 -1.01776484266642 0
mm1 23995939 24005656 0 -1.29725218483742 -1.38350179201041
mm1 24612407 24612700 0 -1.5481572503361 -1.99368917819954
mm1 24612775 24613119 0 -1.69044737891815 -1.27503764730879
mm1 36140027 36244720 0 0 1.15136681818451
mm1 36307733 36324029 1.45947622984395 0 0
mm1 36530188 36547201 -1.05469327277336 0 0
mm1 37874801 37890411 1.1818111527155 0 0
mm1 39551296 39577405 1.03024743095568 0 0
mm1 40465552 40500854 1.69797988062545 0 0
No gawk arrays of arrays (true multidimensional arrays) are needed; the `a[k,c]` subscripts work in any POSIX awk.
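POSIX awk emulates `a[k,c]` by joining the subscripts into one string with `SUBSEP`, and `(k, c) in a` tests for that combined key without creating it. A minimal sketch of the answer's script that uses this membership test instead of `a[key,i] ? ... : 0`, so a value stored as, say, `0.0` would be printed as written rather than replaced by `0`; it also builds each line as a string, so there is no trailing separator. The sample files hold a few rows from the question.

```shell
#!/bin/bash
set -e -o pipefail
export LC_ALL=C     # predictable sort order
cd "$(mktemp -d)"   # scratch dir so the sample files don't overwrite real ones

# A few rows from the question's input files
printf '%s\n' 'mm1 36307733 36324029 1.45947622984395' > file1
printf '%s\n' 'mm1 23995939 24005656 -1.29725218483742' > file2
printf '%s\n' 'mm1 23995939 24005656 -1.38350179201041' \
              'mm1 36140027 36244720 1.15136681818451' > file3

awk '
  FNR == 1 { c++ }                               # first line of a new file: next value column
  { k = $1 FS $2 FS $3; keys[k]; a[k, c] = $4 }  # a[k, c] is stored as a[k SUBSEP c]
  END {
    for (k in keys) {
      line = k
      for (i = 1; i <= c; i++)
        # "in" checks the combined key without creating it, so a stored 0.0 is kept as written
        line = line FS (((k, i) in a) ? a[k, i] : 0)
      print line
    }
  }' file1 file2 file3 | sort > merged.txt
```

As in the answer, `c` counts files by watching `FNR == 1`, so an empty input file would not get a column of its own.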