I have two types of files, for example. File type 1 (Hsrr610_mult_notab.ko):
K00002 2023649
K00002 2643896
K00006 1614154
K00006 600734
K00008 1562227
K00012 1353687
File type 2 (Hsrr610.out, extracted from multiple ko_*.ko files):
K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 ko00564,ko01110,ko04011
K00008 ko00040,ko00051,ko01100
K00012 ko00040,ko00053,ko00520,ko01100
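I wrote a script that should check whether a KXXXXX key occurs in both the first and the second file and, if so, append the koXXXXX string (the koXXXXX entries and their commas should be treated as a single string) to the matching lines of the first file, like this: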
K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100
However, the script does not work properly:
#!/usr/bin/bash
for i in ko_*.ko
do
r="$(echo $i | sed s/ko_// | sed s/.ko// )";
echo $(echo "$r " && cat $i | sed ':a;N;$!ba;s/\n/,/g' ) > $r.csvt
done
cat *.csvt > Hsrr610.out && rm *.csvt
for j in $(cat Hsrr610.out)
do
k="$(echo $j | grep "K[0-9]*" | sed s/\n/0/g | sed s/\t//g)"
l="$(echo $j | grep "ko*")"
echo $k
awk -v one="$k" -v two=" $j" '{if (/one/) {$0=$0 two}; print}' Hsrr610_mult_notab.ko > out
done
Thanks.
Answer 0 (score: 2)
Edit: Since the OP changed the requirements, adding this solution now.
The following awk may help you:
awk 'FNR==NR{a[$1]=$NF;next} {print $0,a[$1]}' Hsrr610.out Hsrr610_mult_notab.ko
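To see what the two-file idiom does, here is a minimal sketch run on a subset of the sample data from the question. The temporary directory is only a placeholder for illustration and is not part of the original answer. `FNR==NR` holds only while awk reads the first file, so the first block builds a lookup table keyed on the K number, and the second block appends the stored list to each line of the second file.

```shell
#!/usr/bin/env bash
# Sketch: two-file awk lookup on a subset of the question's sample data.
# The temporary directory is an illustrative placeholder.
tmp=$(mktemp -d)

cat > "$tmp/Hsrr610.out" <<'EOF'
K00006 ko00564,ko01110,ko04011
K00008 ko00040,ko00051,ko01100
EOF

cat > "$tmp/Hsrr610_mult_notab.ko" <<'EOF'
K00006 1614154
K00006 600734
K00008 1562227
EOF

# Pass 1 (FNR==NR, first file): store the pathway list under its K number.
# Pass 2 (second file): print each line followed by the stored list.
result=$(awk 'FNR==NR { a[$1] = $NF; next } { print $0, a[$1] }' \
    "$tmp/Hsrr610.out" "$tmp/Hsrr610_mult_notab.ko")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that a line whose key is missing from the lookup file is still printed, just with a trailing space and no list.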
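Adding a non-one-liner form of a solution too now. It collects the last field of every line under its key, comma-separated, in first-seen key order:
awk '
!b[$1]++ { c[++count] = $1 }               # remember keys in first-seen order
{ a[$1] = a[$1] ? a[$1] OFS $NF : $NF }    # append the last field to the list
END { for (i = 1; i <= count; i++) print c[i] FS a[c[i]] }
' OFS="," Input_file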
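To make the grouping command above concrete, here is a small sketch of the same technique. The answer does not show what `Input_file` looks like, so the one-pair-per-line layout below is an assumption for illustration. The array names are also mine, and `OFS` is passed with `-v` because the input arrives on standard input.

```shell
#!/usr/bin/env bash
# Sketch: group values by key, keeping first-seen key order.
# The one-pair-per-line input layout is an assumption for illustration.
input='K00006 ko00564
K00008 ko00040
K00006 ko01110
K00008 ko00051
K00006 ko04011'

grouped=$(printf '%s\n' "$input" | awk -v OFS="," '
    !seen[$1]++ { order[++n] = $1 }      # first time this key appears
    {
        if ($1 in list) list[$1] = list[$1] OFS $NF   # append to the list
        else            list[$1] = $NF                # start a new list
    }
    END { for (i = 1; i <= n; i++) print order[i] FS list[order[i]] }
')

printf '%s\n' "$grouped"
```

The separate `order` array is what preserves input order. A plain `for (k in list)` loop would visit the keys in an unspecified order.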
Answer 1 (score: 2)
If the keys are sorted, then join is made for exactly this.
$ join file1 file2
K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100
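join expects both inputs to be sorted on the join field. If that is not guaranteed, one option is to sort them first, as in the sketch below. The temporary directory and the `.sorted` file names are placeholders for illustration. `LC_ALL=C` is set so that sort and join agree on the collation order.

```shell
#!/usr/bin/env bash
# Sketch: join on the first field when the inputs may be unsorted.
# Temporary paths and the .sorted names are illustrative placeholders.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
K00008 1562227
K00006 1614154
K00006 600734
EOF

cat > "$tmp/file2" <<'EOF'
K00008 ko00040,ko00051,ko01100
K00006 ko00564,ko01110,ko04011
EOF

# Use one collation order for both sort and join.
export LC_ALL=C

# Sort each input on the first field, then join on it (join's default).
sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"
joined=$(join "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Unlike the awk lookup, join by default drops lines whose key appears in only one of the two files.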