I have searched everywhere but still haven't found the answer I'm looking for. I have the following PDB file (file1):
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 40.83
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 40.24
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 40.08
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 41.46
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 44.54
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 38.97
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 38.40
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 38.79
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 39.67
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 38.83
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 38.83
After some calculations with gfortran, I also have the following file (file2):
1 0.14364205034979632
2 0.50527753403393372
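What I want to do is replace column 11 of file1 with column 2 of file2 whenever column 6 of file1 equals column 1 of file2. Basically, the output should look like this: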
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
I have the following code:
gawk '
FNR==NR { pdb[NR]=$0; next }
{
split(pdb[FNR],flds,FS,seps)
while ( flds[6]==$1 ) {
flds[11]=$2
for (i=1;i in flds;i++)
printf "%s%s", flds[i], seps[i]
print ""
}
}
' "file1" "file2" > "output.pdb"
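It does the job for the first line of file1 and keeps the spacing intact. The problem is that it never moves on to the next line, and the first line is repeated forever. Can anyone help me?
Thanks! I'll buy you a beer :)
Answer 0 (score: 1)
I assume file1 is sorted by column 6.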
join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 | column -t
Output:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
Update:
Using bash's printf:
printf "%s %6.d %-3s %s %s %s %s %s %s %s %s\n" $(join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2)
Output:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
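One caveat with the join-based commands above: join expects both inputs to be sorted on the join field in the collation order that sort uses, so a file ordered numerically by residue number (2 before 10) is not in the order join wants. The following is a minimal sketch of one way around this, not a definitive recipe. It assumes hypothetical two-residue sample data with made-up file2 values (0.5 and 0.9): sort copies of both files on the key, join them, then restore the original order by atom serial number (column 2).

```shell
tmp=$(mktemp -d)
# Hypothetical sample data: residue numbers 2 and 10, made-up replacement values.
printf '%s\n' \
  'ATOM 1 N SER A 2 31.848 -5.217 38.114 1.00 39.55' \
  'ATOM 2 N THR A 10 30.605 -3.796 34.906 1.00 39.92' > "$tmp/file1"
printf '%s\n' '2 0.5' '10 0.9' > "$tmp/file2"

# Sort both inputs on the join key in the same (C) collation order.
LC_ALL=C sort -k6,6 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Join on file1 column 6 / file2 column 1, then restore atom order (column 2, numeric).
out=$(LC_ALL=C join -1 6 -2 1 \
        -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 \
        "$tmp/file1.sorted" "$tmp/file2.sorted" | sort -k2,2n)
printf '%s\n' "$out"
rm -rf "$tmp"
```

Like the original command, this collapses the column spacing to single spaces, so it only helps when the exact layout does not matter.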
Answer 1 (score: 1)
This is a very common problem; I'm surprised you couldn't find a solution:
$ awk 'NR==FNR{a[$1]=$2;next} {$11=a[$6]} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
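For readers new to the two-file awk idiom, here is the same one-liner spelled out with comments. It is a sketch with one deliberate change: it adds an `($6 in a)` guard so that lines whose residue number has no entry in file2 keep their original column 11 instead of having it blanked. The third input line (residue 3, GLY) is hypothetical data added only to exercise that guard.

```shell
tmp=$(mktemp -d)
# Sample data shortened from the question, plus one hypothetical residue with no file2 entry.
printf '%s\n' \
  'ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55' \
  'ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92' \
  'ATOM 14 N GLY A 3 27.100 -4.200 33.000 1.00 37.10' > "$tmp/file1"
printf '%s\n' '1 0.14364205034979632' '2 0.50527753403393372' > "$tmp/file2"

out=$(awk '
  NR == FNR { a[$1] = $2; next }   # first file (file2): remember the value per residue number
  ($6 in a) { $11 = a[$6] }        # second file (file1): replace column 11 only if a value exists
  { print }                        # print every line, modified or not
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"
rm -rf "$tmp"
```

`NR == FNR` is true only while awk is reading the first file named on the command line, which is why file2 comes before file1.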
If you care about preserving the whitespace:
$ awk 'NR==FNR{a[$1]=$2;next} {sub(/[^[:space:]]+[[:space:]]*$/,a[$6])} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
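The difference between the two commands only shows up on input that actually has column padding, which the sample listings here do not. The sketch below runs both variants on one hypothetical padded record (the spacing is illustrative, not the exact PDB layout). It also writes the regular expression with explicit space and tab characters instead of the `[:space:]` classes, on the assumption that some older awk implementations do not support those classes.

```shell
tmp=$(mktemp -d)
# Hypothetical column-aligned record; the padding is illustrative only.
prefix='ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 '
printf '%s\n' "${prefix}39.55" > "$tmp/file1"
printf '%s\n' '1 0.14364205034979632' > "$tmp/file2"

# Assigning to a field makes awk rebuild the record with single spaces (OFS).
rebuilt=$(awk 'NR==FNR { a[$1] = $2; next } ($6 in a) { $11 = a[$6] } 1' \
            "$tmp/file2" "$tmp/file1")

# sub() edits the record text in place, so the original padding survives.
kept=$(awk 'NR==FNR { a[$1] = $2; next } ($6 in a) { sub(/[^ \t]+[ \t]*$/, a[$6]) } 1' \
         "$tmp/file2" "$tmp/file1")

printf '%s\n%s\n' "$rebuilt" "$kept"
rm -rf "$tmp"
```

Note that `sub()` replaces the last whitespace-delimited token on the line, so it only matches "column 11" when column 11 is the final field, as it is in the question's data.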
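Answer 2 (score: 0)
This solution is specific to gawk (see "Defining Fields by Content" in the gawk manual) and assumes that file2 has two columns separated by a single space, so that the output is spaced as required:
awk 'BEGIN {FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = "";} FNR==NR{a[$1]=$2; next} {$11=a[$6+0]} {print}' file2 file1
The action {$11=a[$6+0]} is written with +0 so that values of $6 that carry leading whitespace (for example " 1" and " 2") match the keys of array a such as "1" and "2": the +0 forces a numeric context rather than a string comparison (thanks to @Ed Morton for the explanation).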
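The effect of the +0 can be checked in isolation. This is a small sketch using plain awk rather than the full FPAT command; the variable `k` is a hypothetical stand-in for a `$6` that was captured together with its leading blanks.

```shell
out=$(awk 'BEGIN {
  a["1"] = "0.14364205034979632"   # key as it would be read from file2
  k = "    1"                      # hypothetical $6 with leading blanks, as FPAT would capture it
  print ((k in a) ? "string key: match" : "string key: no match")
  print (((k + 0) in a) ? "numeric key: match" : "numeric key: no match")
}')
printf '%s\n' "$out"
```

Array subscripts in awk are always strings, so the padded string is a different key from "1". Adding 0 converts it to the number 1, which is then turned back into the string "1" when used as a subscript.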