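I have a `file2` whose `$5` has the value "unknown" before the `-`. What I am trying to do is use the text in `$2` of `file1` to update those "unknown" values in `file2`. `$1` of `file1` holds a range of digits that can be used to update "unknown" when the `$4` of `file2` falls within that range. I don't really know where to start, but maybe `awk` is a start, or there may be a better way. Thank you :).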
file1 (`$1` range, `$2` name)
chr6:3224495-3227968 TUBB2B
chr16:89988417-90002505 TUBB3
file2
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 unknown-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 unknown-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 unknown-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 unknown-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
Desired output (`unknown` is updated to `TUBB3` because the `$4` value in `file2` is within the range in `$1` of `file1`):
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 TUBB3-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 TUBB3-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 TUBB3-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 TUBB3-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
Answer 0 (score: 2)
`awk` to the rescue!
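$ awk -v OFS='\t' '
    # file1: store range start, range end and gene name, keyed by chromosome
    NR==FNR { split($1, a, /[:-]/)
              rstart[a[1]] = a[2]
              rend[a[1]]   = a[3]
              value[a[1]]  = $2
              next }
    # file2: rename "unknown" when the interval lies inside the stored range
    $5 ~ /unknown/ && $2 >= rstart[$1] && $3 <= rend[$1] { sub(/unknown/, value[$1], $5) }
    1' file1 file2 |
  column -t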
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 TUBB3-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 TUBB3-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 TUBB3-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 TUBB3-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
This changes the original spacing, so the output is piped to `column -t` to get a tabular format.
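One limitation worth noting: the arrays above are keyed on the chromosome name alone, so if `file1` ever lists two ranges on the same chromosome, the later one overwrites the earlier. A minimal sketch of one way around that, assuming the same file layouts as in the question, is to number the ranges per chromosome and loop over them. The function name `update_unknown` and the array names `count` and `name` are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: update_unknown RANGES_FILE INTERVALS_FILE
# RANGES_FILE is laid out like file1, INTERVALS_FILE like file2.
update_unknown() {
  awk -v OFS='\t' '
    # First file: keep every range, numbered per chromosome
    NR == FNR {
      split($1, a, /[:-]/)
      n = ++count[a[1]]
      rstart[a[1], n] = a[2]
      rend[a[1], n]   = a[3]
      name[a[1], n]   = $2
      next
    }
    # Second file: try each range stored for this chromosome
    $5 ~ /unknown/ {
      for (i = 1; i <= count[$1]; i++)
        if ($2 + 0 >= rstart[$1, i] + 0 && $3 + 0 <= rend[$1, i] + 0) {
          sub(/unknown/, name[$1, i], $5)
          break
        }
    }
    # Rebuild every record so all lines use the same tab separator
    { $1 = $1; print }
  ' "$1" "$2"
}
```

It would be called as `update_unknown file1 file2`, and the result can still be piped to `column -t` if aligned columns are preferred. The first matching range wins; overlapping ranges with different names would need a rule of their own.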