I have a file with columns like this:
TNFRSF14 chr1 2487803,2489164,2489781,2491261,2492062,2493111,2494303,2494586, 2488172,2489273,2489907,2491417,2492153,2493254,2494335,2497061,
ID3 chr1 23884420,23885425,23885617, 23884906,23885510,23886285,
In case your browser does not show the tabs:
TNFRSF14"\t"chr1"\t"2487803,2489164,2489781,2491261,2492062,2493111,2494303,2494586,"\t"2488172,2489273,2489907,2491417,2492153,2493254,2494335,2497061,
ID3"\t"chr1"\t"23884420,23885425,23885617,"\t"23884906,23885510,23886285,
I want the output to look like this:
TNFRSF14 chr1 2487803 2488172
TNFRSF14 chr1 2489164 2489273
...
ID3 chr1 23885425 23885510
ID3 chr1 23885617 23886285
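As you can see, the lists in columns 3 and 4 vary in length from row to row, but within a row column 3 always has as many entries as column 4. So far I have been able to split the file by column length and use a Python script to lay the values out. I'm hoping awk has a way to do this!
Thanks for any suggestions!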
Answer 0: (score: 3)
You can try using the split function:
gawk '{
    split($3, a, ",")
    split($4, b, ",")
    # each list ends with a comma, so the last array element is empty; stop before it
    for (i = 1; i < length(a); i++) {
        print $1, $2, a[i], b[i]
    }
}' input
Note: length(array) is a gnu-awk extension; other awk implementations may not support it.
You get:
TNFRSF14 chr1 2487803 2488172
TNFRSF14 chr1 2489164 2489273
TNFRSF14 chr1 2489781 2489907
TNFRSF14 chr1 2491261 2491417
TNFRSF14 chr1 2492062 2492153
TNFRSF14 chr1 2493111 2493254
TNFRSF14 chr1 2494303 2494335
TNFRSF14 chr1 2494586 2497061
ID3 chr1 23884420 23884906
ID3 chr1 23885425 23885510
ID3 chr1 23885617 23886285
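The loop above stops at i < length(a) rather than i <= length(a) because every list in the input ends with a comma. A small check, using a made-up list rather than the real data, shows that split counts the empty piece after that final comma:

```shell
# split on "," yields one extra, empty element when the string ends with a comma
awk 'BEGIN {
    n = split("1,2,3,", a, ",")
    print n, "[" a[n] "]"    # prints: 4 []
}'
```

So for a list of three values the array holds four elements, and the last one is empty.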
Answer 1: (score: 2)
$ cat tst.awk
BEGIN { FS = OFS = "\t" }
{
    # split returns the element count; the trailing comma adds one empty element
    n = split($3, a, /,/)
    split($4, b, /,/)
    for (i = 1; i < n; i++) {
        print $1, $2, a[i], b[i]
    }
}
$
$ awk -f tst.awk file
TNFRSF14 chr1 2487803 2488172
TNFRSF14 chr1 2489164 2489273
TNFRSF14 chr1 2489781 2489907
TNFRSF14 chr1 2491261 2491417
TNFRSF14 chr1 2492062 2492153
TNFRSF14 chr1 2493111 2493254
TNFRSF14 chr1 2494303 2494335
TNFRSF14 chr1 2494586 2497061
ID3 chr1 23884420 23884906
ID3 chr1 23885425 23885510
ID3 chr1 23885617 23886285
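Both loops above rely on each list ending with a comma. As a minimal sketch, assuming tab-separated input, the trailing comma can be stripped first so the same loop works whether or not it is present. The name expand_exons is a hypothetical helper, not something from the answers:

```shell
# expand_exons: hypothetical helper; reads the named files, or stdin if none are given
expand_exons() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        starts = $3
        ends = $4
        sub(/,$/, "", starts)    # drop an optional trailing comma
        sub(/,$/, "", ends)
        n = split(starts, a, ",")
        split(ends, b, ",")
        # no empty last element remains, so loop up to and including n
        for (i = 1; i <= n; i++) print $1, $2, a[i], b[i]
    }' "$@"
}

# usage with a made-up row; the second list deliberately lacks the trailing comma
printf 'ID3\tchr1\t10,20,\t15,25\n' | expand_exons
```

Copying the fields into variables before calling sub leaves $3 and $4 untouched, so awk does not rebuild the record.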
Answer 2: (score: 1)
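awk -F',? ' '
{
    # the field separator (optional comma plus a space) assumes space-separated columns
    # and already consumes the comma at the end of $3
    n = split($3, a, /,/)
    split($4, b, /,/)
    # an indexed loop keeps input order; "for (i in a)" visits elements in unspecified order
    for (i = 1; i <= n; i++) print $1, $2, a[i], b[i]
}' file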