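I have a file from which I need to extract fragments, based on character ranges given in another file. I would like to do this with awk.
The first file looks like this (a single line):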
AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC
The second file looks like this:
5 10
13 20
22 24
and the expected output is:
GTGAAG
AGATGGCT
GCT
Answer 0 (score: 3)
This one-liner will solve your problem:
awk 'BEGIN{getline sequence < "first_file"} {print substr(sequence, $1, $2 - $1 + 1) }' second_file
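Explanation: the script uses the getline function to read the string sequence from the file named first_file (adjust this to your actual file name). Then, for each line of the second file (which holds the ranges to process), it extracts the required substring with the substr function. substr takes three arguments: the string (sequence), the start position ($1, counted from 1), and the length ($2 - $1 + 1, because both ends of the range are inclusive).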
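To check the length arithmetic above, here is a minimal self-contained sketch. It assumes a scratch directory created with mktemp, and it uses only the first 24 characters of the sequence as sample data; the variable names workdir, seqfile and result are illustrative and not part of the original answer. The NF >= 2 guard, which skips blank lines in the range file, is also an addition.

```shell
# Work in a scratch directory so no real files are touched (assumption: mktemp is available).
workdir=$(mktemp -d)

# Stand-ins for the two input files; the sequence is truncated to its first 24 characters.
printf '%s\n' 'AATTGTGAAGGTAGATGGCTCGCT' > "$workdir/first_file"
printf '%s\n' '5 10' '13 20' '22 24' > "$workdir/second_file"

# Read the sequence once in BEGIN, then print one substring per "start end" line.
# The length is end - start + 1 because both ends of the range are inclusive.
result=$(awk -v seqfile="$workdir/first_file" '
    BEGIN   { getline sequence < seqfile }
    NF >= 2 { print substr(sequence, $1, $2 - $1 + 1) }
' "$workdir/second_file")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Passing the file name in with -v keeps the awk program itself free of hard-coded paths, which is a design choice of this sketch rather than something the answer requires.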
Answer 1 (score: 1)
Nya gave you an awk solution; here is one based on coreutils.
File string:
AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC
File offlen:
5 10
13 20
22 24
You can get the desired output with:
while read off len; do cut -c${off}-${len} string; done < offlen
Output:
GTGAAG
AGATGGCT
GCT
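Note that cut -c takes an inclusive, 1-based start-end range, so the second column of offlen is used as an end position rather than a length. The sketch below makes that explicit by renaming the loop variables to start and end (names chosen here for illustration). It also adds read -r and quoting, and runs against a scratch copy of the first 24 characters of the sequence; the scratch directory and the cut_result variable are assumptions of this example.

```shell
# Scratch directory with stand-ins for the "string" and "offlen" files.
workdir=$(mktemp -d)
printf '%s\n' 'AATTGTGAAGGTAGATGGCTCGCT' > "$workdir/string"
printf '%s\n' '5 10' '13 20' '22 24' > "$workdir/offlen"

# For each "start end" pair, cut the inclusive character range start-end.
cut_result=$(while read -r start end; do
    cut -c"${start}-${end}" "$workdir/string"
done < "$workdir/offlen")

printf '%s\n' "$cut_result"
rm -rf "$workdir"
```

This runs cut once per range, which is fine for a handful of ranges; for a very long range file the awk approach reads the sequence only once.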