Extracting segments from a file with awk

Time: 2012-08-22 18:54:02

Tags: awk

I have a file from which I need to extract segments according to character ranges given in another file. I would like to do this with awk.

The first file looks like this (a single line):

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC

The second file looks like this (each line gives a start and an end position):

5 10
13 20
22 24

The expected output is:

GTGAAG
AGATGGCT
GCT

2 answers:

Answer 0: (score: 3)

This one-liner will solve your problem:

awk 'BEGIN{getline sequence < "first_file"} {print substr(sequence, $1, $2 - $1 + 1) }' second_file

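Explanation: the script uses getline to read one line from the file named first_file (adjust this to your actual file name) into the variable sequence. Then, for each line of the second file (which holds the ranges to process), it uses the substr function to extract the required substring. substr takes three arguments: the string (sequence), the 1-based start position ($1), and the length ($2 - $1 + 1). The + 1 is needed because both ends of each range are inclusive.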
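As a rough sketch of the same idea, here is a variant that passes the file name in with -v and checks the return value of getline before using the sequence. The temporary directory, the seqfile variable name, and the shortened 24-character sequence (the first 24 characters of the one in the question) are illustrative assumptions, not part of the original answer.

```shell
# Self-contained demo: the paths and the shortened sequence are made up for illustration.
tmpdir=$(mktemp -d)
printf 'AATTGTGAAGGTAGATGGCTCGCT\n' > "$tmpdir/first_file"
printf '5 10\n13 20\n22 24\n' > "$tmpdir/second_file"

result=$(awk -v seqfile="$tmpdir/first_file" '
BEGIN {
    # getline returns 1 on success, 0 at end of file, -1 on error
    if ((getline sequence < seqfile) <= 0) {
        print "cannot read " seqfile > "/dev/stderr"
        exit 1
    }
}
# $1 = start, $2 = end (1-based, inclusive), so length = end - start + 1
NF >= 2 { print substr(sequence, $1, $2 - $1 + 1) }
' "$tmpdir/second_file")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With these sample files, the three ranges select the same characters as in the question, since only the first 24 positions are used.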

Answer 1: (score: 1)

Nya has given you an awk solution; here is one based on coreutils.

Contents of the file string:

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC

Contents of the file offlen:

5 10
13 20
22 24

You can get the desired output with:

while read off len; do cut -c${off}-${len} string; done < offlen

Output:

GTGAAG
AGATGGCT
GCT
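Note that the second column holds an end position rather than a length, because cut -c takes a range of the form start-end. A minimal sketch of the same loop with variable names that reflect this, using read -r and quoting, might look like the following. The temporary directory and the shortened 24-character sequence are assumptions made only to keep the example self-contained.

```shell
# Self-contained demo: the paths and the shortened sequence are made up for illustration.
tmpdir=$(mktemp -d)
printf 'AATTGTGAAGGTAGATGGCTCGCT\n' > "$tmpdir/string"
printf '5 10\n13 20\n22 24\n' > "$tmpdir/offlen"

# Each line of offlen is "start end"; cut -c selects that inclusive, 1-based range.
result=$(while read -r start end; do
    cut -c "${start}-${end}" "$tmpdir/string"
done < "$tmpdir/offlen")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This runs cut once per range, so the sequence file is re-read for every line of offlen. That is fine for a handful of ranges, but the awk approach reads the sequence only once.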