Question

I have a problem to solve, but I can't manage it on my own.

File 1 contains IDs and looks like this:

>AIM49244.1
>NP_722551.1
>YP_002790883.1
>AGS41451.1
>AIM49245.1
>BAM74427.1
>CCC55433.1

File 2 looks like this:

>AIM49244.1 polyprotein [Aedes flavivirus]
(several lines of text only Alphabetic)
>NZ_03930.3 polyprotein [please help]
(several lines of text only Alphabetic)
>NP_722551.1 polyprotein [Alkhumra hemorrhagic fever virus]
(several lines of text only Alphabetic)
>NP_123456.7 polyprotein [Foo bar Foo bar]
several lines of text
and so on

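Using the IDs from File 1, I want to extract from File 2 each header that contains one of those IDs, together with the lines of text that follow it, up to the point where the next header starts.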

Example of the desired output file:

>AIM49244.1 polyprotein [Aedes flavivirus]
(several lines of text only Alphabetic)
>NP_722551.1 polyprotein [Alkhumra hemorrhagic fever virus]
(several lines of text only Alphabetic)

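I have a solution, but it only works when each description line in File 2 (the line starting with >) is followed by exactly one line of text.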

awk 'FNR==NR{A[$1]=$1; next}$1 in A{print $0, getline; print $0}' File_1 File_2

But I'm not good at adapting it to the new problem. I tried using a range pattern, but it didn't work properly. It would be great if you could help me :)

Answer 1

Please don't use getline unless you have a very specific need and fully understand all of its implications and caveats. See http://awk.freeshell.org/AllAboutGetline.

In this case, all you need is:

awk '
NR==FNR { ids[$1]; next }
/^>/ { inTargetBlock = ($1 in ids ? 1 : 0) }
inTargetBlock
' file1 file2
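To see the approach end to end, here is a small sketch that runs the same logic on made-up data. The file names ids.txt and seqs.fa, the sequence lines, and the variable name keep are assumptions chosen for illustration; only the headers come from the question.

```shell
#!/bin/sh
# Hypothetical sample data: file names and sequence lines are invented for illustration.
workdir=$(mktemp -d)

cat > "$workdir/ids.txt" <<'EOF'
>AIM49244.1
>NP_722551.1
EOF

cat > "$workdir/seqs.fa" <<'EOF'
>AIM49244.1 polyprotein [Aedes flavivirus]
MSGRKAQ
GKTLGVN
>NZ_03930.3 polyprotein [please help]
AAAAAAA
>NP_722551.1 polyprotein [Alkhumra hemorrhagic fever virus]
MAKGAVL
KGKGGGP
>NP_123456.7 polyprotein [Foo bar Foo bar]
CCCCCCC
EOF

result=$(awk '
NR==FNR { ids[$1]; next }        # first file: remember each ID as an array key
/^>/    { keep = ($1 in ids) }   # header line: is this record one of the wanted IDs?
keep                             # print every line while inside a wanted record
' "$workdir/ids.txt" "$workdir/seqs.fa")

printf '%s\n' "$result"
rm -rf "$workdir"
```

The flag is only re-evaluated on header lines, so any number of sequence lines after a header inherit its decision. The match works because the IDs in the first file carry the same leading > as the first field of each header. Note that NR==FNR assumes the ID file is not empty; with an empty first file, that condition would be true for every line of the second file.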
