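Question: I am trying to find several specific lines in a file (species) and then print only the 5th line after each species name to a new file. I can do this for each species individually, but I cannot work out how to loop over every species in the document. For example: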
awk 'c&&!--c;/species_1$/{c=5}' results.out > speciesnames
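How can I turn this command into a loop that does the following, iterating over every species in the file:
species 1: print line 5 to the file titled speciesnames
species 2: print line 5 to the file titled speciesnames
species n: print line 5 to the file titled speciesnames
Any help would be greatly appreciated. I have very little experience with loops. Thanks.
Example of the data structure in results.out: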
Query= species_1
length=341
Score
bits
Line 5, relevant info
description
description
description
description
description
description
description
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
data
data
data
data
data
data
Query= species_2
length=341
.......
Desired output in the file speciesnames:
Line 5, relevant info for species 1
Line 5, relevant info for species 2
Line 5, relevant info for species n
Answer 0 (score: 1)
Maybe something like this:
awk 'c&&!--c;/species_[0-9]+$/{c=5}' file
awk '/species_[0-9]+/{a[NR+5]} {b[NR]=$0} END {for (i in a) print b[i]}' file
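Both commands print the 5th line after every species hit.
With the second command the output order is unspecified, because for (i in a) does not walk an awk array in any guaranteed order.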
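If the array-based variant is preferred but input order matters, one option is to walk the line numbers in increasing order in the END block instead of using for (i in a). The following is only a sketch: sample.txt, want and line are names made up for this illustration, and the offset 4 assumes the Query line counts as line 1 of its block, as in the sample data.

```shell
# Hypothetical sample file, used only for this illustration.
tmp=$(mktemp -d)
cat > "$tmp/sample.txt" <<'EOF'
Query= species_1
length=341
Score
bits
Line 5, info A
description
Query= species_2
length=341
Score
bits
Line 5, info B
description
EOF

# Mark the 4th line after each hit (line 5 of the block), store every line,
# then print the marked lines in increasing line-number order.
ordered=$(awk '/species_[0-9]+$/ {want[NR+4]}
               {line[NR]=$0}
               END {for (n=1; n<=NR; n++) if (n in want) print line[n]}' "$tmp/sample.txt")
printf '%s\n' "$ordered"
rm -rf "$tmp"
```

This still holds the whole file in memory, so for large inputs the countdown one-liner is probably the better choice.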
Code adjusted after the new input:
awk 'c&&!--c;/species [0-9]+$/{c=4}' file
Line 5, relevent info
Between species and the number there is a space rather than _, and you want line 4 after the species hit, not line 5.
Example data:
cat file
Query= species 1
length=341
Score
bits
Line 5, relevent info
description
description
description
description
description
description
Query= species 5
length=341
Score
bits
Line 5, relevent info need this
description
description
description
description
description
Query= species 8
length=341
Score
bits
Line 5, relevent info more data
description
description
description
description
description
Query= species 6423
length=341
Score
bits
Line 5, relevent infom, yes here it is
description
description
description
description
description
Final solution:
awk 'c&&!--c {print i " --> " $0} /species [0-9]+$/{c=4;i=$2 FS $3}' file
species 1 --> Line 5, relevent info
species 5 --> Line 5, relevent info need this
species 8 --> Line 5, relevent info more data
species 6423 --> Line 5, relevent infom, yes here it is
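The one-liner is compact, so it may help to see the same countdown logic spelled out. This is a sketch under the assumption that the input uses the space-separated layout shown above; the variables n and name and the file sample.txt are chosen here for illustration and are not part of the answer.

```shell
# Hypothetical two-species sample in the "species N" layout.
tmp=$(mktemp -d)
printf '%s\n' 'Query= species 1' 'length=341' 'Score' 'bits' 'Line 5, first' 'description' \
              'Query= species 5' 'length=341' 'Score' 'bits' 'Line 5, second' 'description' \
              > "$tmp/sample.txt"

labelled=$(awk -v n=4 '
    # While a countdown is active, decrement it; print when it reaches 0.
    c > 0 { c--; if (c == 0) print name " --> " $0 }
    # A Query line starts a new countdown and records the species name.
    /species [0-9]+$/ { c = n; name = $2 FS $3 }
' "$tmp/sample.txt")
printf '%s\n' "$labelled"
rm -rf "$tmp"
```

Passing the offset with -v n=4 makes it easy to pick a different line of each block without editing the program text.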
Answer 1 (score: 0)
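An approach using the getline function:
awk '/^Query *= *species_[0-9]/{print $0":";for(i=1;i<=4;++i){if((getline)>0 && i==4){print}}}' file
The action runs on every line matching /^Query *= *species_[0-9]/ and starts a loop there:
for(i=1;i<=4;++i)
Each pass reads the next line with getline. The Query line itself counts as line 1, so the fourth read is line 5, which is then printed:
{if((getline)>0 && i==4){print}}}'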
With the sample file:
Query= species_1
length=341
Score
bits
Line 5, relevant info
description
description
data
data
data
data
data
data
Query= species_2
length=341
Score
bits
Line 5, relevant info
description
description
data
data
data
data
data
data
Result:
Query= species_1:
Line 5, relevant info
Query= species_2:
Line 5, relevant info
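One thing to watch with getline is truncated input: it returns 1 when a line was read, 0 at end of file and -1 on error. The sketch below is an assumption-laden variant rather than the answer's own code. It prints nothing for a block that ends before line 5; prog, header, short and full are names invented for this example.

```shell
# awk program kept in a shell variable so it can be run on two inputs.
prog='
/^Query *= *species_[0-9]/ {
    header = $0
    # Read the next four lines; stop quietly if the input ends early.
    for (i = 1; i <= 4; i++)
        if ((getline) <= 0) exit
    print header ":"
    print            # $0 now holds line 5 of the block
}'

# Truncated block: only three lines, so nothing should be printed.
short=$(printf '%s\n' 'Query= species_1' 'length=341' 'Score' | awk "$prog")

# Complete block: header and line 5 are printed.
full=$(printf '%s\n' 'Query= species_1' 'length=341' 'Score' 'bits' 'Line 5, info' 'description' | awk "$prog")
printf '%s\n' "$full"
```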
Answer 2 (score: 0)
You could do something like this:
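linenr=0
species=unknown
# Read results.out line by line; the Query line counts as line 1 of its block.
while IFS= read -r line; do
    if [[ "${line}" = Query* ]]; then
        linenr=1
        # Take the text after "=" and drop the leading space.
        species=${line#*=}
        species=${species# }
    else
        (( linenr = linenr + 1 ))
        if [ "${linenr}" -eq 5 ]; then
            # Line 5 of the block goes to a file named after the species.
            echo "${line}" > "${species}.out"
        fi
    fi
done < results.out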
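The same per-species files can probably be produced by awk alone, which avoids the slower shell read loop on large inputs. This is a sketch, not a drop-in replacement: it assumes every block starts with a line beginning with "Query= " followed by a name without spaces, and dir, name, n and out are variables introduced here for illustration.

```shell
# Hypothetical sample input in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Query= species_1' 'length=341' 'Score' 'bits' 'Line 5, info A' 'description' \
              'Query= species_2' 'length=341' 'Score' 'bits' 'Line 5, info B' \
              > "$tmp/results.out"

awk -v dir="$tmp" '
    # A Query line is line 1 of its block; remember the species name.
    /^Query= / { name = $2; n = 1; next }
    # Count the following lines and write line 5 to <name>.out.
    name != "" && ++n == 5 { out = dir "/" name ".out"; print > out; close(out) }
' "$tmp/results.out"

first=$(cat "$tmp/species_1.out")
second=$(cat "$tmp/species_2.out")
printf '%s\n%s\n' "$first" "$second"
rm -rf "$tmp"
```

Calling close(out) after each write keeps the number of open files small when there are many species.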