I have a small example dataset, test1.faa:
>PROKKA_00001_A1@hypothetical@protein
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
>PROKKA_00002_A1@Cystathionine@beta-lyase
MHRFGGMVTAILKGGLDDARRFLERCELFALAESLGGVESLIEHPAIMTHASVPREIREALGISDGLVRLSVGIEDADDLLAELETALA
>PROKKA_00003_A1@hypothetical@protein
MVPIVSAAPVFTLLLTVAVFRRERLTAGRIAAVAVVVPSVILIALGH
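I want to append the length of the sequence line that follows to each header line, and then print that sequence line, for example: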
>PROKKA_00001_A1@hypothetical@protein_92
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
I tried to do this with awk, but it returns the following error:
awk: >PROKKA_00001_A1@hypothetical@protein: No such file or directory
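I assume it has something to do with the > at the start of the line? But I need that character in the output file.
Here is the code I tried: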
#!/bin/bash
cat test1.faa | while read line
do
headerline=$(awk '/>/{print $0}' $line)
echo -e "this is the headerline \n ${headerline}"
fastaline=$(awk '!/>/{print $0}' $line)
echo -e "this is the fastaline \n ${fastaline}"
fastaline_length=$(awk -v linelength=$fastaline '{print length(linelength)}')
echo -e "this is length of fastaline \n ${fastaline_length}"
echo "${headerline}_${fastaline_length}"
echo $fastaline
done
Any suggestions on how to do this?
Answer 0 (score: 3)
Try the following (assuming your actual Input_file has the same layout as the sample shown).
awk '/^>/{value=$0;next} {print value"_"length($0) ORS $0;value=""}' Input_file
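To make the one-liner above easier to follow, here is a minimal sketch of the same logic spread over several lines and run against a tiny made-up input. The sample records (`seq1@demo`, `seq2@demo`), the temporary file, and the `add_lengths` helper are assumptions introduced only for illustration. Like the one-liner, the sketch assumes each sequence sits on a single line directly after its header.

```shell
#!/bin/sh
# Hypothetical two-record FASTA file, created only for this illustration.
tmp=$(mktemp)
printf '>seq1@demo\nMTIAL\n>seq2@demo\nMHRFGGM\n' > "$tmp"

# add_lengths is an illustrative helper name, not part of the original answer.
add_lengths() {
  awk '
    # Header line: remember it and move on to the next input line.
    /^>/ { header = $0; next }
    # Sequence line: print the saved header with "_<length>" appended,
    # then the sequence itself, and clear the saved header.
    { print header "_" length($0); print; header = "" }
  ' "$1"
}

result=$(add_lengths "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Note that awk reads the file itself, line by line, so no surrounding `while read` loop is needed. In the script from the question, `$line` is passed to awk as a file name argument, which is why awk reports "No such file or directory". If a sequence were wrapped across several lines, this approach would emit a header for the first line only, so the lengths would have to be accumulated per record instead.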
Answer 1 (score: 1)
This awk command will do what you want