I am trying to use awk to rename the contigs in a fasta file, using an isolate ID plus a contig number running from 1 to n.
Fasta file:
>NODE_1_length_172477_cov_46.1343
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
The isolate ID is a variable because I want to do this for multiple files. I can print isolateIDnumber, but what I need is >isolateID_number
for file in /dir/*.fasta
do
name=$(basename "$file" .fasta)
awk '/^>/{print "'"$name"'" ++i; next}{print}' $file > rename.fasta
done;
This gives me:
15AR07771
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
Desired output:
>15AR0777_1
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
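The question is: where should I put the strings so that it prints >15AR0777_1 instead of 15AR07771?
I tried a few variations like the following, but they did not work: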
awk '/^>/{print ">'"$name"'" "_" ++i; next}{print}' $file > rename.fasta
awk '/^>/{print ">'"$name"'" _++i; next}{print}' $file > rename.fasta
Thanks!
Answer 0 (score: 4)
Use -v to pass the shell variable into the awk script, i.e. awk -v awk_var="$bash_var"
From man awk:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the
BEGIN rule of an AWK program.
Applied to your loop:
for file in dir/*.fasta
do
name=$(basename "$file" .fasta)
awk -v name="$name" '/^>/{print ">" name "_" ++i; next}{print}' "$file" > rename.fasta
done
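To see the effect without touching real data, here is a minimal sketch. It assumes a hypothetical isolate ID (15AR0777) and two made-up FASTA records held in a shell variable; neither comes from your files.

```shell
# Hypothetical isolate ID; in the loop this would come from basename.
name="15AR0777"

# Two made-up FASTA records, used only to exercise the awk program.
fasta=$(printf '>NODE_1_length_10\nACGT\n>NODE_2_length_8\nTTGA\n')

# -v copies the shell value into the awk variable "name" before the
# program starts. Inside awk, ">" name "_" and the incremented counter
# are joined by plain string concatenation.
renamed=$(printf '%s\n' "$fasta" |
    awk -v name="$name" '/^>/ { print ">" name "_" ++i; next } { print }')

printf '%s\n' "$renamed"
```

Header lines are replaced by >name_number and sequence lines pass through unchanged. Because the value arrives as an awk variable, no quote splicing inside the program text is needed.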
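The same can be done entirely in awk, without the shell loop:
awk '
FNR==1 {                          # new file, close old and make name for new
    close(f)                      # close the old output file
    n=FILENAME                    # get filename of the new file
    gsub(/^.*\/|\.fasta$/,"",n)   # remove path and .fasta
    f="rename_" n ".fasta"        # new output file
    i=0                           # restart contig numbering for each file
}
/^>/ {
    $0=">" n "_" ++i              # >name_number
}
{
    print > f                     # print to output file
}' dir/*.fasta                    # process .fasta files in dir
If there is a file dir/15AR07771.fasta, the script will produce the file ./rename_15AR07771.fasta. (Your version writes the output of every file to rename.fasta, without even appending; you might want to fix that.)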
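If you would rather keep the shell loop, one possible fix for the overwriting is to derive the output name from the input name. This is only a sketch: the temporary directory, the indir variable, the sample files, and the rename_ prefix (borrowed from the awk script above) are all assumptions made for the demo.

```shell
# Throwaway demo data; the directory layout and contents are made up.
workdir=$(mktemp -d)
indir="$workdir/dir"
mkdir -p "$indir"
printf '>NODE_1_length_10\nACGT\n>NODE_2_length_8\nTTGA\n' > "$indir/15AR0777.fasta"
printf '>NODE_1_length_6\nGGCC\n' > "$indir/16BX0001.fasta"

for file in "$indir"/*.fasta
do
    name=$(basename "$file" .fasta)
    # One output file per input, so nothing is overwritten. Each awk
    # run starts with i unset, so numbering restarts at 1 per file.
    awk -v name="$name" '/^>/ { print ">" name "_" ++i; next } { print }' \
        "$file" > "$workdir/rename_$name.fasta"
done

# Show the first header of every renamed file.
head -n 1 "$workdir"/rename_*.fasta
```

Writing the results outside the input directory also keeps the glob from picking up the renamed files on a later run.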