I have a text file that looks like this. I want to extract the total number of "A" and "E" characters in each sequence.
>pr1
FSVSQNNPAE
>pr2
MAKERAHSQ
>pr3
RRRDKINNWIVQL
I would like to get output like this:
>pr1
Total number of A - 1
Total number of E - 1
>pr2
Total number of A - 2
Total number of E - 1
>pr3
Total number of A - 0
Total number of E - 0
How can I do this with awk?
Answer 0 (score: 4)
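One way. When a line beginning with > is found, read the next line with getline, save it in the str variable, and count the substitutions made for each letter (gsub returns that count).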
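awk '
    # Header lines start with ">"; the sequence is assumed to be on the next line.
    $1 ~ /^>/ {
        # Skip a header that has no following line.
        if ((getline str) <= 0) next
        # gsub returns the number of substitutions it made.
        num_a = gsub( /A/, "", str )
        num_e = gsub( /E/, "", str )
        printf "%s\nTotal number of A - %d\nTotal number of E - %d\n\n", $0, num_a, num_e
    }
' infile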
Output:
>pr1
Total number of A - 1
Total number of E - 1
>pr2
Total number of A - 2
Total number of E - 1
>pr3
Total number of A - 0
Total number of E - 0
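The getline approach reads exactly one line after each header. If the sequences in a file were wrapped over several lines (an assumption; the sample input does not show this), a sketch along these lines could accumulate the counts until the next header instead. The shell function name count_ae is made up for illustration.

```shell
# Hypothetical helper: reads FASTA-like input from the named files or from stdin.
count_ae() {
  awk '
    # Print the totals gathered for the previous record, if there was one.
    function report() {
      if (header != "")
        printf "%s\nTotal number of A - %d\nTotal number of E - %d\n", header, num_a, num_e
    }
    # A new header: flush the previous record and reset the counters.
    /^>/ { report(); header = $0; num_a = 0; num_e = 0; next }
    {
      # Replacing a letter with itself leaves the text intact;
      # gsub still returns how many matches it replaced.
      num_a += gsub(/A/, "A")
      num_e += gsub(/E/, "E")
    }
    END { report() }
  ' "$@"
}
```

It can be called as count_ae infile, or with the data piped in on standard input.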
Answer 1 (score: 3)
Update: this can also be done by changing FS (the field separator) on the fly:
{
    if ($0 ~ /^>/)
        printf("\n%s\n", $0);
    else
    {
        nl = $0;      # keep a copy of the sequence line
        FS = "A"
        $0 = nl;      # assigning $0 re-splits the record with the new FS
        print "Total number of A -", NF-1;
        FS = "E"
        $0 = nl;
        print "Total number of E -", NF-1;
    }
}
This gives:
>pr1
Total number of A - 1
Total number of E - 1
>pr2
Total number of A - 2
Total number of E - 1
>pr3
Total number of A - 0
Total number of E - 0
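The same field-counting idea can also be sketched without touching FS, by calling split() with the letter as the separator: the number of pieces minus one is the number of occurrences. The names count_ae_split and count are hypothetical, and this assumes the separator is a single plain letter, which awk treats literally.

```shell
# Hypothetical helper: expects one sequence line per header, from files or stdin.
count_ae_split() {
  awk '
    # Occurrences of the single letter ch in s (parts and n are local variables).
    function count(s, ch,    parts, n) {
      n = split(s, parts, ch)
      # split returns 0 for an empty string, so guard against -1.
      return (n > 0) ? n - 1 : 0
    }
    /^>/ { print; next }
    {
      printf "Total number of A - %d\n", count($0, "A")
      printf "Total number of E - %d\n", count($0, "E")
    }
  ' "$@"
}
```

Because FS is never reassigned, the way later records are split into fields is not affected.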
Previous solution:
{
    if ($1 ~ /^>/)
        printf("\n%s\n", $0)
    else
    {
        # gsub replaces each letter with itself and returns the match count
        print "total number of A - ", gsub(/A/,"A")
        print "total number of E - ", gsub(/E/,"E")
    }
}
This is similar to @Birei's answer.