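I have a file with the structure shown below, and I need help finding a way to match each ">Cluster" line and, for each one, count the number of lines up to the next ">Cluster" line, and so on until the end of the file.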
>Cluster 0
0 10565nt, >CL9602.Contig1_All... *
1 1331nt, >CL9602.Contig2_All... at -/98.05%
>Cluster 1
0 3798nt, >CL3196.Contig1_All... at +/97.63%
1 9084nt, >CL3196.Contig3_All... *
>Cluster 2
0 8710nt, >Unigene21841_All... *
>Cluster 3
0 8457nt, >Unigene10299_All... *
The desired output should look like this:
Cluster 0 2
Cluster 1 2
Cluster 2 1
Cluster 3 1
I tried using awk as follows, but it only gives me the line numbers.
awk '{print FNR "\t" $0}' All-Unigene_Clustered.fa.clstr | head - 20
==> standard input <==
1 >Cluster 0
2 0 10565nt, >CL9602.Contig1_All... *
3 1 1331nt, >CL9602.Contig2_All... at -/98.05%
4 >Cluster 1
5 0 3798nt, >CL3196.Contig1_All... at +/97.63%
6 1 9084nt, >CL3196.Contig3_All... *
7 >Cluster 2
8 0 8710nt, >Unigene21841_All... *
9 >Cluster 3
10 0 8457nt, >Unigene10299_All... *
I also tried sed, but it only prints the lines, and it even omits some of them.
sed -n -e '/>Cluster/,/>Cluster/ p' All-Unigene_Clustered.fa.clstr | head
>Cluster 0
0 10565nt, >CL9602.Contig1_All... *
1 1331nt, >CL9602.Contig2_All... at -/98.05%
>Cluster 1
>Cluster 2
0 8710nt, >Unigene21841_All... *
>Cluster 3
>Cluster 4
0 1518nt, >CL2313.Contig1_All... at -/95.13%
1 8323nt, >CL2313.Contig8_All... *
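In addition, I tried combining awk and sed with 'wc', but that only gives me the total number of occurrences of the matching string.
I thought of using grep's -v option to pull out the lines that do not match the string '>Cluster', then pull out every line that does match '>Cluster', and add both to a new file, e.g.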
grep -vw '>Cluster' All-Unigene_Clustered.fa.clstr | head
0 10565nt, >CL9602.Contig1_All... *
1 1331nt, >CL9602.Contig2_All... at -/98.05%
0 3798nt, >CL3196.Contig1_All... at +/97.63%
1 9084nt, >CL3196.Contig3_All... *
0 8710nt, >Unigene21841_All... *
0 8457nt, >Unigene10299_All... *
0 1518nt, >CL2313.Contig1_All... at -/95.13%
grep -w '>Cluster' All-Unigene_Clustered.fa.clstr | head
>Cluster 0
>Cluster 1
>Cluster 2
>Cluster 3
>Cluster 4
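However, the problem is that the number of lines after each '>Cluster' is not constant: each '>Cluster' line is followed by 1, 2, 3 or more lines before the next one appears.
I decided to post my question after searching extensively through previously asked questions, where I could not find any useful answer.
Thanks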
Answer 0 (score: 2)
With GNU awk for multi-char RS:
$ awk -v RS='(^|\n)(>|$)' -F'\n' 'NR>1{print $1, NF-1}' file
Cluster 0 2
Cluster 1 2
Cluster 2 1
Cluster 3 1
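The above simply splits the input into records, each beginning where a line starts with >, and then prints the number of lines in each record (minus 1 for the >Cluster... line).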
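To see what the multi-character RS is doing, it can help to dump every record and field. This is only a sketch: it assumes GNU awk is installed under the name gawk, and show_records is a made-up helper name that does not come from the answer.

```shell
#!/bin/sh
# Sketch only: POSIX leaves a multi-character RS unspecified; gawk treats
# it as a regular expression. show_records is a hypothetical helper name.
show_records() {
    gawk -v RS='(^|\n)(>|$)' -F'\n' '
        NR > 1 {                      # record 1 is the empty text before the first ">"
            for (i = 1; i <= NF; i++)
                printf "record %d, field %d: %s\n", NR - 1, i, $i
        }
    ' "$@"
}
```

Calling show_records on the clustered file should list the cluster name as field 1 of each record, followed by one field per member line, which is why NF-1 is the member count.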
Answer 1 (score: 1)
Could you please try the following.
awk '
/^>Cluster/{
  if(count){
    print prev,count
  }
  sub(/^>/,"")
  prev=$0
  count=""
  next
}
{
  count++
}
END{
  if(count && prev){
    print prev,count
  }
}
' Input_file
Explanation: the same code with a comment on each line.
awk '                        ##Start of the awk program.
/^>Cluster/{                 ##If the line starts with >Cluster, do the following.
  if(count){                 ##If count is non-zero (the previous cluster has member lines), do the following.
    print prev,count         ##Print the previous cluster name and its line count.
  }                          ##Close the if block.
  sub(/^>/,"")               ##Remove the leading > from the current line.
  prev=$0                    ##Store the current line (the cluster name) in prev.
  count=""                   ##Reset count.
  next                       ##Skip the remaining rules for this line.
}                            ##Close the >Cluster block.
{
  count++                    ##Count every other line as a member of the current cluster.
}
END{                         ##END block, run after the last input line.
  if(count && prev){         ##If count and prev are both set, do the following.
    print prev,count         ##Print the last cluster name and its line count.
  }                          ##Close the if block.
}                            ##Close the END block.
' Input_file                 ##Name of the input file.
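To check the behaviour quickly, the same logic can be run on the sample from the question through a here-document. This is a condensed sketch and not the answer's exact code: count_cluster_members is a hypothetical name, and a seen flag replaces the if(count) test so that a cluster with no member lines would be reported with 0 instead of being skipped.

```shell
#!/bin/sh
# count_cluster_members is a hypothetical helper name, not part of the answer.
# It reads the files given as arguments, or standard input if there are none.
count_cluster_members() {
    awk '
        /^>Cluster/ {
            if (seen) print name, count   # flush the previous cluster
            name = substr($0, 2)          # drop the leading ">"
            count = 0
            seen = 1
            next
        }
        { count++ }                       # member line of the current cluster
        END { if (seen) print name, count }
    ' "$@"
}

count_cluster_members <<'EOF'
>Cluster 0
0 10565nt, >CL9602.Contig1_All... *
1 1331nt, >CL9602.Contig2_All... at -/98.05%
>Cluster 1
0 3798nt, >CL3196.Contig1_All... at +/97.63%
1 9084nt, >CL3196.Contig3_All... *
>Cluster 2
0 8710nt, >Unigene21841_All... *
>Cluster 3
0 8457nt, >Unigene10299_All... *
EOF
# prints:
# Cluster 0 2
# Cluster 1 2
# Cluster 2 1
# Cluster 3 1
```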
Answer 2 (score: 1)
Here is a one-liner in Perl, albeit a rather long-winded one. I'm really not good at golfing.
perl -n -e "if ( /^>(.+)/ ) { print qq($last, $count\n) if $count; $last = $1; $count = 0; } else { $count++ } END { print qq($last, $count) }" All-Unigene_Clustered.fa.clstr
This is for Windows. For a Unix shell you will probably need to change the double quotes to single quotes.
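On a Unix shell the double-quoted form would let the shell expand $last, $count and $1 before Perl ever sees them, so the program has to go in single quotes. A sketch of that change, wrapped in a hypothetical function named count_with_perl; the only other difference from the one-liner is a trailing \n added to the END print.

```shell
#!/bin/sh
# count_with_perl is a hypothetical wrapper name; the Perl program is the
# one-liner from the answer, single-quoted for a POSIX shell.
count_with_perl() {
    perl -n -e '
        if ( /^>(.+)/ ) {
            print qq($last, $count\n) if $count;   # flush the previous cluster
            $last  = $1;
            $count = 0;
        } else {
            $count++;                              # member line
        }
        END { print qq($last, $count\n) }          # last cluster; \n added here
    ' "$@"
}
```

Usage would be count_with_perl All-Unigene_Clustered.fa.clstr. Note that the output keeps the answer's comma separator (Cluster 0, 2), not the space shown in the question.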
Answer 3 (score: 1)
In Perl, the code can take the following form
use strict;
use warnings;
my $cluster;
my $count;
while( <DATA> ) {
    chomp;
    if( /Cluster \d+/ ) {
        print "$cluster $count\n" if defined $cluster;
        s/>//;
        $cluster = $_;
        $count = 0;
    } else {
        $count++;
    }
}
print "$cluster $count\n" if defined $cluster;
__DATA__
>Cluster 0
0 10565nt, >CL9602.Contig1_All... *
1 1331nt, >CL9602.Contig2_All... at -/98.05%
>Cluster 1
0 3798nt, >CL3196.Contig1_All... at +/97.63%
1 9084nt, >CL3196.Contig3_All... *
>Cluster 2
0 8710nt, >Unigene21841_All... *
>Cluster 3
0 8457nt, >Unigene10299_All... *
Output
Cluster 0 2
Cluster 1 2
Cluster 2 1
Cluster 3 1