Question

Is there a built-in command to do this, or has anyone had any luck with a script that does it?

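I would like to find out how many lines contain a given number of occurrences of a specific character, sorted in descending order by number of occurrences.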

For example, with this sample file:

gkdjpgfdpgdp
fdkj
pgdppp
ppp
gfjkl

Suggested input (for the 'p' character):

bash / perl some_script_name "p" samplefile

Desired output:

occs     count
4          1
3          2
0          2

Update: How would you write a solution that works on a 2-character string such as 'gd', rather than just a single character such as 'p'?

Answer 1

$ sed 's/[^p]//g' input.txt | awk '{print length}' | sort -nr | uniq -c | awk 'BEGIN{print "occs", "count"}{print $2,$1}' | column -t
occs  count
4     1
3     2
0     2

Answer 2

You can use the desired character as awk's field separator and do something like this:

awk -F 'p' '{ print NF-1 }' | 
  sort -k1nr | 
    uniq -c | 
      awk -v OFS="\t" 'BEGIN { print "occs", "count" } { print $2, $1 }'

For your sample data, it produces:

occs    count
4       1
3       2
0       2

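To count occurrences of a multi-character string, just use that string as the separator, e.g. awk -F 'gd' ... or awk -F 'pp' .... Note that awk treats a separator longer than one character as a regular expression, so regex metacharacters in the string need escaping.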
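Because a multi-character -F value is interpreted as a regular expression, a string containing characters such as . or * may not count what you expect. As a minimal sketch (not part of the answer above), the search string can instead be matched literally with awk's index() function. The helper name occs_breakdown and the NEEDLE environment variable are made up for illustration; matches are counted left to right without overlap.

```shell
# Hypothetical helper (not from the answer above): occs_breakdown STRING FILE
# Prints "occurrences<TAB>number of lines", highest occurrence count first.
occs_breakdown() {
  # An empty search string would never advance the scan, so reject it.
  [ -n "$1" ] || { echo "usage: occs_breakdown STRING FILE" >&2; return 1; }
  NEEDLE="$1" awk '
    BEGIN { s = ENVIRON["NEEDLE"] }     # passed via the environment, taken verbatim
    {
      n = 0; rest = $0
      # index() looks for a literal substring; resume right after each match
      while ((i = index(rest, s)) > 0) {
        n++
        rest = substr(rest, i + length(s))
      }
      count[n]++                        # one more line with n occurrences
    }
    END { for (n in count) print n "\t" count[n] }
  ' "$2" | sort -k1,1nr
}
```

With the sample file from the question, occs_breakdown p samplefile should report 4, 3 and 0 occurrences with line counts 1, 2 and 2, and occs_breakdown gd samplefile should report 1 occurrence on two lines and 0 on three.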

Answer 3

#!/usr/bin/env perl

use strict; use warnings;

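my $seq = shift @ARGV;                 # string to count; remaining args are input files
die "usage: $0 STRING [FILE...]\n" unless defined $seq;

my %freq;                              # occurrences per line => number of such lines

while ( my $line = <> ) {
    last unless $line =~ /\S/;         # stop at the first blank line
    # \Q...\E quotes regex metacharacters; "= () =" counts matches via list context
    my $occurrences = () = $line =~ /(\Q$seq\E)/g;
    $freq{ $occurrences } += 1;
}

for my $occurrences ( sort { $b <=> $a } keys %freq ) {
    print "$occurrences:\t$freq{$occurrences}\n";
}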

If you want to be terse, you can always use:

#!/usr/bin/env perl
$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>
;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f;

Or, perl -e '$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f' p inputfile, but now I am getting silly.

Answer 4

Pure Bash:

declare -a count

while read ; do
  cnt=${REPLY//[^p]/}               # remove non-p characters
  ((count[${#cnt}]++))              # use length as array index
done < "$infile"

for idx in ${!count[*]}             # iterate over existing indices
do echo -e "$idx ${count[idx]}"
done | sort -nr

Output, as desired:

4 1
3 2
0 2
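The loop above only handles a single character. For the 2-character case from the question's update, one possible extension (a sketch, not part of the original answer) is to delete every occurrence of the string and divide the number of removed characters by the string's length. The function name bash_breakdown and its parameters are hypothetical; it needs bash for the indexed array, and it counts non-overlapping matches.

```shell
# Requires bash (indexed arrays).
# Hypothetical helper (not from the answer above): bash_breakdown STRING FILE
bash_breakdown() {
  local needle=$1 infile=$2 line stripped n idx
  local -a count=()
  [ -n "$needle" ] || return 1                     # avoid dividing by zero below
  while IFS= read -r line || [ -n "$line" ]; do
    stripped=${line//"$needle"/}                   # quoted pattern: matched literally
    n=$(( (${#line} - ${#stripped}) / ${#needle} ))  # removed chars / needle length
    count[n]=$(( ${count[n]:-0} + 1 ))             # use the count as array index
  done < "$infile"
  for idx in "${!count[@]}"; do                    # iterate over existing indices
    echo "$idx ${count[idx]}"
  done | sort -nr
}
```

Called as bash_breakdown gd samplefile on the question's sample file, this should print "1 2" followed by "0 3".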

Answer 5

This can be done in a single gawk process (well, with a sort coprocess):

gawk -F p -v OFS='\t' '
    { count[NF-1]++ }
    END {
        print "occs", "count"
        coproc = "sort -rn"
        for (n in count)
            print n, count[n] |& coproc
        close(coproc, "to")
        while ((coproc |& getline) > 0)
            print
        close(coproc)
    }
'

Answer 6

The shortest solution so far:

perl -nE'say tr/p//' | sort -nr | uniq -c |
   awk 'BEGIN{print "occs","count"}{print $2,$1}' |
      column -t

For multiple characters, use a regular expression:

perl -ple'$_ = () = /pg/g' | sort -nr | uniq -c |
   awk 'BEGIN{print "occs","count"}{print $2,$1}' |
      column -t

This one handles overlapping matches (e.g. it finds 3 occurrences of "pp" in "pppp" instead of 2):

perl -ple'$_ = () = /(?=pp)/g' | sort -nr | uniq -c |
   awk 'BEGIN{print "occs","count"}{print $2,$1}' |
      column -t
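To make the difference between the two counting modes concrete, here is a small sketch (not part of the original answer) that counts a literal string in a single piece of text either way. The helper name count_in, its optional "overlap" argument, and the environment variable names are all made up for illustration. It uses awk's index(): after each match the scan resumes either one character past the start of the match (overlapping) or just past its end (non-overlapping).

```shell
# Hypothetical helper (not from the answer above): count_in STRING TEXT [overlap]
count_in() {
  [ -n "$1" ] || return 1               # an empty string would never advance the scan
  NEEDLE="$1" TEXT="$2" MODE="${3:-}" awk 'BEGIN {
    s = ENVIRON["NEEDLE"]; rest = ENVIRON["TEXT"]
    # overlap: move 1 character past the match start; otherwise skip the whole match
    step = (ENVIRON["MODE"] == "overlap") ? 1 : length(s)
    n = 0
    while ((i = index(rest, s)) > 0) {
      n++
      rest = substr(rest, i + step)
    }
    print n
  }'
}
```

For example, count_in pp pppp prints 2, while count_in pp pppp overlap prints 3.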

The original cryptic but short pure-Perl version:

perl -nE'
   ++$c{ () = /pg/g };
}{
   say "occs\tcount";
   say "$_\t$c{$_}" for sort { $b <=> $a } keys %c;
'

unix - Breakdown of how many lines have how many occurrences of a character
