Search and count matching lines

Time: 2017-03-25 18:53:35

Tags: awk

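Hello, I'm looking for a way to solve this problem. I have output that was sorted and counted with uniq -c. The code looks like this: find $DIR -type f | file -b $files | sort -n | uniq -c | sort -nr. Can someone tell me how to access the numeric value at the start of each line? This is what my output looks like: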

  21  ASCII text
  19  C source, ASCII text
  16  ASCII text
  10  ASCII text, with very long lines
  9   HTML document, UTF-8 Unicode text, with
  2   HTML document, ASCII text, with very lon
  1   C source, UTF-8 Unicode text

Expected output:

    ASCII text                                 : 21 
    C source, ASCII text                       : 19
    ASCII text                                 : 16 
    ASCII text, with very long lines           : 10
    HTML document, UTF-8 Unicode text, with    : 9 
    HTML document, ASCII text, with very lon   : 2 
    C source, UTF-8 Unicode text               : 1

How can I save the value in front of the file type into a variable?

2 answers:

Answer 0 (score: 0)

awk to the rescue!

... | awk '{k=$1;                   # save counts        
            sub(/[^ ]+ /,"",$0);    # remove the counts
            $1=$1;                  # normalize spaces
            print $0 "\t:",k}' |    # print in new order
      column -ts$'\t'               # align tabs

will print

ASCII text                                : 21
C source, ASCII text                      : 19
ASCII text                                : 16
ASCII text, with very long lines          : 10
HTML document, UTF-8 Unicode text, with   : 9
HTML document, ASCII text, with very lon  : 2
C source, UTF-8 Unicode text              : 1
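
If column is not available, the same reordering can be done with printf inside awk. This is only a sketch: the function name format_counts_awk and the width of 41 are my own assumptions, and unlike the version above it leaves the spacing inside the description untouched.

```shell
#!/usr/bin/env bash
# format_counts_awk: hypothetical helper, not part of the answer above.
# Reads `uniq -c` style lines on stdin and prints "description : count".
format_counts_awk() {
    awk '{
        count = $1                           # the count is the first field
        sub(/^[ \t]*[0-9]+[ \t]+/, "")       # strip the count and surrounding blanks from $0
        printf "%-41s : %s\n", $0, count     # left-align the description in 41 columns (assumed width)
    }'
}

# Usage: ... | uniq -c | sort -nr | format_counts_awk
```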

P.S. Your sort -n looks out of place: file returns text, and at that point you don't want to sort numerically.
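
The sort remark can be sketched as a small pipeline: a plain sort before uniq -c, and a numeric sort only on the counts afterwards. The function names tally and count_types are made up for illustration. Handing the file names to file through find -exec is my own assumption about what the question intended, since file does not pick up names from a pipe unless told to.

```shell
#!/usr/bin/env bash
# tally: hypothetical helper; counts identical lines on stdin, most frequent first.
tally() {
    sort |        # plain text sort so identical lines end up adjacent
    uniq -c |     # prefix each distinct line with its count
    sort -nr      # numeric, descending sort on the count
}

# count_types: hypothetical helper; prints "count description" for the files under a directory.
count_types() {
    local dir=${1:-.}
    # Assumption: pass the names to file(1) via -exec rather than through a pipe.
    find "$dir" -type f -exec file -b {} + | tally
}

# Usage: count_types "$DIR"
```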

Answer 1 (score: 0)

Combining a pipe with a Bash loop lets you assign the parts of each line to variables and run commands on them.

Given:

$ echo "$out" 
21  ASCII text
19  C source, ASCII text
16  ASCII text
10  ASCII text, with very long lines
9   HTML document, UTF-8 Unicode text, with
2   HTML document, ASCII text, with very lon
1   C source, UTF-8 Unicode text

You can use sed to insert a delimiter (in this example I'll use :) and then read from the pipe:

while IFS=: read -r fcnt ftype; do 
    printf "\t%-41s : %s\n" "$ftype" "$fcnt"
done < <(echo "$out" | sed -e 's/^\([ [:digit:]]*\)\(.*\)/\1:\2/')
    ASCII text                                : 21  
    C source, ASCII text                      : 19  
    ASCII text                                : 16  
    ASCII text, with very long lines          : 10  
    HTML document, UTF-8 Unicode text, with   : 9   
    HTML document, ASCII text, with very lon  : 2   
    C source, UTF-8 Unicode text              : 1   

Then replace the echo "$out" part with your pipeline.

If efficiency is a concern, you can also use the same kind of regular expression in Bash itself instead of calling sed.
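
A minimal sketch of that Bash-only variant, assuming the input has the usual uniq -c layout (optional leading blanks, a count, blanks, then the description). The function name format_counts and the variable re are hypothetical. The count and the description land in the shell variables fcnt and ftype, which is what the question asked for.

```shell
#!/usr/bin/env bash
# format_counts: hypothetical helper; reads `uniq -c` style lines on stdin.
format_counts() {
    local line fcnt ftype
    # Group 1 captures the count, group 2 the rest of the line.
    local re='^[[:space:]]*([0-9]+)[[:space:]]+(.*)$'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            fcnt=${BASH_REMATCH[1]}      # numeric prefix
            ftype=${BASH_REMATCH[2]}     # file type description
            printf '\t%-41s : %s\n' "$ftype" "$fcnt"
        fi
    done
}

# Usage: format_counts < <(echo "$out")
```

Lines that do not start with a number are skipped silently. Because the regex captures only the digits, the count carries no trailing spaces here, unlike in the sed version.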