How to sort a directory by file extension and size in bash

Date: 2015-02-06 22:34:41

Tags: linux bash file unix scripting

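I want to create a script that goes through a folder and all its subfolders and prints, for each file extension, how many files of that type there are and their total size.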

    Example:
    file type | total count | total size
    pdf       | 30          | 4.0k
    txt       | 90          | 60.0k
Something like that. I have been able to figure out everything except the size part. Any suggestions?

2 answers:

Answer 0 (score: 2)

find . -type f -print0 | xargs -0 du -k | grep "\.[a-zA-Z0-9]*$" | rev | sed -e "s/\..*\t/\t/g" | rev | awk '{SUM[$2]+=$1} END{for (x in SUM) print x,SUM[x]}' | sort

Explanation

find . -type f -print0

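Finds all regular files in the current directory and its subdirectories and prints their names separated by null characters (e.g. somefile.abc).
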
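Why the null separator matters, as a minimal sketch: two hand-written NUL-separated names stand in for real find -print0 output (the function name split_nul_demo is hypothetical). xargs -0 splits its input only on NUL bytes, so a name containing a space reaches the command as a single argument.

```shell
# Hypothetical demo: emit two names separated by NUL bytes, the way
# find -print0 would, then let xargs -0 pass each one to printf.
split_nul_demo() {
  printf '%s\000' 'my report.pdf' 'notes.txt' |
    xargs -0 -n1 printf '%s\n'
}
```

split_nul_demo prints my report.pdf and notes.txt on two lines. Had the names been newline-separated and read by plain xargs, my report.pdf would have been split at the space into two arguments.
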
| xargs -0 du -k

For each file, prints its disk usage in kilobytes (1024-byte blocks), a tab, and the file name (e.g. 12<TAB>somefile.abc).

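A small sketch of what du -k actually measures (the helper sizes_kib is hypothetical, and --apparent-size assumes GNU du): -k reports the disk blocks allocated to a file in KiB, which can differ from the number of bytes in it, while --apparent-size counts the bytes, rounded up to whole KiB.

```shell
# Hypothetical helper: print one file's size in KiB two ways.
sizes_kib() {
  # du -k: disk blocks allocated to the file, in 1024-byte units
  disk=$(du -k "$1" | cut -f1)
  # GNU du --apparent-size: bytes in the file, rounded up to whole KiB
  apparent=$(du -k --apparent-size "$1" | cut -f1)
  printf 'disk=%s apparent=%s\n' "$disk" "$apparent"
}
```

For very small or sparse files the two numbers usually disagree, so it is worth deciding which one "total size" should mean before summing.
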
| grep "\.[a-zA-Z0-9]*$"

Keeps only the files whose names end in a dot followed by an extension (e.g. 12<TAB>somefile.abc).
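The filter can be tried on hand-written du-style lines (a sketch; the function name filter_with_ext is hypothetical). Dots inside directory names are ignored because the match must run to the end of the line, but since * also allows zero characters, a name ending in a bare dot passes too.

```shell
# Hypothetical wrapper around the answer's grep: keep "size<TAB>path"
# lines whose path ends in a dot followed only by letters or digits.
filter_with_ext() {
  grep '\.[a-zA-Z0-9]*$'
}
```

Piping lines for ./a/report.pdf, ./docs.d/README, ./song.mp3 and ./odd. through filter_with_ext keeps every line except the README one.
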

| rev

Reverses each line character by character (e.g. cba.elifemos<TAB>21).

| sed -e "s/\..*\t/\t/g"

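Deletes the first dot and everything after it up to the tab. Because the line is reversed, that first dot is the last dot of the original name, so only the reversed extension is left (e.g. cba<TAB>21).
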
| rev

Reverses each line back again (e.g. 12<TAB>abc).
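The rev/sed/rev round trip can be checked on one line whose directory and file name both contain extra dots (a sketch; the function names are hypothetical, and \t inside the sed expressions assumes GNU sed). ext_via_sed is an alternative that is not part of the answer: because .* is greedy, matching from the tab to the last dot on the unreversed line gives the same result without rev.

```shell
# The answer's trick: after rev, the first dot is the last dot of the
# original path, so "\..*\t" removes the reversed name up to the tab.
ext_via_rev() {
  rev | sed -e 's/\..*\t/\t/' | rev
}

# Alternative (not from the answer): greedy ".*" runs to the last dot,
# so a single substitution on the unreversed line does the same job.
ext_via_sed() {
  sed -e 's/\t.*\./\t/'
}
```

Both turn 12<TAB>./dir.v2/somefile.tar.gz into 12<TAB>gz, keeping only the text after the last dot.
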

| awk '{SUM[$2]+=$1} END{for (x in SUM) print x,SUM[x]}'

Sums the sizes per extension. The final | sort then orders the output by extension.
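The summing stage with a file count added, which the question also asks for (a minimal sketch; sum_by_ext is a hypothetical name). It reads the size<TAB>extension lines produced by the previous stage. The operator must be +=: written as =+, awk reads it as an assignment of +$1, so each line would overwrite the running total instead of adding to it.

```shell
# Hypothetical helper: read "size<TAB>extension" lines and print
# "extension count total" per extension, sorted by extension.
sum_by_ext() {
  awk '{ SUM[$2] += $1; CNT[$2]++ }
       END { for (x in SUM) print x, CNT[x], SUM[x] }' | sort
}
```

Fed the lines 12<TAB>abc, 3<TAB>abc and 5<TAB>txt, it prints abc 2 15 and txt 1 5.
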

Answer 1 (score: 0)

Using GNU find and GNU awk:

find . -type f -printf '%s %f\n' | awk '{ size = $1; ext = ""; if(sub(/.*\./, "") != 0) { ext = $0 }; total[ext] += size; ++ctr[ext]  } END { PROCINFO["sorted_in"] = "@ind_str_asc"; for(ext in total) { print ext " " ctr[ext] " " total[ext] } }'

Here,

find . -type f -printf '%s %f\n'

prints the size of each file in bytes together with its name, without the directory part of its path. The awk code works as follows:

{                             # for each line in find's output
  size = $1                   # remember the size
  ext = ""                    # isolate the extension
  if(sub(/.*\./, "") != 0) {  # if the sub returns 0, there was no . in the
    ext = $0                  # file name, so it has no extension
  }
  total[ext] += size          # tally up the size and file counters
  ++ctr[ext]
}
END {                         # in the end: print the tallies.
                              # The PROCINFO bit for sorted output is GNU-
                              # specific. In case that's a worry, print
                              # unsorted and pipe through sort afterwards.
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for(ext in total) {
    print ext " " ctr[ext] " " total[ext]
  }
}
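Following the comment's suggestion to sort outside awk instead of relying on PROCINFO, here is a minimal sketch that also prints the table from the question, with sizes in the 4.0k style (ext_report is a hypothetical name; it still assumes GNU find for -printf, and "size" here means bytes divided by 1024, not disk usage).

```shell
# Hypothetical helper, not from either answer: ext_report [DIR]
# Prints a header, then "type | count | size" per extension, sorted by
# extension. The awk program uses no gawk-only features.
ext_report() {
  printf 'file type | total count | total size\n'
  find "${1:-.}" -type f -printf '%s %f\n' |
    awk '{
      size = $1                            # file size in bytes
      ext = ""
      if (sub(/.*\./, "") != 0) ext = $0   # text after the last dot
      total[ext] += size
      ++ctr[ext]
    }
    END {
      for (ext in total)
        printf "%s | %d | %.1fk\n", ext, ctr[ext], total[ext] / 1024
    }' |
    sort
}
```

With no argument it reports on the current directory. Files without an extension are grouped under an empty type, as in the answer above.
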