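I want to create a script that lists every file extension found in a folder and its subfolders, along with the number of files of that type and their total size.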
Example:
file type | total count | total size
pdf       | 30          | 4.0k
txt       | 90          | 60.0k
Something like that. I have been able to figure out how to do everything except the size part. Any suggestions?
Answer 0 (score: 2)
find . -type f -print0 | xargs -0 du -k | grep "\.[a-zA-Z0-9]*$" | rev | sed -e "s/\..*\t/\t/g" | rev | awk '{SUM[$2]+=$1} END{for (x in SUM) print x,SUM[x]}' | sort
Explanation
find . -type f -print0
Finds all files in the current directory and its subdirectories and separates their names with a null character (somefile.abc)
| xargs -0 du -k
For each file, prints its disk usage in kilobytes (12<TAB>somefile.abc)
| grep "\.[a-zA-Z0-9]*$"
Keeps only the lines that end with a dot followed by an extension (12<TAB>somefile.abc)
| rev
Reverses each line character by character (cba.elifemos<TAB>21)
| sed -e "s/\..*\t/\t/g"
Removes everything from the dot up to the tab (cba<TAB>21)
| rev
Reverses each line character by character again (12<TAB>abc)
| awk '{SUM[$2]+=$1} END{for (x in SUM) print x,SUM[x]}'
Sums the sizes by extension
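The pipeline above totals sizes only, while the question also asks for a count per file type. Here is a minimal sketch of one way to get both. It swaps the rev/sed/rev step for awk string functions (split, match, substr), which is a different technique from the answer's. The function name sum_du_by_ext is hypothetical, and the sketch assumes file names contain no newlines.

```shell
# sum_du_by_ext: read "du -k" output (size<TAB>path) on stdin and print
# "extension count total_kB", sorted by extension.
# Hypothetical helper name; assumes paths contain no newline characters.
sum_du_by_ext() {
    awk -F '\t' '
        {
            path = substr($0, length($1) + 2)   # everything after the first tab
            n = split(path, parts, "/")          # parts[n] is the base name
            if (match(parts[n], /\.[A-Za-z0-9]+$/)) {
                ext = substr(parts[n], RSTART + 1)
                count[ext]++                     # number of files with this extension
                size[ext] += $1                  # running total in kilobytes
            }
        }
        END { for (e in size) print e, count[e], size[e] }
    ' | sort
}

# Usage:
#   find . -type f -print0 | xargs -0 du -k | sum_du_by_ext
```

Note that du -k reports disk usage in 1 KiB blocks, so the totals can differ from the sum of the exact file lengths.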
Answer 1 (score: 0)
With GNU find and GNU awk:
find . -type f -printf '%s %f\n' | awk '{ size = $1; ext = ""; if(sub(/.*\./, "") != 0) { ext = $0 }; total[ext] += size; ++ctr[ext] } END { PROCINFO["sorted_in"] = "@ind_str_asc"; for(ext in total) { print ext " " ctr[ext] " " total[ext] } }'
Here,
find . -type f -printf '%s %f\n'
prints each file's size and its name without the directory part of its path. The awk code works as follows:
{ # for each line in find's output
size = $1 # remember the size
ext = "" # isolate the extension
if(sub(/.*\./, "") != 0) { # if the sub returns 0, there was no . in the
ext = $0 # file name, so it has no extension
}
total[ext] += size # tally up the size and file counters
++ctr[ext]
}
END { # in the end: print the tallies.
# The PROCINFO bit for sorted output is GNU-
# specific. In case that's a worry, print
# unsorted and pipe through sort afterwards.
PROCINFO["sorted_in"] = "@ind_str_asc"
for(ext in total) {
print ext " " ctr[ext] " " total[ext]
}
}
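As the comment in the END block suggests, the GNU-specific PROCINFO sorting can be dropped in favour of piping through sort. A minimal sketch of that variant follows. The function name summarize_by_ext, the "(none)" label for files without an extension, and the choice of 1024 bytes per "k" (to mimic the 4.0k style in the question) are all assumptions, not part of the answer. The find side still needs GNU find for -printf.

```shell
# summarize_by_ext: read "size name" lines (bytes, then base name) on stdin
# and print "extension count size", sorted by extension.
# Hypothetical helper name; "(none)" and the 1024-byte "k" are assumptions.
summarize_by_ext() {
    awk '
        {
            size = $1
            ext = "(none)"                  # label for names without a dot
            if (sub(/.*\./, "") != 0) {     # strip everything up to the last dot
                ext = $0
            }
            total[ext] += size
            ++ctr[ext]
        }
        END {
            for (ext in total) {
                printf "%s %d %.1fk\n", ext, ctr[ext], total[ext] / 1024
            }
        }
    ' | LC_ALL=C sort                       # byte order, independent of the locale
}

# Usage (GNU find):
#   find . -type f -printf '%s %f\n' | summarize_by_ext
```

Only sub, printf and associative arrays are used, so this should run under any POSIX awk rather than requiring gawk.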