awk, IFS, and file name truncation

Time: 2015-01-31 15:04:58

Tags: bash awk

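Question updated based on new information...

This is the gist of my code. In general, I store items in Dropbox at:

~/Dropbox/Public/drops/xx.xx.xx/whatever

The date is always 2 characters, 2 characters, and 2 characters, separated by dots. Inside that folder there can be more folders and more files, which is why I don't set a depth when I use find and let it scan recursively. https://gist.github.com/anonymous/ad51dc25290413239f6f

Below is a shortened version of the gist. I don't believe it will run as-is, though the gist itself will run, assuming you have Dropbox installed and there are files at the path location I set up.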

General workflow:
SIZE="+250k" # For `find` this is the value in size I am looking for files to be larger than
# Location where I store the output to `find` to process that file further later on.
TEMP="/tmp/drops-output.txt" 

Next I rm the tmp file and touch a new one.

I will then cd into
DEST=/Users/$USER/Dropbox/Public/drops

Perform a quick conditional check to make sure that I am working where I want to be, 
with all my values as variables, I could mess up easily and not be working where I 
thought I would be.
# Conditional check: is the current directory the one I want to be the working directory?
if [ "$(pwd)" = "${DEST}" ]; then
    echo -e "Destination and current working directory are equal, this is good!:\n    $(pwd)\n"
fi

The meat of step one is the `find` command
# Use `find` to locate a subset of files that are larger than a certain size
# save that to a temp file and process it.  I believe this could all be done in 
# one find command with -exec or similar but I can't figure it out
find . -type f -size "${SIZE}" -exec ls -lh {} \; >> "$TEMP"

Inside $TEMP will be a data set that looks like this:
-rw-r--r--@ 1 me  staff    61K Dec 28  2009 /Users/me/Dropbox/Public/drops/12.28.09/wor-10e619e1-120407.png
-rw-r--r--@ 1 me  staff   230K Dec 30  2009 /Users/me/Dropbox/Public/drops/12.30.09/hijack-loop-d6250496-153355.pdf
-rw-r--r--@ 1 me  staff    49K Dec 31  2009 /Users/me/Dropbox/Public/drops/12.31.09/mt-5a819185-180538.png

The trouble is that not all file names are free of spaces, though I have done all I can to make sure variables are quoted
and wrapped in parens, braces, or quotes where applicable.

With the results in /tmp I run:
# Number of results located as a result of the find `command` above
RESULTS=$(wc -l "$TEMP" | awk '{print $1}')
echo -e "Located: [$RESULTS] total files greater than or equal to $SIZE\n"

# With a result set found via `find`, now use awk to print out the sorted list of file 
# sizes and paths.
echo -e "SIZE    DATE      FILE PATH"
#awk '{print "["$5"]          ", $9, $10}' < "$TEMP" | sort -n
awk '{for (i = 5; i <= NF; i++) printf "%s ", $i; printf "\n"}' "$TEMP" | sort -n

With the changes to awk from how I had it originally, my result now looks like this:
751K Oct 21 19:00 ./10.21.14/netflix-67-190039.png 
760K Sep 14 19:07 ./01.02.15/logos/RCA_old_logo.jpg 
797K Aug 21 03:25 ./08.21.14/girl-88-032514.zip 
916K Sep 11 21:47 ./09.11.14/small-shot-4d-214727.png

I want it to look like this:
SIZE    FILE PATH
========================================
751K    ./10.21.14/netflix-67-190039.png 
760K    ./01.02.15/logos/RCA_old_logo.jpg 
797K    ./08.21.14/girl-88-032514.zip 
916K    ./09.11.14/small-shot-4d-214727.png

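# All Done
# Note: $? holds the status of the most recent command, so this check must
# directly follow the command it is meant to test.
if [ "$?" -ne 0 ]; then
    echo "find of drop files larger than $SIZE completed with errors." >&2
    exit 1
fi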
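Original post to Stack, from before I got some new information...

The original post is below. Based on the new information I tried some new strategies, and I have left the script and information above.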

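I have a simple script, on Mac OS X, that runs find on a directory and locates everything of type file whose size is larger than +SIZE, then appends the results to a file via >>.

From there, I have a file that essentially contains an ls -la listing, so I use awk to pull out the file size and file name with this command: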

# With a result set found via `find`, now use awk to print out the sorted list of file 
# sizes and paths.
echo -e "SIZE          FILE PATH"
awk '{print "["$5"]          ", $9, $10}' < "$TEMP" | sort -n

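Everything works as I want, except that I get some file name truncation from the code above. The whole file is about 30 lines, and I have pinned the problem down to this line. I think that throwing in a different internal field separator would fix it. I could use \t, since there can't be a \t in a Mac OS X file name.

I thought it was just a quoting problem, but I can't seem to see whether that is the case. Here is a sample of the returned data; typically I get about 50 results. The first file name in this sample is truncated: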

[1.0M]           ./11.26.14/Bruna Legal
[1.4M]           ./12.22.14/card-88-082636.jpg 
[1.6M]           ./12.22.14/thrasher-8c-082637.jpg 
[11M]           ./01.20.15/td-6e-225516.mp3 

"Bruna Legal" is "Bruna Legal Name.pdf" on the file system.
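The truncation follows from awk's default field splitting: with whitespace as the separator, "Bruna Legal Name.pdf" is spread over fields $9, $10 and $11, so printing only $9 and $10 drops the last word. Below is a minimal sketch of one workaround. It assumes the `ls -lh` layout shown earlier (eight fields before the path) and names without newlines; the function name `size_and_path` is hypothetical.

```shell
# Hypothetical helper: print size and full path from `ls -lh` lines on stdin.
size_and_path() {
    awk '{
        size = $5
        line = $0
        # Strip the first 8 whitespace-delimited fields; what remains is the
        # path, with its internal spaces untouched.
        for (i = 1; i <= 8; i++)
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", line)
        printf "%-7s %s\n", size, line
    }'
}

printf '%s\n' '-rw-r--r--@ 1 me  staff   1.0M Nov 26  2014 ./11.26.14/Bruna Legal Name.pdf' | size_and_path
```

Because the path is taken from the raw line rather than rebuilt from numbered fields, the number of words in a file name no longer matters.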

1 answer:

Answer 0 (score: 2)

You can avoid parsing the output of the ls command and do the whole job with find's printf action, for example:

find /tmp -type f -maxdepth 1 -size +4k 2>/dev/null -printf "%kKB %f\n" |
  sort -nrk1,1

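In my example it outputs every file larger than 4 kilobytes. The problem is that the find command cannot print the size formatted in MB. Also, numeric sorting did not work for me with square brackets around the number, so I left them out. In my test it produced: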

140KB +~JF7115171557203024470.tmp
140KB +~JF3757415404286641313.tmp
120KB +~JF8126196619419441256.tmp
120KB +~JF7746650828107924225.tmp
120KB +~JF7068968012809375252.tmp
120KB +~JF6524754220513582381.tmp
120KB +~JF5532731202854554147.tmp
120KB +~JF4394954996081723171.tmp
24KB +~JF8516467789156825793.tmp
24KB +~JF3941252532304626610.tmp
24KB +~JF2329724875703278852.tmp
16KB 578829321_2015-01-23_1708257780.pdf
12KB 575998801_2015-01-16_1708257780-1.pdf
8KB adb.log
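Note that `-printf` is a GNU find extension; the BSD find that ships with Mac OS X does not provide it. The following is a sketch of an alternative that avoids it. It assumes a find that accepts the `k` size suffix (GNU and BSD find both do) and file names without tabs or newlines. The function name `big_files` and its two arguments are made up for illustration.

```shell
# Hypothetical helper: list files under directory $1 that match the
# find -size expression $2 (for example +250k), largest first.
big_files() {
    find "$1" -type f -size "$2" -exec sh -c '
        for f do
            # wc -c < file prints only the byte count, possibly space-padded
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + |
    sort -rn |
    awk -F '\t' '{ printf "%.0fK\t%s\n", $1 / 1024, $2 }'
}
```

Called as `big_files . +250k` from the drops directory, this would print one tab-separated size and path per line. Sorting happens on the raw byte count, before the size is rounded to K, so the order stays numeric.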

Edit: since I noticed that %k is not accurate enough, you can use %s to print the size in bytes and convert it to KB or MB with awk, like this:

find /tmp -type f -maxdepth 1 -size +4k 2>/dev/null -printf "%sKB %f\n" |
  sort -nrk1,1 |
  awk '{ $1 = sprintf( "%.2f", $1 / 1024) } { print }'
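One caveat with assigning to `$1`: awk then rebuilds the record with single spaces between fields, so a run of several spaces inside a file name is collapsed. Here is a sketch of a variant that picks K or M by magnitude and leaves the name untouched. It assumes input lines of the form "BYTES NAME", already sorted; `humanize` is a hypothetical name.

```shell
# Hypothetical helper: read "BYTES NAME" lines and print human-readable sizes.
humanize() {
    awk '{
        bytes = $1
        name = $0
        sub(/^[0-9]+ /, "", name)   # drop the byte count, keep the name as-is
        if (bytes >= 1048576)
            printf "%.1fM %s\n", bytes / 1048576, name
        else
            printf "%.1fK %s\n", bytes / 1024, name
    }'
}

printf '%s\n' '1572864 Bruna Legal Name.pdf' '8192 adb.log' | humanize
```

For the two sample lines this prints "1.5M Bruna Legal Name.pdf" and "8.0K adb.log". The conversion should stay the last stage of the pipeline, because sort -n cannot compare values once K and M suffixes are mixed.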