Find duplicate files among the latest versions in Linux

Asked: 2017-11-17 07:33:57

Tags: bash shell duplicates

I tried the following code, but I cannot get it to do what I want.

#!/usr/bin/bash
find ./ -type f \( -iname "*.xml" \) | sort -n > fileList
sed -i '/\.\/fileList/d' fileList
NAMEOFTHISFILE=$(echo $0|sed -e 's/[]\/()$*.^|[]/\\&/g')
sed -i "/$NAMEOFTHISFILE/d" fileList
cp fileList auxFileList
while IFS= read -r FILENAME
do
    sed -i '1d' auxFileList
    #echo "Comparing $FILENAME with :"
    #Read the aux file and compare current file with every other element in the file
    while IFS= read -r COMPFILENAME
    do
        # cmp -s exits 0 only when both files are byte-identical
        if cmp -s "$FILENAME" "$COMPFILENAME"
        then
            awk ' BEGIN { FS="_" } { printf( "%03d\n",$2) }' "$FILENAME" | sort | awk ' { printf( "data_%d_box\n", $1)  }'
            #echo "$FILENAME AND $COMPFILENAME are identical"
            #rm -r $FILENAME
        fi
        #echo "  $COMPFILENAME"
    done<auxFileList
done<fileList
rm fileList auxFileList &>/dev/null
printf '\n\n'

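This code currently selects all files. I need to modify it so that only the most recently modified file for each file name pattern is considered, for example: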

File 1: AAA_555_0000 
File 2: AAAA_123_123 
File 3: AAAA_452_452 [latest]

File 4: BBB_555_0000 
File 5: BBB_555_555 
File 6: BBB_999_999 [latest]

File 7: CCC_555_0000 
File 8: CCC_000_000 
File 9: CCC_000_111 [latest]

The script must select the latest file for each file name pattern in the folder, then compare those files and delete the duplicates.
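The requirement above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that the question does not confirm: bash 4 or newer is available, the "pattern" is the part of the file name before the first underscore, "latest" means newest modification time, and file names contain no newlines. The function names latest_per_prefix and remove_duplicates and the DELETE variable are hypothetical, introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions: bash 4+, names shaped like PREFIX_x_y,
# "latest" = newest modification time. All names here are hypothetical.

# Print, one per line, the newest file (by mtime) for each PREFIX in dir $1.
latest_per_prefix() {
    local dir=${1:-.} f base prefix
    local -A newest=()
    for f in "$dir"/*_*; do
        [[ -f $f ]] || continue            # skip directories and unmatched globs
        base=${f##*/}                      # strip the directory part
        prefix=${base%%_*}                 # text before the first underscore
        [[ -n $prefix ]] || continue       # ignore names starting with "_"
        # -nt is true when the left file has a newer mtime than the right one
        if [[ -z ${newest[$prefix]:-} || $f -nt ${newest[$prefix]} ]]; then
            newest[$prefix]=$f
        fi
    done
    for prefix in "${!newest[@]}"; do
        printf '%s\n' "${newest[$prefix]}"
    done | sort
}

# Report every argument whose content is byte-identical to an earlier
# argument. Files are only deleted when DELETE=1 is set.
remove_duplicates() {
    local -a kept=()
    local f k dup
    for f in "$@"; do
        dup=
        for k in "${kept[@]}"; do
            if cmp -s -- "$f" "$k"; then dup=$k; break; fi
        done
        if [[ -n $dup ]]; then
            printf 'duplicate: %s (same as %s)\n' "$f" "$dup"
            if [[ ${DELETE:-0} == 1 ]]; then rm -- "$f"; fi
        else
            kept+=("$f")
        fi
    done
}

# Possible usage (dry run first, then the destructive call):
#   mapfile -t latest < <(latest_per_prefix /path/to/folder)
#   remove_duplicates "${latest[@]}"
#   DELETE=1 remove_duplicates "${latest[@]}"
```

Keeping the selection step separate from the comparison step means the list of "latest" files can be inspected before anything is removed. The pairwise cmp loop is quadratic in the number of selected files, which is probably acceptable when there is only one file per pattern.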

I would appreciate any help with solving this.

Thanks a lot!

0 answers:

No answers yet