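Suppose I have two directories named dir_one and dir_two. Each directory contains a text file named data.txt. In other words, there are two files in two separate directories: /dir_one/data.txt and /dir_two/data.txt. Although the file names are the same, the two text files may or may not have the same content!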
What I want to do is:
I typed the following in the terminal:
diff -qrs ./dir_one/data.txt ./dir_two/data.txt
I received the following message:
Files ./dir_one/data.txt and ./dir_two/data.txt are identical.
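Now that I know the two text files are identical, I can use the rm command to delete one of them. So far so good. However...
The problem is that I want to automate the deletion. I don't want to type rm at the command line. Is there a way to do this, for example in a script?
I would also like to know how to compare a large number of text files in one directory against a large number of text files in another directory. Again, for any files found to be identical, one of the copies should be deleted. Is this possible too?
I have found similar questions, but none about automatically deleting one of the duplicate files. Note that I am using Ubuntu 12.04.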
Answer 0: (score: 4)
You need fdupes.
fdupes -r /some/directory/path > /some/directory/path/fdupes.log
Enjoy!
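The command above only writes the list of duplicates to a log file. A minimal sketch of how the deletion itself might be automated with fdupes follows. The function name dedupe_with_fdupes is hypothetical, and the sketch assumes an fdupes build that supports the -d (--delete) and -N (--noprompt) options; which copy in each duplicate set is kept is decided by fdupes, so a run without -d -N first is a sensible way to check what it would act on.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of fdupes): delete duplicates found
# under the given directories, keeping one file from each duplicate set.
dedupe_with_fdupes() {
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed (on Ubuntu: sudo apt-get install fdupes)" >&2
        return 127
    fi
    # -r  recurse into subdirectories
    # -d  delete duplicates
    # -N  with -d: keep the first file of each set and do not prompt
    fdupes -r -d -N "$@"
}
```

For the two directories in the question, this could be called as: dedupe_with_fdupes ./dir_one ./dir_two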
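Answer 1: (score: 1)
diff returns exit status 0 if the files are identical, 1 if they differ, and 2 if there was an error. You can use that to decide whether to run the rm command: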
diff file1 file2 && rm file2
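The same exit-status idea can be extended to whole directories. The sketch below is one possible way to do it, not a definitive implementation: the function name remove_identical_copies is hypothetical, it uses cmp -s (which also exits with 0 only when the files are identical) in place of diff, and it assumes that files are matched by name, as in the dir_one/data.txt and dir_two/data.txt example. It does not descend into subdirectories.

```shell
#!/bin/bash
# Hypothetical helper: for every regular file in keep_dir, delete the
# file of the same name in prune_dir if the two are byte-identical.
remove_identical_copies() {
    local keep_dir=$1 prune_dir=$2 file name
    for file in "$keep_dir"/*; do
        [ -f "$file" ] || continue
        name=${file##*/}
        [ -f "$prune_dir/$name" ] || continue
        # Never delete a file because it matches itself
        # (e.g. both arguments point at the same directory).
        [ "$file" -ef "$prune_dir/$name" ] && continue
        # cmp -s prints nothing and exits with 0 only if the files match.
        if cmp -s -- "$file" "$prune_dir/$name"; then
            rm -- "$prune_dir/$name"
        fi
    done
}
```

For the directories in the question, remove_identical_copies ./dir_one ./dir_two would keep the copies in dir_one and delete the identical ones in dir_two.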
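Answer 2: (score: 0)
Here is a script that I wrote and finished just recently. You should run it from the directory you want to deduplicate. It moves all duplicates to a directory outside the "clean" directory: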
#!/bin/bash
# this script walks through all files in the current directory,
# checks if there are duplicates (it compares only files with
# the same size) and moves duplicates to $duplicates_dir.
#
# options:
# -H also deduplicate hidden files (and files in hidden folders)
# -n dry-run: show duplicates, but don't remove them
# -z deduplicate empty files as well
while getopts "Hnz" opts; do
    case $opts in
        H)
            remove_hidden="yes";;
        n)
            dry_run="yes";;
        z)
            remove_empty="yes";;
    esac
done
# support filenames with spaces:
IFS=$(echo -en "\n\b")
working_dir="$PWD"
working_dir_name=$(echo $working_dir | sed 's|.*/||')
# prepare some temp directories:
filelist_dir="$working_dir/../$working_dir_name-filelist/"
duplicates_dir="$working_dir/../$working_dir_name-duplicates/"
if [[ -d $filelist_dir || -d $duplicates_dir ]]; then
    echo "ERROR! Directories:"
    echo " $filelist_dir"
    echo "and/or"
    echo " $duplicates_dir"
    echo "already exist! Aborting."
    exit 1
fi
mkdir $filelist_dir
mkdir $duplicates_dir
# get information about files:
find . -type f -print0 | xargs -0 stat -c "%s %n" | \
    sort -nr > $filelist_dir/filelist.txt
# unless -H was given, leave hidden files out of the comparison
if [[ "$remove_hidden" != "yes" ]]; then
    grep -v "/\." $filelist_dir/filelist.txt > $filelist_dir/no-hidden.txt
    mv $filelist_dir/no-hidden.txt $filelist_dir/filelist.txt
fi
echo "$(cat $filelist_dir/filelist.txt | wc -l)" \
    "files to compare in directory $working_dir"
echo "Creating file list..."
# divide the list of files into sublists with files of the same size
while read string; do
    number=$(echo $string | sed 's/\..*$//' | sed 's/ //')
    filename=$(echo $string | sed 's/.[^.]*\./\./')
    echo $filename >> $filelist_dir/size-$number.txt
done < "$filelist_dir/filelist.txt"
# plough through the files
for filesize in $(find $filelist_dir -type f | grep "size-"); do
    if [[ -z $remove_empty && $filesize == *"size-0.txt" ]]; then
        continue
    fi
    filecount=$(cat $filesize | wc -l)
    # there are more than 1 file of particular size ->
    # these may be duplicates
    if [ $filecount -gt 1 ]; then
        if [ $filecount -gt 200 ]; then
            echo ""
            echo "Warning: more than 200 files with filesize" \
                $(echo $filesize | sed 's|.*/||' | \
                sed 's/size-//' | sed 's/\.txt//') \
                "bytes."
            echo "Since every file needs to be compared with"
            echo "every other file, this may take a long time."
        fi
        for fileA in $(cat $filesize); do
            if [ -f "$fileA" ]; then
                for fileB in $(cat $filesize); do
                    if [ -f "$fileB" ] && [ "$fileB" != "$fileA" ]; then
                        # diff will exit with 0 iff files are the same.
                        diff -q "$fileA" "$fileB" 2> /dev/null > /dev/null
                        if [[ $? == 0 ]]; then
                            # detect if one filename is a substring of another
                            # so that in case of foo.txt and foo(copy).txt
                            # the script will remove foo(copy).txt
                            # supports filenames with no extension.
                            fileA_name=$(echo $fileA | sed 's|.*/||')
                            fileB_name=$(echo $fileB | sed 's|.*/||')
                            fileA_ext=$(echo $fileA_name | sed 's/.[^.]*//' | sed 's/.*\./\./')
                            fileB_ext=$(echo $fileB_name | sed 's/.[^.]*//' | sed 's/.*\./\./')
                            fileA_name="${fileA_name%%$fileA_ext}"
                            fileB_name="${fileB_name%%$fileB_ext}"
                            if [[ $fileB_name == *$fileA_name* ]]; then
                                echo " $(echo $fileB | sed 's|\./||')" \
                                    "is a duplicate of" \
                                    "$(echo $fileA | sed 's|\./||')"
                                if [ "$dry_run" != "yes" ]; then
                                    mv --backup=t "$fileB" $duplicates_dir
                                fi
                            else
                                echo " $(echo $fileA | sed 's|\./||')" \
                                    "is a duplicate of" \
                                    "$(echo $fileB | sed 's|\./||')"
                                if [ "$dry_run" != "yes" ]; then
                                    mv --backup=t "$fileA" $duplicates_dir
                                fi
                            fi
                        fi
                    fi
                done
            fi
        done
    fi
done
rm -r $filelist_dir
if [ "$dry_run" != "yes" ]; then
    echo "Duplicates moved to $duplicates_dir."
fi
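The script above compares every pair of same-size files with diff, which is why it warns about large groups. As an alternative sketch (a different technique, not what the script does), duplicates can be found by grouping files on a content checksum, so each file is read only once. The function name list_duplicate_files is hypothetical. The sketch assumes sha256sum from GNU coreutils and file names that contain no newlines or backslashes, and it treats files with equal SHA-256 hashes as identical.

```shell
#!/bin/bash
# Hypothetical helper: print every file under DIR whose content hash
# matches that of a file listed earlier in sorted order. The first file
# of each group is not printed, so it is the copy that would be kept.
list_duplicate_files() {
    local dir=$1
    find "$dir" -type f -exec sha256sum {} + | LC_ALL=C sort | awk '
        {
            hash = substr($0, 1, 64)   # SHA-256 is 64 hex characters
            path = substr($0, 67)      # skip the two separator characters
            if (hash == prev) print path
            prev = hash
        }'
}
```

The function only lists candidates. After reviewing the list, and assuming GNU xargs, the output could be passed on for deletion with something like: list_duplicate_files . | xargs -d '\n' rm --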