Question

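Ultimately, I want to eliminate the possibility of duplicate entries showing up in my array. I'm doing this because I'm writing a script that compares two directories, then searches for and deletes duplicate files. The potential duplicates are stored in an array, and a file is deleted only if it has the same name and checksum as the original. So if the array contains duplicate entries, I end up with minor errors where either md5sum tries to compute the checksum of a file that no longer exists (because it has already been deleted), or rm tries to delete a file that has already been deleted.

Here is part of the script.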

compare()
{

read -p "Please enter two directories: " dir1 dir2

if [[ -d "$dir1" && -d "$dir2" ]]; then
    echo "Searching through $dir2 for duplicates of files in $dir1..."
else
    echo "Invalid entry. Please enter valid directories." >&2
    exit 1
fi

#create list of files in specified directory
while read -d $'\0' file; do
    test_arr+=("$file")
done < <(find $dir1 -print0)

#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
tmpfile=$(mktemp -p $dir1 del_logXXXXX.txt)


for i in "${test_arr[@]}"; do
    Name=$(sed 's/[][?*]/\\&/g' <<< "$i")

    if [[ $(find $dir2 -name "${Name##*/}" ! -wholename "$Name") ]]; then
        [[ -f $i ]] || continue
        find $dir2 -name "${Name##*/}" ! -wholename "$Name" >> $tmpfile
        origray[$i]=$(md5sum "$i" | cut -c 1-32)
    fi
done

#create list of duplicate file locations.
dupe_loc

#compare similarly named files by checksum and delete duplicates
local count=0
for i in "${!indexray[@]}"; do
    poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
    for i in "${!origray[@]}"; do
        if [[ "$poten" = "${origray[$i]}" ]]; then
            echo "${indexray[$count]} is a duplicate of a file in $dir1."
            rm -v "${indexray[$count]}"
            break
        fi
    done
    count=$((count+1))
done
exit 0 
}

dupe_loc is the following function.

dupe_loc()
{
if [[ -s $tmpfile ]]; then
    mapfile -t indexray < $tmpfile
else
    echo "No duplicates were found."
    exit 0
fi
}

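I think the best way to solve this is to use the sort and uniq commands to get rid of the duplicate entries in the array. But even with process substitution, I run into errors when I try to do that.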

Answer 1

First things first. Sorting a Bash array has already been answered here: How to sort an array in BASH
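For reference, here is a minimal sketch of one common way to sort an array and drop duplicates with process substitution. The array names and sample values are made up for illustration, and it assumes bash 4+ (for mapfile) and that no element contains a newline.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; these names are illustrative only.
files=("b.txt" "a.txt" "b.txt" "c d.txt" "a.txt")

# Print one element per line, sort, drop duplicates (-u), and read the
# result back into a new array. Assumes no element contains a newline.
mapfile -t sorted_unique < <(printf '%s\n' "${files[@]}" | sort -u)

# Show the result, one element per line.
printf '%s\n' "${sorted_unique[@]}"
```

With the sample data above, the new array holds three elements. Quoting "${files[@]}" keeps elements that contain spaces intact.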

That said, I'm not sure sorting the array would help much. A simpler solution seems to be to wrap your md5 check and rm statements in an if statement:

if [ -f "${origarr[$i]}" ]; then  # True if the file exists and is a regular file.
    # file exists
    ...
    rm "${origarr[$i]}"
fi
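A fuller sketch of that guard, combined with a "seen" set so that each path is handled at most once, might look like the following. Everything here is an assumption for illustration rather than part of the original script: the scratch directory, the candidates and seen arrays, and the removed counter are hypothetical names, and the associative array requires bash 4+.

```bash
#!/usr/bin/env bash
# Hypothetical setup: a scratch directory with one file, listed twice.
workdir=$(mktemp -d)
touch "$workdir/dup.txt"
candidates=("$workdir/dup.txt" "$workdir/dup.txt")

declare -A seen   # associative array used as a set (bash 4+)
removed=0

for f in "${candidates[@]}"; do
    # Skip paths that were already handled earlier in this run.
    [[ -n ${seen["$f"]:-} ]] && continue
    seen["$f"]=1

    # Skip anything that is no longer a regular file.
    [[ -f $f ]] || continue

    # Only now is it safe to checksum and delete.
    sum=$(md5sum "$f" | cut -c 1-32)
    echo "removing $f ($sum)"
    rm -- "$f"
    removed=$((removed + 1))
done

echo "removed: $removed"
rmdir "$workdir"
```

The seen check alone prevents the double processing, and the -f test additionally covers files that disappear for any other reason, so neither md5sum nor rm is ever run on a missing file.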
