Question

Using Kubator's command line to answer my own question:

# Function that shows the files having the same content in the current directory
showDuplicates (){
  last_file=''
  while read -r f1_hash f1_name; do
    if [ "$last_file" != "$f1_hash" ]; then
      echo "The following files have the exact same content :"
      echo "$f1_name"
      while read -r f2_hash f2_name; do
        if [ "$f1_hash" == "$f2_hash" ] && [ "$f1_name" != "$f2_name" ]; then
          echo "$f2_name"
        fi
      done < <(find ./ -maxdepth 1 -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D)
    fi
    last_file="$f1_hash"
  done < <(find ./ -maxdepth 1 -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D)
}

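Original question:

I have already seen some discussions about what I am about to ask, but I have trouble understanding the mechanics behind the proposed solutions, so I have not been able to solve the following problem.

I want to write a function that compares files, and for that I naively tried the following: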

#somewhere I use that to get the files paths
files_to_compare=$(find $base_path -maxdepth 1 -type f)
files_to_compare=( $files_to_compare )

#then I pass files_to_compare as an argument to the following function
showDuplicates (){
  files_to_compare=${1}
  n_files=$(( ${#files_to_compare[@]} ))
  for (( i=0; i < $n_files ; i=i+1 )); do
     for (( j=i+1; j < $n_files ; j=j+1 )); do
         sameContent "${files_to_compare[i]}" "${files_to_compare[j]}"
         r=$?
         if [ $r -eq 1 ]; then
            echo "The following files have the same content :"
            echo ${files_to_compare[i]}
            echo ${files_to_compare[j]}
         fi
    done
  done
}

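The function "sameContent" takes the absolute paths of two files and uses several commands (du, wc, diff) to return 1 if the files have the same content and 0 otherwise.

The code turned out to be incorrect for file names containing spaces, and I have since read that this is not the right way to handle files in bash anyway.

On https://unix.stackexchange.com/questions/392393/bash-moving-files-with-spaces and some other pages I read that the correct way is to use code that looks like this: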

$ while IFS= read -r file; do echo "$file"; done < files

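I can't seem to understand what lies behind that bit of code or how I could use it to solve my problem, particularly because I want/need to use nested loops.

I'm new to bash and this seems to be a common problem, but if someone were kind enough to give me some insight into how this works, that would be wonderful.

P.S.: please excuse any grammar mistakes.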

Answer 1

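How about using md5sum to compare the content of the files in your folder instead? That is a much safer and more standard way. Then you would only need something like this: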

find ./ -type f -print0 | xargs -0 md5sum | sort -k1,32 | uniq -w32 -D

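What it does:

find looks for all regular files (-type f) under the current folder ./, including its subfolders, and prints their names separated by null bytes (-print0). This is needed for special characters in file names, such as the spaces you mention.
xargs reads the null-separated output of find (-0) and runs md5sum on the files.
sort orders the output so that lines with the same hash end up next to each other. Note that -k1,32 selects fields 1 through 32 rather than character positions 1-32; because the md5 hash is the first field on every line, equal hashes are still grouped together.
uniq compares only the first 32 characters of each line, i.e. the md5 hash (-w32), and prints only the duplicated lines, all of them (-D).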
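As a rough sketch of how the output of that pipeline could be consumed in a loop (which is what the question was struggling with), the function below reads the pipeline once and prints each set of identical files as its own group. The name show_duplicate_groups is made up for this illustration, and the sketch assumes GNU find, xargs and uniq, and file names without newlines or backslashes (md5sum escapes those, which would break the 32-character comparison).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original answer: print every set of
# identical files in the current directory as one group, in a single pass.
show_duplicate_groups() {
  local prev_hash='' hash name
  # "read -r hash name" puts the first word (the md5 hash) into $hash and
  # the rest of the line (the file name, spaces included) into $name.
  while read -r hash name; do
    # A hash different from the previous one starts a new group.
    if [ "$hash" != "$prev_hash" ]; then
      if [ -n "$prev_hash" ]; then
        echo    # blank line between groups
      fi
      echo "The following files have the exact same content:"
      prev_hash=$hash
    fi
    echo "$name"
  done < <(find . -maxdepth 1 -type f -print0 | xargs -0 -r md5sum | sort | uniq -w32 -D)
}
```

Calling show_duplicate_groups from inside the directory of interest prints one header per group followed by the matching file names. Because uniq -D already drops every line whose hash is unique, the loop only has to notice when the hash changes, so no second inner loop over the pipeline is needed.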

Sample output:

7a2e203cec88aeffc6be497af9f4891f  ./file1.txt
7a2e203cec88aeffc6be497af9f4891f  ./folder1/copy_of_file1.txt
e97130900329ccfb32516c0e176a32d5  ./test.log
e97130900329ccfb32516c0e176a32d5  ./test_copy.log

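If performance is crucial, this can be tuned to sort by file size first and only then compare md5sums for files of equal size. The output can also be fed to mv, rm, and so on.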
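One possible way to do the size-first tuning is sketched below; it is only an illustration under a few assumptions, not a tested recipe from the answer. The function name find_duplicates_by_size_first is hypothetical, and the sketch relies on GNU find (-printf), GNU uniq (-w, -D) and bash 4 or later (associative arrays). Files whose size is unique cannot have a duplicate, so they are never hashed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash only the files that share their size with
# at least one other file in the given directory (default: current one).
find_duplicates_by_size_first() {
  local dir=${1:-.} rec size path i
  local -A count=()          # size -> number of files with that size
  local -a sizes=() paths=() candidates=()

  # Read NUL-delimited "size<TAB>path" records so any file name is safe.
  while IFS= read -r -d '' rec; do
    size=${rec%%$'\t'*}      # text before the first tab
    path=${rec#*$'\t'}       # text after the first tab
    sizes+=("$size")
    paths+=("$path")
    count[$size]=$(( ${count[$size]:-0} + 1 ))
  done < <(find "$dir" -maxdepth 1 -type f -printf '%s\t%p\0')

  # Keep only the files whose size occurs more than once.
  for i in "${!paths[@]}"; do
    if [ "${count[${sizes[i]}]}" -gt 1 ]; then
      candidates+=("${paths[i]}")
    fi
  done

  # Nothing to hash if no two files have the same size.
  if [ "${#candidates[@]}" -eq 0 ]; then
    return 0
  fi
  md5sum -- "${candidates[@]}" | sort | uniq -w32 -D
}
```

For example, find_duplicates_by_size_first ./ prints the same kind of "hash  name" lines as the pipeline above. All candidates are passed to a single md5sum call, so a directory with a very large number of same-sized files could hit the argument-length limit; piping the candidates through xargs -0 would be the usual workaround in that case.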

Find files with the same content
