Question

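I want to write a script that finds duplicate mp3 files by their content, not by file name. I would like to know how to look at a file's internal data for the sake of comparison. Thanks.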

Answer 1

cmp can be used to compare binary files.

cmp file1.mp3 file2.mp3
if [[ $? -eq 0 ]]; then echo "Matched"; fi

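The cmp command returns 0 if the files are identical; otherwise it returns a non-zero status (1 if the files differ, greater than 1 if an error occurred).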
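
To go from comparing two files to scanning a whole directory, the same cmp test can be applied to every pair of files. The following is only a minimal sketch, assuming bash; the function name list_identical_pairs is made up for illustration and is not part of the answer above. It compares every pair, so the cost grows quadratically with the number of files and it only suits small collections.

```shell
#!/bin/bash
# Hypothetical helper: print every pair of byte-identical .mp3 files in a directory.
list_identical_pairs() {
  local dir=${1:-.}
  local files=("$dir"/*.mp3)
  local i j
  for ((i = 0; i < ${#files[@]}; i++)); do
    for ((j = i + 1; j < ${#files[@]}; j++)); do
      # cmp -s prints nothing; its exit status is 0 only when both files are identical
      if cmp -s "${files[i]}" "${files[j]}"; then
        printf '%s == %s\n' "${files[i]}" "${files[j]}"
      fi
    done
  done
}
```

Calling list_identical_pairs ~/Music would print one line per matching pair found directly in that directory (subdirectories are not visited in this sketch).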

Answer 2

The first command line lists all files under the current directory that have the same size and the same md5sum

find . -type f -printf '%11s ' -exec md5sum '{}' ';' | 
  sort | uniq -w44 --all-repeated=separate

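The second command line is:

- faster, because it computes md5sum only for files that have the same size
- more robust, because it handles file names containing special characters such as space or newline

As a consequence, it is also more complex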

find . -type f -printf '%11s %P\0' |
  LC_ALL=C sort -z |
  uniq -Dzw11 |
  while IFS= read -r -d '' line
  do
    md5sum "${line:12}"
  done |
  LC_ALL=C sort |   # bring identical md5sums next to each other for uniq
  uniq -w32 --all-repeated=separate |
  tee duplicated.log

Some explanations

# Print file size/md5sum/name in one line (size aligned in 11 characters)
find . -printf '%11s ' -exec md5sum '{}' ';'

# Print duplicated lines considering the first 44 characters only
# 44 characters = size (11 characters) + one space + md5sum (32 characters)
uniq -w44 --all-repeated=separate

# Print size and path/filename terminated by a null character
find . -printf '%11s %P\0'

# Sort lines separated by a null character (-z) instead of a newline character
# based on native byte value (LC_ALL=C) instead of the locale's collation order
LC_ALL=C sort -z  

# Read lines separated by null character
IFS= read -r -d '' line

# Skip the first 12 characters (size and space) 
# in order to obtain the rest: path/filename
"${line:12}"

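The fixed offsets used above (11, 12 and 44) all follow from the %11s padding. As a small illustration, bash's own printf pads a field the same way, so a record can be rebuilt by hand and taken apart again. This is a sketch only; the function name strip_size_field and the sample path are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: drop the 11-character size field and the space that follows it.
strip_size_field() {
  local line=$1
  printf '%s\n' "${line:12}"
}

# Build a record shaped like the output of find -printf '%11s %P' for a 4321-byte file
record=$(printf '%11s %s' 4321 'music/my song.mp3')
echo "[$record]"             # the size is right-aligned within 11 characters
strip_size_field "$record"   # prints only the path part of the record
```

Because every size occupies exactly 11 characters, a byte-wise sort puts files of equal size next to each other, which is what lets uniq -w11 group them.
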
Answer 3

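If the files are actually byte-for-byte identical, you can start by searching for files of the same size. If their sizes match, you can investigate further (for example, by comparing their md5sum). If the files merely contain the same song but use a different codec/compression/whatever, bash is probably not the right tool for the task.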
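
The staged approach described here (size first, checksum only when sizes collide) could be sketched as below. This is one possible shape, not a definitive implementation: it assumes bash 4 or later for associative arrays plus GNU find, sort and md5sum, and the function name find_dupes_staged is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report byte-identical files under a directory in two stages.
# Assumes bash 4+ (associative arrays) and GNU find, sort and md5sum.
find_dupes_staged() {
  local dir=${1:-.}
  local -A size_count=() first_seen=()
  local -a sizes=() paths=()
  local rec size hash key i

  # Stage 1: collect "size path" records, null-terminated so odd file names survive
  while IFS= read -r -d '' rec; do
    size=${rec%% *}                 # text before the first space
    sizes+=("$size")
    paths+=("${rec#* }")            # text after the first space
    size_count[$size]=$(( ${size_count[$size]:-0} + 1 ))
  done < <(find "$dir" -type f -printf '%s %p\0' | LC_ALL=C sort -z)

  # Stage 2: hash only the files whose size occurs more than once
  for i in "${!paths[@]}"; do
    (( ${size_count[${sizes[i]}]} > 1 )) || continue
    hash=$(md5sum < "${paths[i]}")
    key="${sizes[i]}:${hash%% *}"
    if [[ -n ${first_seen[$key]:-} ]]; then
      printf '%s duplicates %s\n' "${paths[i]}" "${first_seen[$key]}"
    else
      first_seen[$key]=${paths[i]}
    fi
  done
}
```

Files with a unique size are never read, which is where the time saving comes from. This only finds byte-identical files; two encodings of the same song will still look different.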

Answer 4

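I use this script for my photos, but it can be used for other files.

First, I transfer the photos from my phone/camera to the directory newfiles.
Then I run this script from my pictures root directory.
- When duplicate files are detected, the script keeps one file and moves the others to the directory ../garbage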
- The script processes the files in newfiles first

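Caution: this script does not compare file content; it detects files that have the same size and name (which is fine for camera files). My other answer is based on content comparison (md5sum).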

#!/bin/bash

# If a file in directory 'newfiles' has the same size and name
# as a file in another directory,
# then move the file from 'newfiles' to 'garbage'
find newfiles/ -type f -printf '%s %f\n' |
while read -r SIZE f
do
   find . -name "$f" -size "${SIZE}c" |
     grep -v 'newfiles' &&
     find . -name "$f" -size "${SIZE}c" -path '*newfiles*' -exec mv -v '{}' ../garbage ';' &&
     echo
done

# Detect all other duplicated files
# Keep the first occurrence and move all others to 'garbage'
find . -type f -printf '%s %f\n' |
  LC_ALL=C sort |  #LC_ALL=C disables locale => sort is faster
  uniq -dc      |  #keep duplicates and count number of occurrences
  while read -r n SIZE f
  do
    echo -e "\n_____ $n files\t$SIZE bytes\tname: $f"
    find . -name "$f" -size "${SIZE}c" |
       tail -n +2 |  #skip the first occurrence so that it is kept
       xargs -r -d '\n' -n 1 mv -v -t ../garbage
  done
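
Since the script moves files, it may be worth listing what it would treat as duplicates before running it. The following read-only sketch uses the same "size and name" key; the function name preview_same_size_and_name is hypothetical, and it assumes GNU find and file names without tabs or newlines.

```shell
#!/bin/bash
# Hypothetical helper: list files that share both size and name, without moving anything.
# Assumes GNU find and file names that contain no tab or newline characters.
preview_same_size_and_name() {
  local dir=${1:-.}
  # Each line is "size name<TAB>full path"; the part before the tab is the duplicate key
  find "$dir" -type f -printf '%s %f\t%p\n' |
    LC_ALL=C sort |
    awk -F '\t' '
      { count[$1]++; line[NR] = $0; key[NR] = $1 }
      END {
        # Print only the lines whose key was seen more than once
        for (i = 1; i <= NR; i++) if (count[key[i]] > 1) print line[i]
      }'
}
```

Running preview_same_size_and_name . from the pictures root directory prints the candidate files grouped by key and leaves everything in place.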
