Question

I have a directory structure like this:

ARCHIVE_LOC -> epoch1 -> a.txt
                         b.txt

            -> epoch2 -> b.txt
                         c.txt

            -> epoch3 -> b.txt
                         c.txt

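I have a base archive directory. It receives logs from an Android application via rsync (run periodically), and the logs are stored in subdirectories named after the epoch/timestamp of each rsync run. I want to delete all duplicate log files (files that have the same name) and keep only the most recent copy. Any help on how to achieve this?

In short, I want to keep only the latest copy of each file. One way to tell which copy is the latest is file size, because a newer file is always at least as large as the older one.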

Answer 1

On Debian 7, I managed to come up with the following one-liner:

find path/to/folder -type f -name '*.txt' -printf '%Ts\t%p\n' | sort -nr | cut -f2- | perl -ne 'print if m{([^/]+\.txt)$} && $seen{$1}++' | xargs -d '\n' rm --

It is long, and there may be a shorter way, but it seems to do the job. I combined what I found here

https://superuser.com/questions/608887/how-can-i-make-find-find-files-in-reverse-chronological-order

and here

Perl regular expression removing duplicate consecutive substrings in a string
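
The same idea (list files newest first, then delete every later occurrence of a name already seen) can also be written as a bash loop that copes with spaces and newlines in paths. This is only a sketch under assumptions: the function name `prune_older_duplicates` is made up for illustration, it uses `%T@` rather than `%Ts` for the timestamp, and it needs bash 4+ together with GNU `find` and GNU `sort`.

```shell
#!/bin/bash
# Sketch: keep only the newest copy of each *.txt base name under a directory.
# prune_older_duplicates is a hypothetical helper name, not from the answer above.
prune_older_duplicates() {
    local root=$1 mtime path name
    local -A seen=()
    # Records are NUL-terminated "mtime<TAB>path", sorted newest first.
    while IFS=$'\t' read -r -d '' mtime path; do
        name=${path##*/}            # strip the directory part
        if [[ -n ${seen[$name]} ]]; then
            rm -- "$path"           # an older copy of a name already kept
        else
            seen[$name]=1           # first (newest) occurrence: keep it
        fi
    done < <(find "$root" -type f -name '*.txt' -printf '%T@\t%p\0' |
             sort -z -t $'\t' -k1,1nr)
}
```

Calling `prune_older_duplicates path/to/folder` deletes the older copies; replacing `rm --` with `echo` turns it into a dry run.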

Answer 2

I wrote the script below, and it works well for me.

Title

Answer 3

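#!/bin/bash
# Requires bash 4+ (associative arrays and globstar).
# Note: duplicates are detected by content (MD5), not by file name.
declare -A arr
shopt -s globstar

for file in **; do
    [[ -f "$file" ]] || continue
    # Keep only the first field of md5sum's output, the checksum.
    read -r cksm _ < <(md5sum "$file")
    # A non-zero count means this content has been seen before.
    if ((arr[$cksm]++)); then
        echo "rm $file"    # dry run: prints the command instead of deleting
    fi
done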

[https://superuser.com/questions/386199/how-to-remove-duplicated-files-in-a-directory][1]
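
The script above compares file contents, whereas the question asks about files that share a name and suggests size as the way to pick the latest copy. A minimal sketch of that variant follows. The function name `list_smaller_duplicates` is hypothetical, equal sizes are broken by modification time (an assumption, not something the question specifies), and bash 4+ with GNU `find` and GNU `sort` is assumed. It only prints the paths it would remove.

```shell
#!/bin/bash
# Sketch: for each base name keep the largest copy and print the others.
# list_smaller_duplicates is a hypothetical name; nothing is deleted here.
list_smaller_duplicates() {
    local root=$1 size mtime path name
    local -A kept=()
    # Records are NUL-terminated "size<TAB>mtime<TAB>path",
    # sorted largest first, then newest first among equal sizes.
    while IFS=$'\t' read -r -d '' size mtime path; do
        name=${path##*/}
        if [[ -n ${kept[$name]} ]]; then
            printf '%s\n' "$path"   # a smaller (or older) copy of a kept name
        else
            kept[$name]=$path       # first occurrence is the one to keep
        fi
    done < <(find "$root" -type f -printf '%s\t%T@\t%p\0' |
             sort -z -t $'\t' -k1,1nr -k2,2nr)
}
```

Running `list_smaller_duplicates ARCHIVE_LOC` and checking the output before deleting anything is the safer order of operations.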

Remove files with duplicate file names in a directory (Linux)
