Question

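I have a lot of text files containing lists of numbers, spread across many folders. Most of the files/lists are identical, and I am looking for a way to find the ones that are not.

The lists should contain these numbers:

I am looking for a way to print, to a text file, the names and paths of the files that are not exactly the same.

I tried using awk, sed, and other shell tools, but since I am very new to this, I failed. I would appreciate some examples with explanations.

Thanks!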

Answer 1

If the files are supposed to be exact duplicates, you know two things: each should be 29 bytes in size, and its md5sum should be 00c7dd845c7e87a1d1751566bd23ad61, because:

seq 0 50 350 | wc -c
seq 0 50 350 | md5sum

So simply search for files with a different size or a different md5sum:

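# Regular files whose size is not 29 bytes
find . -type f -not -size 29c
# Regular files of the right size but with a different checksum
find . -type f -size 29c -exec md5sum {} + \
    | grep -v ^00c7dd845c7e87a1d1751566bd23ad61 \
    | cut -c35-    # drop the 32-character hash and the 2-character separator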
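Since the question asks for the result in a text file, the two checks can be wrapped in one function whose output is redirected. This is only a minimal sketch, assuming GNU find, seq, and md5sum; the function name `list_odd_files` is made up for illustration, and the expected size and checksum are computed from `seq 0 50 350` rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): list regular files under
# directory $1 that are not byte-for-byte equal to `seq 0 50 350` output.
list_odd_files() {
    dir=$1
    # Derive the expected size and checksum from the reference sequence.
    want_size=$(seq 0 50 350 | wc -c | tr -d ' ')
    want_sum=$(seq 0 50 350 | md5sum | cut -c1-32)

    # 1) A file with a different size cannot be identical.
    find "$dir" -type f ! -size "${want_size}c"

    # 2) Files of the right size still need a content check.
    #    cut -c35- removes the hash and the two separator characters.
    find "$dir" -type f -size "${want_size}c" -exec md5sum {} + \
        | grep -v "^$want_sum" \
        | cut -c35-
}
```

Calling it as `list_odd_files /path/to/data > report.txt` (both paths are placeholders) would write the list to a file. Keep the report outside the searched tree, otherwise the report itself shows up in the list.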

Answer 2

RefFile=./ThisFile

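# Checksum every regular file, then print those whose CRC differs from the reference
find . -type f -exec cksum {} + \
 | awk -v "Ref=$( cksum "${RefFile}" )" '
     BEGIN { split( Ref, aRef); crc=aRef[1] }
     # cksum prints "CRC size name"; strip the first two fields so names with spaces survive
     $1 != crc { sub(/^[0-9]+ [0-9]+ /, ""); print }
     '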

This returns the files that differ from the reference file.
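A checksum match only makes identity very likely. If a byte-for-byte comparison is preferred, cmp can take the place of cksum. The following is a rough sketch of that swapped-in technique, not the method from this answer; the function name `list_differing` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print every regular file
# under directory $2 whose bytes differ from the reference file $1.
list_differing() {
    ref=$1
    dir=$2
    # The inner sh receives the reference as $0 and a batch of files as "$@".
    # cmp -s prints nothing and exits non-zero when the files differ
    # (or when one of them cannot be read).
    find "$dir" -type f -exec sh -c '
        for f do
            cmp -s "$0" "$f" || printf "%s\n" "$f"
        done
    ' "$ref" {} +
}
```

Because the file names never pass through a text pipeline, names containing spaces are printed intact.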

Answer 3

awk '
  # NR is the total number of records (lines) read so far
  # FNR is the number of records read from the current file
  # NR==FNR is only true while reading the first file
  # store the lines of the first file in an array,
  # then use next to skip the remaining rules
  NR==FNR { a[NR]=$0; next }
  # this part only runs when NR!=FNR
  # check whether the stored line equals the line just read:
  a[FNR]!=$0 { print FILENAME }
' checkfile fileA fileB fileC ...

However, this prints the file name once for every line that differs from checkfile. Piping the output through uniq solves that:

awk '
  NR==FNR { a[NR]=$0; next }
  a[FNR]!=$0 { print FILENAME }
' checkfile fileA fileB fileC | uniq
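The files in the question live in many folders, so the file list can come from find instead of being typed out. Here is a hedged sketch along those lines; the function name `awk_diff_tree` and the `seen` array are my own additions. Appending "" when storing a line is meant to force a string comparison, since awk may otherwise compare numeric-looking lines such as 50 and 050 as numbers.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): compare every regular file
# under directory $2 line by line with the reference file $1 and print
# the name of each differing file once.
awk_diff_tree() {
    ref=$1
    dir=$2
    find "$dir" -type f -exec awk '
        # The first file on the command line is the reference.
        # Store its lines as strings (the "" forces string type).
        NR == FNR { a[NR] = $0 ""; next }
        # Later files: print the name only on the first mismatching line.
        a[FNR] != $0 && !seen[FILENAME]++ { print FILENAME }
    ' "$ref" {} +
}
```

As with the commands above, a file that merely stops early (a correct but shorter list) is not reported, because only the lines that are actually present get compared.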

If the order of the lines does not matter, you can use $0 as the array key:

awk '
  NR==FNR { a[$0]=1; next }
  !($0 in a) { print FILENAME }
' checkfile fileA fileB fileC

If the file names match a pattern
