Question

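I am trying to write a script that lists the files in one directory and then searches for each of them, one by one, in another directory. To handle spaces and special characters such as "[" or "]", I pass $(printf %q "$FILENAME") to the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME"). It works like a charm for every filename except in one case: when the name contains multibyte (UTF-8) characters. In that case the output of printf is an externally quoted string, that is: $'filename with spaces and quoted characters in the form \NNN\NNN'. Without the $'' quoting being interpreted, that string is not expanded, so find looks for a file whose name includes the quoting itself: «$'filename'».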

Is there an alternative way to pass any kind of filename to find?

My script is as follows (I know some lines could be removed, such as the "RESNAME=" ones):

#!/bin/bash

if [ -d $1 ] && [ -d $2 ]; then
    IFSS=$IFS
    IFS=$'\n'
    FILES=$(find $1 -type f )
    for FILE in $FILES; do
        BASEFILE=$(printf '%q' "$(basename "$FILE")")
        RES=$(find $2 -type f -name "$BASEFILE" -print )
        if [ ${#RES} -gt 1 ]; then
            RESNAME=$(printf '%q' "$(basename "$RES")")
        else
            RESNAME=
        fi
        if [ "$RESNAME" != "$BASEFILE" ]; then
            echo "FILE NOT FOUND: $FILE"
        fi
    done

else
    echo "Directories do not exist"
fi

IFS=$IFSS

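As one answer suggested, I tried associative arrays, but with no luck. Maybe I am not using the array correctly, but echoing it (${files[@]}) returns nothing. This is the script I wrote: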

#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
    declare -A files
    find "$2" -type f -print0 | while read -r -d $'\0' FILE;
    do
        BN2="$(basename "$FILE")"
        files["$BN2"]="$BN2"
    done

    echo "${files[@]}"

    find "$1" -type f -print0 | while read -r -d $'\0' FILE;
    do
        BN1="$(basename "$FILE")"
        if [ "${files["$BN1"]}" != "$BN1" ]; then
            echo "File not found: "$BN1""  
        fi
    done
fi

Answer 1

Try something like this:

find "$DIR1" -printf "%f\0" | xargs -0 -I{} find "$DIR2" -name {}

Answer 2

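Don't use a for loop. First, it is slower: your find has to finish before the rest of the program can run. Second, you can overflow the command line: the entire for command must fit in the command-line buffer.

Most importantly, for is bad at handling funky filenames. You are jumping through hoops to work around that. However: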

find $1 -type f -print0 | while read -r -d $'\0' FILE

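will work much better. It handles filenames, even ones that contain \n characters. -print0 tells find to separate the filenames with NUL characters. while read -r -d $'\0' FILE reads each filename (delimited by the NUL character) into $FILE.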
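
A minimal sketch of that loop, assuming bash and a find that supports -print0. The helper name collect_files, the found array, and the demo filenames are made up for illustration. It reads from a process substitution and sets IFS= so that leading and trailing whitespace in names is preserved:

```bash
#!/bin/bash

# Hypothetical helper: store every regular file under "$1" in the global
# array "found", one element per file.
collect_files() {
    local dir=$1 file
    found=()
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes,
    # -d '' makes read stop at each NUL byte written by -print0.
    while IFS= read -r -d '' file; do
        found+=("$file")
    done < <(find "$dir" -type f -print0)
}

# Demo with awkward names (illustrative only).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt" "$demo/odd[1].txt" "$demo"/$'new\nline.txt'
collect_files "$demo"
printf '%d files\n' "${#found[@]}"
# %q prints each base name in a shell-escaped form so odd characters are visible.
printf '%q\n' "${found[@]##*/}"
rm -rf "$demo"
```

Each array element holds one complete path, so a name containing a newline still counts as a single file.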

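If you quote the filename in the find command, you do not have to worry about special characters in the filename.

Your script runs find once for every file found. If there are 100 files in your first directory, you are running find 100 times.

Do you know about associative (hash) arrays in BASH? You are probably better off using one. Run find on the first directory and store those filenames in an associative array.

Then run find on your second directory (again using the find | while read syntax). For each file found in the second directory, check whether there is a matching entry in the associative array. If there is, you know the file is in both directories.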
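
The two passes described above might look like the sketch below, assuming bash 4 or later for associative arrays. The function name missing_in_second and its message text are hypothetical. One caveat: by default bash runs each part of a pipeline in a subshell, so an array filled inside find ... | while read is gone once the loop ends. The sketch therefore feeds the loops from process substitution instead:

```bash
#!/bin/bash

# Hypothetical helper: print the base names found under "$1" that do not
# occur anywhere under "$2". Matching is by base name only.
missing_in_second() {
    local dir1=$1 dir2=$2 file name
    local -A seen=()

    # First pass: remember every base name under dir2.
    # "done < <(...)" keeps the loop in the current shell, so "seen" survives.
    while IFS= read -r -d '' file; do
        name=${file##*/}
        seen["$name"]=1
    done < <(find "$dir2" -type f -print0)

    # Second pass: report base names under dir1 that were never seen.
    while IFS= read -r -d '' file; do
        name=${file##*/}
        if [[ -z ${seen["$name"]:-} ]]; then
            printf 'File not found: %s\n' "$name"
        fi
    done < <(find "$dir1" -type f -print0)
}
```

Called as missing_in_second dir1 dir2, it runs find exactly twice, however many files there are.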

Addendum

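I have been looking at the find command. There seems to be no real way to stop it from using pattern matching except through a lot of work (like what you were doing with printf). I tried -regex matching with \Q and \E to remove the special meaning of the pattern characters. I had no success.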
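
For what it is worth, one way to do that work by hand is to backslash-escape only the characters that -name treats as pattern syntax, leaving everything else (including multibyte UTF-8 characters) untouched. The sketch below assumes a find whose -name follows the usual fnmatch rules, where a backslash quotes the next character. The helper escape_glob is hypothetical, not something from find or bash:

```bash
#!/bin/bash

# Hypothetical helper: usage: escape_glob STRING VARNAME
# Stores in VARNAME a copy of STRING in which \ * ? [ ] are each preceded
# by a backslash, so that find -name matches them literally.
escape_glob() {
    local _s=$1
    _s=${_s//\\/\\\\}   # escape backslashes first, so later escapes are not doubled
    _s=${_s//\*/\\*}
    _s=${_s//\?/\\?}
    _s=${_s//\[/\\[}
    _s=${_s//\]/\\]}
    # printf -v assigns directly, so trailing newlines are not stripped
    # the way they would be by command substitution.
    printf -v "$2" '%s' "$_s"
}

# Illustrative use: build a literal pattern for an awkward name.
escape_glob 'odd[1] *.txt' pattern
printf '%s\n' "$pattern"
```

The result would then be passed as find "$2" -type f -name "$pattern", always inside double quotes.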

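There comes a time when you need something more powerful and flexible than the shell to implement your script, and I believe this is that time.

Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems, and they are also available on other, non-POSIX platforms (cough! ...Windows! ...cough!).

Below is a Perl script that takes two directories and searches them for matching files. It runs find only once and uses an associative array (called a hash in Perl). I key the hash by filename. In the value part of the hash, I store an array of the directories where that file was found.

I only need to run find once per directory. Once that is done, I can print all the entries in the hash that contain more than one directory.

I know it isn't shell, but this is one of those cases where you can spend more time figuring out how to get the shell to do what you want than it is worth.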

#! /usr/bin/env perl

use strict;
use warnings;
use feature qw(say);

use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );


my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
    sub {
        return unless -f;
        $files{$_} = [] if not exists $files{$_};
        push @{ $files{$_} }, $File::Find::dir;
    },  DIRECTORIES
);

#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#

for my $file ( sort keys %files ) {
    if ( @{ $files{$file} } > 1 ) { 
        say  "File: $file: " . join ", ", @{ $files{$file} };
    }
}

Answer 3

How about this one-liner?

find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"

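Absolutely 100% safe for files with funny symbols, newlines, and spaces in their names.

How does it work?

find (the main one) scans directory dir1 and, for each file (-type f), executes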

read < <(find dir2 -name "${1##*/}" -type f)

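with the name of the current file, as given by the main find, as its argument. That argument is at position $1. ${1##*/} removes everything up to and including the last /, so that if $1 is path/to/found/file, the find statement is: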

find dir2 -name "file" -type f

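If file is found, this command prints something; otherwise it prints nothing. That output is what the bash read command reads. The exit status of read is true if it was able to read something and false if there was nothing to read (that is, if nothing was found). This exit status becomes the exit status of bash, which in turn becomes the status of -exec. If it is true, the next -printf statement is executed; if it is false, the -o -printf part is executed.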
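
The exit-status trick can be tried in isolation. This is a small sketch, assuming bash; the function name exists_under is invented for the example. Note that -name still interprets its argument as a pattern, so a name containing *, ?, or [ is matched as a glob rather than literally:

```bash
#!/bin/bash

# Hypothetical helper: succeed if find prints at least one line for a
# regular file named "$2" somewhere under "$1".
exists_under() {
    local dir=$1 name=$2
    # read returns 0 when it gets a complete line and non-zero when it
    # hits end-of-file immediately, i.e. when find printed nothing.
    read -r < <(find "$dir" -type f -name "$name")
}

# Demo (illustrative only).
demo=$(mktemp -d)
touch "$demo/with space.txt"
if exists_under "$demo" "with space.txt"; then
    echo "found"
else
    echo "not found"
fi
rm -rf "$demo"
```

The function's return status is simply that of read, which is the same chain the one-liner relies on.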

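If your directories are given in the variables $dir1 and $dir2, do this instead, so that it is safe with respect to spaces and funny symbols that may occur in $dir2: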

find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"

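Regarding efficiency: this is definitely not an efficient method! The inner find is executed as many times as there are files found in dir1. That is bad, especially if the directory tree under dir2 is deep and has many branches (you can rely a little on caching, but there is a limit!).

Regarding usability: you have fine-grained control over how find works and over its output, and it is very easy to add more tests.

So, hey, tell me how to compare the files of two directories? Well, if you agree to lose a bit of control, this is the shortest and most efficient answer: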

diff dir1 dir2

Try it, you will be surprised!
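
As written, diff dir1 dir2 only looks at the top level of each directory and also compares the contents of same-named files. A hedged variant, assuming a diff that supports -r and -q (GNU and BSD diff both do), recurses and only reports which files differ or are missing. Unlike the scripts above, it pairs files by relative path rather than by base name anywhere in the tree. The directory and file names below are made up for the demo:

```bash
#!/bin/bash

# Throwaway directories for the demo (illustrative only).
d1=$(mktemp -d)
d2=$(mktemp -d)
mkdir "$d1/sub" "$d2/sub"
touch "$d1/both.txt" "$d2/both.txt" "$d1/sub/only in one.txt"

# -r recurses into subdirectories; -q reports only that files differ.
# diff exits with status 1 when it finds differences, which is expected
# here, so only a status other than 1 is treated as a failure.
report=$(LC_ALL=C diff -rq "$d1" "$d2") || [ "$?" -eq 1 ]
printf '%s\n' "$report"

rm -rf "$d1" "$d2"
```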

Answer 4

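If you want to use associative arrays, here is one possibility that should behave well with files that have all kinds of funny symbols in their names (this script does more than is needed just to make the point, but it can be used as is: simply remove the parts you don't want and adapt it to your needs):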

#!/bin/bash

die() {
    printf "%s\n" "$@"
    exit 1
}

[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"

dir1=$1
dir2=$2

[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"

declare -A dir1files
declare -A dir2files

while IFS=$'\0' read -r -d '' file; do
   dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)

while IFS=$'\0' read -r -d '' file; do
   dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)

# Which files in dir1 are in dir2?
for i in "${!dir1files[@]}"; do
   if [[ -n ${dir2files[$i]} ]]; then
      printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
      # Remove it from the dir2files hash
      unset dir2files["$i"]
   else
      printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
   fi
done

# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1

for i in "${!dir2files[@]}"; do
   printf "File %s is in %s but not in %s\n" "$i" "$dir2" "$dir1"
done

Remark. Files are identified by their names only, not by their contents.

Answer 5

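Since you are only using find for its recursive directory traversal, simply use bash's globstar option. (You are using associative arrays, so your bash is recent enough.)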

#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
    for f in "$2"/**/*; do
        [[ -f "$f" ]] || continue
        BN2=$(basename "$f")
        files["$BN2"]=$BN2
    done

    echo "${files[@]}"

    for f in "$1"/**/*; do
        [[ -f "$f" ]] || continue
        BN1=$(basename "$f")
        if [[ ${files[$BN1]} != "$BN1" ]]; then
            echo "File not found: $BN1"
        fi
    done
fi

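** matches zero or more directories, so $1/**/* matches all the files and directories in $1, all the files and directories in those directories, and so on, all the way down the tree.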
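
A small sketch of that expansion, assuming bash 4 or later. The helper name list_files_recursive and the demo tree are invented for illustration. Keep in mind that, unless dotglob is also set, ** skips hidden files and directories, which find would include:

```bash
#!/bin/bash
shopt -s globstar

# Hypothetical helper: print every regular file under "$1", relative to it.
list_files_recursive() {
    local dir=$1 f
    for f in "$dir"/**/*; do
        # The glob also yields directories (and the literal pattern itself
        # when nothing matches), so keep regular files only.
        if [[ -f $f ]]; then
            printf '%s\n' "${f#"$dir"/}"
        fi
    done
}

# Demo tree (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/sub/deep"
touch "$demo/top.txt" "$demo/sub/mid file.txt" "$demo/sub/deep/bottom.txt"
list_files_recursive "$demo"
rm -rf "$demo"
```

Because the shell expands the pattern into separate words, names with spaces or newlines need no special handling as long as "$f" stays quoted.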

find command with filenames from the bash printf builtin not working
