Compare file names, then move files after recreating the source directory structure

Posted: 2018-12-05 21:47:21

Tags: bash awk find printf posix

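I'm learning shell scripting and trying to stay POSIX-compliant while keeping the code base readable. The goal is to read the list of files in directory A, find their matches in directory B, recreate in directory C the part of the matching file's parent path under B, move the file from A into that directory, then delete the matched files from B and remove any directory in B that is left empty. Every file in A is unique, and each one always has one or more matches in B. C never contains a match, but subdirectories in C that mirror those in B may already exist. After a file is moved from A to C, all of its matches in B should be deleted. The extensions differ because the files are processed separately, but the base names match exactly. File names may contain spaces and periods, and they are not always the same length. The output and archive directories have two levels of subdirectories.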

Here is what I have so far. I'm stuck on writing the for loop that does the actual work. I'm trying not to go beyond find, printf, awk, grep, for and if.

#!/bin/sh
execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes=("jpg", "svg", "bmp", "tiff", "psd")

$folderNew="$baseDir/$folderNew"
$folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"

if [ "$(uname -n)" = "$execHome" ]
then

  count=$(find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)

  printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"

  find $folderNew -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'

else
  printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
fi

Example:

Directory A

/home/library/projects/output/projectOne 1.a.png
/home/library/projects/output/projectOne 1.b.png
/home/library/projects/output/projectOne 1.c.png
/home/library/projects/output/projectThree 3.m.png
/home/library/projects/output/projectThree 3.o.png
/home/library/projects/output/projectFour 4.t.png
/home/library/projects/output/projectFour 4.u.png

Directory B

/home/library/projects/working/House/2018 01/projectOne 1.a.jpg
/home/library/projects/working/House/2018 01/projectOne 1.a.svg
/home/library/projects/working/House/2018 01/projectOne 1.b.jpg
/home/library/projects/working/House/2018 01/projectOne 1.b.svg
/home/library/projects/working/House/2018 01/projectOne 1.c.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.m.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.o.svg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.t.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.u.jpg

Directory C

/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png

Desired result:

Directory A: its files have been moved to directory C

/home/library/projects/output/

Directory B: the files matching directory A are deleted, and empty folders are removed

/home/library/projects/working/House/2018 02/projectTwo 2.g.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.g.svg
/home/library/projects/working/House/2018 02/projectTwo 2.h.jpg
/home/library/projects/working/House/2018 02/projectTwo 2.h.svg
/home/library/projects/working/House/2018 02/projectTwo 2.i.jpg
/home/library/projects/working/Car/2018 03/projectThree 3.n.jpg
/home/library/projects/working/Car/2018 04/projectFour 4.s.jpg

Directory C: contains both the old archive files and the new output files

/home/library/projects/archive/House/2018 01/projectOne 1.a.png
/home/library/projects/archive/House/2018 01/projectOne 1.b.png
/home/library/projects/archive/House/2018 01/projectOne 1.c.png
/home/library/projects/archive/House/2018 01/projectOne 1.d.png
/home/library/projects/archive/House/2018 01/projectOne 1.e.png
/home/library/projects/archive/House/2018 01/projectOne 1.f.png
/home/library/projects/archive/Car/2018 03/projectThree 3.m.png
/home/library/projects/archive/Car/2018 03/projectThree 3.o.png
/home/library/projects/archive/Car/2018 03/projectThree 3.p.png
/home/library/projects/archive/Car/2018 03/projectThree 3.q.png
/home/library/projects/archive/Car/2018 03/projectThree 3.r.png
/home/library/projects/archive/Car/2018 04/projectFour 4.t.png
/home/library/projects/archive/Car/2018 04/projectFour 4.u.png

In any case, I ran the code on a machine with bash 4.4.19 to see how it works, but it did not do what I expected. Here is the output:

Found/processing 4 files in the /home/library/projects/output folder

./auto-archive.sh: line 34: hash["$proj"]: bad array subscript
parent of /home/library/projects/output/.temp/projectThree 3.m.png not found
parent of /home/library/projects/output/projectOne 1.a.png not found
parent of /home/library/projects/output/.temp/projectThree 3.0.png not found
parent of /home/library/projects/output/projectFour 4.t.png not found

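Sorry. I also didn't mention earlier that directory B should not be scanned recursively: in the real use case, other temporary files are generated that are still being written and may not be ready to move. Also, for testing purposes, directory A actually contains only the four files listed above, not all of the files originally listed. After recreating the suggested test structure, your code runs perfectly; it just doesn't produce the same result on the real file structure. I'm worried I missed some key element when describing the real file structure and naming conventions, and I'm looking at the differences now. Sorry this is taking a while, but your accuracy has definitely impressed me. It feels like we're close, but it really needs to run on an older version of bash.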
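One plausible cause of the bad array subscript message, assuming the scanned tree contains a dot-file such as `.DS_Store` (the counting pipeline filters it out with grep, but the answer's find loops do not): stripping the extension with `${proj%.*}` leaves an empty string for such a name, and bash refuses an empty associative-array key. A minimal POSIX sketch of that derivation plus a guard; the helper name `project_name` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: derive the "project name" the way the answer does,
# i.e. drop the directory part, then drop the last ".extension".
project_name() {
    name=${1##*/}                 # remove dirname
    printf '%s\n' "${name%.*}"    # remove extension
}

project_name "/home/library/projects/working/House/2018 01/projectOne 1.a.jpg"
# -> projectOne 1.a
project_name "/home/library/projects/working/House/2018 01/.DS_Store"
# -> empty line: ".DS_Store" has no text before its only dot

# Guard before using the name as a map key: skip names that yield an empty key.
for f in "/x/projectOne 1.a.jpg" "/x/.DS_Store"; do
    proj=$(project_name "$f")
    [ -n "$proj" ] || { printf 'skipping %s\n' "$f"; continue; }
    printf 'key: %s\n' "$proj"
done
```

The same `[ -n "$proj" ] || continue` line can sit right after the extension is stripped in either loop.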

1 Answer:

Answer 0 (score: 0)

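The task breaks down into three steps:

  1. Build a map that associates each file name (project name) with the name of its parent directory, which will be recreated in C. This is done in a preparation phase by parsing the path names in B. We use an associative array for this, so bash must be version 4.2 or later.

  2. Iterate over the files in A, use the map from step 1 to build the path name under C where each file is stored, then remove the corresponding files in B.

  3. As a clean-up phase, remove any empty directories in B.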
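The follow-up asks for a non-recursive scan, and its output shows files from an `output/.temp` folder being picked up. A minimal sketch of one POSIX-only way to list just the regular files directly inside a directory, skipping dot-files; it uses the portable `! -name . -prune` idiom instead of the GNU/BSD `-maxdepth 1` extension, and the helper name `list_top_files` is hypothetical (the demo also assumes `mktemp -d`, which is common but not POSIX):

```sh
#!/bin/sh
# Hypothetical helper: print regular files directly inside "$1",
# without descending into subdirectories and ignoring dot-files.
list_top_files() {
    # The start point "$1/." has basename ".", so it is not pruned and find
    # reads it; every entry below it is pruned, so find never goes deeper.
    find "$1/." ! -name . -prune -type f ! -name '.*'
}

demo=$(mktemp -d)
mkdir "$demo/.temp"
touch "$demo/projectOne 1.a.png" "$demo/.DS_Store" "$demo/.temp/projectThree 3.m.png"
list_top_files "$demo"    # prints only .../projectOne 1.a.png
rm -rf -- "$demo"
```

The printed paths contain an extra `/./`, which does not affect the `${f##*/}` expansion used to extract the file name.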

How about this:

#!/bin/bash

execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes=("jpg" "svg" "bmp" "tiff" "psd")
declare -A hash

folderNew="$baseDir/$folderNew"
folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"

if [ "$(uname -n)" != "$execHome" ]; then
    printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
    exit 1
fi

count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"

# determine parent directory name for each project name and create a map for them
while IFS=  read -r -d $'\0' f; do 
    proj="${f##*/}"         # remove dirname
    proj="${proj%.*}"               # remove extension
    [ -n "$proj" ] || continue      # skip names like .DS_Store that leave an empty key
    parent="${f##*"$baseDir"/}"     # remove pathname up to $baseDir
    parent="${parent#*/}"   # strip pathname one-level deeper
    parent="${parent%/*}"   # remove filename
    # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
#   echo "$proj" "=>" "$parent"     # just for debugging
    hash["$proj"]="$parent"
done < <(find "$folderOld" -type f -print0) # directory B

# iterate over files in A; move to archive directory C and remove files in B
while IFS=  read -r -d $'\0' f; do
    proj="${f##*/}"
    proj="${proj%.*}"
    [ -n "$proj" ] || continue      # skip dot-files such as .DS_Store
    parent="${hash[$proj]}"
    if [[ "$parent" = "" ]]; then
        echo "parent of $f not found"   # may not occur but just in case ..
    else
        # move from A to C
        destdir="$folderArchive/$parent"
        mkdir -p -- "$destdir"
        mv -- "$f" "$destdir"

        # remove relevant file(s) in B
        for ext in "${workingTypes[@]}"; do
            oldfile="$folderOld/$parent/$proj.${ext}"
            [ -f "$oldfile" ] && rm -f -- "$oldfile"
        done
    fi
done < <(find "$folderNew" -type f -print0) # directory A

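# clean-up: remove empty dirs in B, deepest first, so parents emptied
# by the removal are caught too (-mindepth/-empty are GNU/BSD find extensions)
find "$folderOld" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;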
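Because B is two levels deep, removing the last matched files under, say, Car could leave Car itself empty once its subfolders are gone. A minimal sketch, assuming GNU or BSD find, of a clean-up that also removes such parents: `-depth` makes find visit a directory's contents before the directory itself, and `-exec rmdir` removes each one immediately, so by the time a parent is tested with `-empty` its emptied children are already gone (a batch handed to xargs is only removed after the scan). `-mindepth 1` keeps the starting directory; `-mindepth` and `-empty` are extensions, not POSIX. The helper name `prune_empty_dirs` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: remove empty directories below "$1", including parents
# that become empty once their empty children are removed. "$1" itself is kept.
prune_empty_dirs() {
    find "$1" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;
}

# Scratch tree: "Car/2018 04" is empty, so Car becomes empty after it is removed.
root=$(mktemp -d)
mkdir -p "$root/Car/2018 04" "$root/House/2018 02"
touch "$root/House/2018 02/projectTwo 2.g.jpg"
prune_empty_dirs "$root"
ls "$root"        # -> House   (Car is gone; House still holds a file)
rm -rf -- "$root"
```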

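Notes:

  • You don't need commas to separate the elements of an array.
  • Don't put $ in front of the variable name on the left-hand side of an assignment.
  • The while IFS= ... done < <(find ...) construct is an idiom for looping over the output of find.
  • Syntax of the form ${parameter#word} is parameter expansion; it is used here to extract substrings from path names.
  • The associative array hash maps each project name (e.g. "projectOne 1.a") to its parent directory name (e.g. "House/2018 01").
  • The -- in some commands protects against file names that begin with -. (This protection may look paranoid...)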
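To make the parameter-expansion note concrete, here is one path from the directory B example traced through the same expansions the answer's loop uses, written for POSIX sh so it also runs under bash. Quoting `"$baseDir"` inside the pattern is a small deviation from the answer, so that any glob characters in the base path would be matched literally:

```sh
#!/bin/sh
# Trace one path from directory B through the answer's expansions.
baseDir="/home/library/projects"
f="/home/library/projects/working/House/2018 01/projectOne 1.a.jpg"

proj=${f##*/}                 # projectOne 1.a.jpg  (longest "*/" prefix removed)
proj=${proj%.*}               # projectOne 1.a      (shortest ".*" suffix removed)
parent=${f##*"$baseDir"/}     # working/House/2018 01/projectOne 1.a.jpg
parent=${parent#*/}           # House/2018 01/projectOne 1.a.jpg (shortest "*/" prefix)
parent=${parent%/*}           # House/2018 01       (shortest "/*" suffix removed)

printf '%s => %s\n' "$proj" "$parent"   # projectOne 1.a => House/2018 01
```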

If your bash is older than 4.2, let me know; we would then need a different approach.

Edit
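Here is an alternative, POSIX-compatible version.
(It will not work if a file name contains a newline or the escape character \x1b, because those serve as the record and field separators of the lookup table.)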

#!/bin/sh

execHome="intendedMachine"
baseDir="/home/library/projects"
folderNew="output"
folderOld="working"
folderArchive="archive"
workingTypes="jpg
svg
bmp
tiff
psd"

folderNew="$baseDir/$folderNew"
folderOld="$baseDir/$folderOld"
folderArchive="$baseDir/$folderArchive"
nl="
"                   # set to newline character
esc=$(printf '\033')             # set to escape character (echo -ne is not portable)
#esc=":"            # if \033 does not work well, try another character

# substitute for reading a hash
# relies on IFS being set to $nl and globbing being off (set -f);
# it is called inside $( ), so its variables do not leak (no non-POSIX "local")
read_lut() {
    ret=""
    for i in $lut; do
        key="${i%${esc}*}"
        val="${i#*${esc}}"
        if [ "$key" = "$1" ]; then
            # loop until the end and use the last value
            ret="$val"
        fi
    done
    printf '%s\n' "$ret"
}

# substitute for writing to a hash
write_lut() {
    lut=$(printf "%s\n%s%c%s" "$lut" "$1" "$esc" "$2")
}

if [ "$(uname -n)" != "$execHome" ]; then
    printf "Executed from %s; Run from %s for proper execution.\n" "$(uname -n)" "$execHome"
    exit 1
fi

count=$(find "$folderNew" -type f |grep -v "DS_Store" |awk -F "/" '{print $NF}'|wc -l)
printf "\nFound/processing %s files in the %s folder\n\n" "$count" "$folderNew"

# determine parent directory name for each project name and create a map for them
ifs_bak="$IFS"
IFS="$nl"
set -f              # no globbing on the unquoted $(find ...) and $lut expansions
for f in $(find "$folderOld" -type f); do
    proj="${f##*/}"         # remove dirname
    proj="${proj%.*}"       # remove extension
    [ -n "$proj" ] || continue   # skip dot-files such as .DS_Store
    parent="${f##*"$baseDir"/}"  # remove pathname up to $baseDir
    parent="${parent#*/}"   # strip pathname one-level deeper
    parent="${parent%/*}"   # remove filename
    # now we're mapping "projectOne 1.a" => "House/2018 01" e.g.
#   echo "$proj" "=>" "$parent"     # just for debugging
    write_lut "$proj" "$parent"
done

# iterate over files in A; move to archive directory C and remove files in B
for f in $(find "$folderNew" -type f); do
    proj="${f##*/}"
    proj="${proj%.*}"
    [ -n "$proj" ] || continue   # skip dot-files such as .DS_Store
    parent=$(read_lut "$proj")
    if [ "$parent" = "" ]; then
        echo "parent of $f not found"   # may not occur but just in case ..
    else
        # move from A to C
        destdir="$folderArchive/$parent"
        mkdir -p -- "$destdir"
        mv -- "$f" "$destdir"

        # remove relevant file(s) in B
        for ext in $workingTypes; do
            oldfile="$folderOld/$parent/$proj.${ext}"
            [ -f "$oldfile" ] && rm -f -- "$oldfile"
        done
    fi
done

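# clean-up: remove empty dirs in B, deepest first
# (-mindepth and -empty are GNU/BSD find extensions; xargs -r -0 is not POSIX)
find "$folderOld" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;

# restore globbing and IFS
set +f
IFS="$ifs_bak"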
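The read_lut/write_lut pair stands in for bash's associative array: the table is one newline-separated string of key<ESC>value records, and a lookup scans every record and keeps the last match. A trimmed, self-contained sketch of the same idea; the names `lut_put` and `lut_get` are hypothetical, and `set -f` keeps the unquoted split from globbing on names that contain `*` or `?`:

```sh
#!/bin/sh
# Sketch of a POSIX "hash": newline-separated records of key<ESC>value.
nl='
'
esc=$(printf '\033')
lut=''

lut_put() {   # append a record; later records win on lookup
    lut="$lut$nl$1$esc$2"
}

lut_get() {   # print the value of the last record whose key equals $1
    ret=''
    old_ifs=$IFS
    IFS=$nl
    set -f                      # split on newlines only, no globbing
    for rec in $lut; do
        [ "${rec%%"$esc"*}" = "$1" ] && ret=${rec#*"$esc"}
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "$ret"
}

lut_put "projectOne 1.a" "House/2018 01"
lut_put "projectThree 3.m" "Car/2018 03"
lut_get "projectThree 3.m"      # -> Car/2018 03
lut_get "projectTwo 2.g"        # -> empty line (not found)
```

Each lookup is a linear scan, so this trades speed for portability; for a few hundred files that is usually acceptable.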