Question

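I'm new to Bash scripting and wrote a script to check my photo files, but it is slow and returns some empty results when checking more than 17,000 photos. Is there a way to run this script on all 4 CPUs to speed it up?

Please help.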

#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[@]}"
echo $totalfiles
i=0
ii=0
check1=""
while : 
do

check=${array[$i]}
if [[ ! -r $( echo $check ) ]] ; then
    if [ $check = $check1 ]; then
     echo "empty "$check
    else
    unset array[$i]
    ii=$((ii + 1 ))
    fi
fi
if [ $totalfiles = $i ]; then
break
fi
i=$(( i + 1 ))
done 

if [ $ii -gt "1" ]; then
 notify-send -u critical $ii" files have been deleted or are unreadable"
 fi

Answer 1

This is a filesystem operation, so multiple cores will hardly help. Simplifying the script might, though:

while IFS= read -r file; do
   i=$((i+1)); [ -e "$file" ] || ii=$((ii+1))
done < "$HOME/Scripts/ourphotos.txt"
#...

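Two points:

You don't need to keep the whole file in memory (no array is needed).
$( echo $check ) forks a process. You generally want to avoid forking and exec'ing inside a loop.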
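
As an illustration of both points, the loop can be wrapped in a small function that streams the list and uses only shell builtins. This is a minimal sketch, not the answerer's code: the name count_missing is hypothetical, and skipping blank lines is an assumption about how the list file should be treated.

```shell
#!/bin/bash
# count_missing LIST: print how many paths listed in LIST (one per line)
# do not exist. The function name is hypothetical.
count_missing() {
    local list=$1 file missing=0
    # IFS= and -r keep leading spaces and backslashes in file names intact.
    while IFS= read -r file; do
        # Assumption: blank lines are not file names, so skip them.
        [ -n "$file" ] || continue
        # [ -e ] is a shell builtin, so no process is forked per file.
        [ -e "$file" ] || missing=$((missing + 1))
    done < "$list"
    echo "$missing"
}
```

It could be called as count_missing "$HOME/Scripts/ourphotos.txt", with the result passed to notify-send as in the original script.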

Answer 2

This is an old question, but it lacks an evidence-based answer.

awk '{print "[ -e "$1" ] && echo "$2}' | parallel    # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash        # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find                                           # 200000 files/s
parallel --xargs find                                # 250000 files/s
xargs -P2 find                                       # 400000 files/s
xargs -P96 find                                      # 800000 files/s

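I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was always the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much; I had assumed file I/O would be the limiting factor and that parallel execution would not matter much.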
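
The answer does not say how the rates were measured. One hedged way to get comparable numbers on your own system is a small timing helper like the sketch below. The name files_per_second is hypothetical, and the sketch assumes GNU date (for %N) and awk. Results will also depend on whether the filesystem cache is warm.

```shell
#!/bin/bash
# files_per_second LIST CMD [ARGS...]
# Feed LIST to CMD on stdin, discard its output, and print the
# approximate number of listed files processed per second.
# Hypothetical helper; assumes GNU date (for %N) and awk.
files_per_second() {
    local list=$1
    shift
    local n start end
    n=$(wc -l < "$list")
    start=$(date +%s.%N)
    # The exit status is ignored: find fails when a listed file is missing.
    "$@" < "$list" > /dev/null 2>&1 || true
    end=$(date +%s.%N)
    # Guard against a zero or negative interval before dividing.
    awk -v n="$n" -v s="$start" -v e="$end" \
        'BEGIN { d = e - s; if (d <= 0) d = 1e-9; printf "%.0f\n", n / d }'
}
```

For example, files_per_second ~/Scripts/ourphotos.txt xargs -P2 find would correspond to the xargs -P2 find row above.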

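It is also worth noting that xargs find is about 20 times faster than the accepted solution, and much more concise. For example, here is a rewrite of the OP's script: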

#!/bin/bash

total=$(wc -l < ~/Scripts/ourphotos.txt)

# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(tr '\n' '\0' < ~/Scripts/ourphotos.txt | xargs -0 -P4 find | wc -l)

if [ "$total" -ne "$found" ]; then
  ii=$((total - found))
  notify-send -u critical "$ii files have been deleted or are unreadable"
fi
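
One caveat with this approach: if the list contains a directory, find descends into it and prints everything beneath it, which inflates the count. The sketch below is one possible way to guard against that, not part of the original answer. The name count_found is hypothetical, and the sh -c wrapper is only there so that -maxdepth 0 can be placed after the paths that xargs appends.

```shell
#!/bin/bash
# count_found LIST: print how many paths listed in LIST (one per line) exist.
# Hypothetical helper; assumes a find that supports -maxdepth.
count_found() {
    local list=$1
    # tr + xargs -0 keeps spaces in file names intact.
    # -maxdepth 0 makes find print each listed path itself and never descend
    # into directories. Errors for missing paths are discarded.
    tr '\n' '\0' < "$list" |
        xargs -0 -P4 sh -c 'find "$@" -maxdepth 0' sh 2>/dev/null |
        wc -l
}
```

The result can be compared with the line count of the list in the same way as found in the script above.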

How to speed up checking whether files exist in bash
