Question

I'm very new to bash scripting. I want to remove all metadata from the PDF files in a directory and its subfolders, so I took this script and tried to put it in a loop.

    for file in $(find . -iname '*.pdf')
    do
       pdftk $file dump_data | \
       sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | \
       pdftk $1 update_info - output $file.tmp

       exiftool -all:all= $file.tmp
       exiftool -all:all $file.tmp
       exiftool -extractEmbedded -all:all $file.tmp
       qpdf --linearize $file.tmp $file

       pdftk $file dump_data
       exiftool $file
       pdfinfo -meta $file
    done
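The `sed` command in the loop rewrites the text that `pdftk ... dump_data` prints. Every line starting with `InfoValue:` keeps its label but loses its value, and `update_info` then writes those empty values back into the PDF. Below is a minimal sketch of just that step, run on a hand-written sample of `dump_data` output. The helper name `blank_info_values` and the sample values are illustrative, not from the question, and the sketch uses the POSIX class `[[:space:]]` in place of `\s`, which is a GNU sed extension.

```shell
#!/usr/bin/env bash
# blank_info_values is a hypothetical helper wrapping the question's sed step.
# It reads pdftk "dump_data" text on stdin and empties every InfoValue,
# leaving InfoBegin/InfoKey lines untouched. [[:space:]] stands in for GNU's \s.
blank_info_values() {
    sed -e 's/\(InfoValue:\)[[:space:]].*/\1 /'
}

# Illustrative input in the shape pdftk dump_data uses for the Info dictionary.
printf '%s\n' \
    'InfoBegin' \
    'InfoKey: Author' \
    'InfoValue: Jane Doe' \
    'InfoBegin' \
    'InfoKey: Producer' \
    'InfoValue: Some PDF Tool 1.0' |
    blank_info_values
```

Only the two `InfoValue:` lines change; the keys survive, so `update_info` still knows which fields to overwrite.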

I get the following error, but I don't know why:

    Error: No input files.  Exiting.
    Errors encountered.  No output created.
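A likely cause of this error is that the third command uses `$1`, the script's first command-line argument, instead of the loop variable `$file`. When the script is run without arguments, `$1` is empty and, because it is unquoted, disappears entirely, so the second `pdftk` call receives no input file. The sketch below only prints the argument lists the two variants would produce; `show_args` and the file name are hypothetical, and no PDF tool is actually invoked.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument it receives in
# brackets, making it visible which words a command line expands to.
show_args() {
    printf '[%s]' "$@"
    printf '\n'
}

set --                 # simulate running the script with no arguments, so $1 is empty
file='./report.pdf'    # hypothetical file name standing in for the loop variable

# As in the question: the unquoted, empty $1 vanishes, so no input file is passed.
show_args pdftk $1 update_info - output "$file.tmp"

# Using the loop variable (quoted) instead, the input file is present.
show_args pdftk "$file" update_info - output "$file.tmp"
```

The first line printed is `[pdftk][update_info][-][output][./report.pdf.tmp]`, with no input PDF before `update_info`; the second includes `[./report.pdf]`.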

Besides that, is this a good way to remove unneeded information, or is there a better one?

Greetings

Answer 1

This version works as intended, although it isn't pretty:

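    # Replace spaces with underscores in directory and file names,
    # because the for loop below splits find's output on whitespace
    find . -name "* *" -type d | rename 's/ /_/g'
    find . -name "* *" -type f | rename 's/ /_/g'

    for file in $(find . -iname '*.pdf')
    do
        # Blank every InfoValue in the PDF's document information
        pdftk "$file" dump_data | \
        sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | \
        pdftk "$file" update_info - output "$file-clean"

        # Remove all writable metadata (ExifTool keeps a backup named *_original),
        # then list whatever tags remain, including embedded ones
        exiftool -all:all= "$file-clean"
        exiftool -all:all "$file-clean"
        exiftool -extractEmbedded -all:all "$file-clean"

        # Rewrite the file: ExifTool appends its PDF edits as an update,
        # so rewriting with qpdf should drop the superseded metadata
        qpdf --linearize "$file-clean" "$file-clean2"

        # Check the result
        pdftk "$file-clean2" dump_data
        exiftool "$file-clean2"
        pdfinfo -meta "$file-clean2"

        # Delete the original and intermediate files, then put the clean copy in place
        rm -f "$file" "$file-clean" "$file-clean_original"
        mv "$file-clean2" "$file"
    done

    echo finished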
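The renaming at the top is only needed because `for file in $(find ...)` splits names on whitespace. Below is a sketch of an alternative that avoids both the renaming and deleting the original before the clean copy is known to exist: `find -print0` with `read -d ''` passes each path intact, and the original PDF is replaced only if every step succeeded. `for_each_pdf` and `clean_pdf` are hypothetical names. The pdftk, exiftool and qpdf calls mirror the answer, with ExifTool's `-overwrite_original` option added so no `*_original` backup is left behind; the verification commands are omitted.

```shell
#!/usr/bin/env bash
# clean_pdf (hypothetical name): strip metadata from one PDF, replacing the
# file only if every step succeeds. Intermediates go to a scratch directory.
clean_pdf() {
    local file=$1 work status
    work=$(mktemp -d) || return 1

    # Same steps as the answer; && stops at the first failing command.
    pdftk "$file" dump_data |
        sed -e 's/\(InfoValue:\)[[:space:]].*/\1 /' |
        pdftk "$file" update_info - output "$work/info.pdf" &&
    exiftool -q -all:all= -overwrite_original "$work/info.pdf" &&
    qpdf --linearize "$work/info.pdf" "$work/clean.pdf" &&
    mv -f -- "$work/clean.pdf" "$file"
    status=$?

    rm -rf -- "$work"   # remove intermediates whether or not the steps succeeded
    return "$status"
}

# for_each_pdf (hypothetical name): run a command on every *.pdf (any case)
# under a directory; -print0 / read -d '' keep names with spaces intact.
for_each_pdf() {
    local dir=$1 cmd=$2 file
    find "$dir" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' file; do
        "$cmd" "$file" || printf 'failed: %s\n' "$file" >&2
    done
}
```

Usage, from the top of the directory tree: `for_each_pdf . clean_pdf`. Because the steps are chained with `&&`, a PDF that pdftk cannot read, or one for which qpdf exits non-zero (qpdf uses status 3 when it only issued warnings), is left untouched and reported on stderr.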

Bash script to remove PDF metadata
