Question

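I have been struggling with this task for a long time and still can't work out what is going wrong. The program doesn't seem to download any PDFs. Meanwhile, I checked the file that stores the final links, and everything in it is stored correctly. I also checked `$PDFURL`, and it holds the correct values. Any bash fans ready to help?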

    #!/bin/sh

    #create a temporary directory where all the work will be conducted
    TMPDIR=`mktemp -d /tmp/chiheisen.XXXXXXXXXX`
     echo $TMPDIR

    #no arguments given - error
    if [ "$#" == "0" ]; then
          exit 1
    fi

    # argument given, but wrong format
    URL="$1"

    #URL regex 
    URL_REG='(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'

    if [[ ! $URL =~ $URL_REG ]]; then
          exit 1
    fi

    # go to directory created
    cd $TMPDIR

    #download the html page
    curl -s "$1" > htmlfile.html

    #grep only links into temp.txt
    cat htmlfile.html | grep -o -E 'href="([^"#]+)\.pdf"' | cut -d'"' -f2 > temp.txt

    # iterate through lines in the file and try to download
    # the pdf files that are there
    cat  temp.txt | while read PDFURL; do

    #if this is an absolute URL, download the file directly
    if [[ $PDFURL == *http* ]]
    then

        curl  -s -f -O $PDFURL
        err="$?"
        if [ "$err" -ne 0 ]
        then
              echo ERROR "$(basename $PDFURL)">&2
        else
              echo "$(basename $PDFURL)"
        fi

    else

         #update url - it is always relative to the first parameter in script
         PDFURLU="$1""/""$(basename $PDFURL)"
         curl -s -f -O $PDFURLU
         err="$?"
         if [ "$err" -ne 0 ]
         then
             echo ERROR "$(basename $PDFURLU)">&2
         else
             echo "$(basename $PDFURLU)"
         fi
       fi

      done


    #delete the files
    rm htmlfile.html
    rm temp.txt
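
For comparison with the relative-URL branch of the script above: `basename` keeps only the last path component, so a link such as `files/a.pdf` loses its directory before it is appended to `$1`. The following is a minimal sketch of one way to keep the whole relative path, assuming (as the script's comment says) that every non-absolute link is relative to the URL passed as the first argument. `resolve_link` is a hypothetical helper name, and links that start with `/` are not handled.

```shell
#!/bin/sh
# resolve_link is a hypothetical helper, not part of the original script.
# Assumption: a non-absolute link is relative to the base URL given as $1.
resolve_link() {
    base=${1%/}      # drop one trailing slash from the base, if present
    link=${2#./}     # drop a leading "./" from the link, if present
    case "$link" in
        *://*) printf '%s\n' "$link" ;;           # already absolute, keep as is
        *)     printf '%s/%s\n' "$base" "$link" ;; # join base and relative path
    esac
}

resolve_link "http://example.com/docs/" "files/a.pdf"
resolve_link "http://example.com/docs" "http://other.example/b.pdf"
```

The two calls print `http://example.com/docs/files/a.pdf` and `http://other.example/b.pdf`. The result could then be passed to `curl` in double quotes, for example `curl -s -f -O "$(resolve_link "$1" "$PDFURL")"`.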

P.S. Another small problem I just noticed. Maybe the problem is in the `if` with the regex? I would much prefer to have something like this there:

    if [[ $PDFURL =~ (https?|ftp|file):// ]]

But this doesn't work. I don't have any unwanted parentheses there, so why?
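
One thing that may be relevant here: the script starts with `#!/bin/sh`, and `[[ ... =~ ... ]]` is a bash extension rather than part of POSIX sh, so whether it works depends on which shell `/bin/sh` actually is on the test machine. Below is a minimal sketch of a portable check that uses a `case` pattern instead, assuming only the scheme prefix needs to be tested. `is_absolute_url` is a hypothetical helper name, not part of the script.

```shell
#!/bin/sh
# is_absolute_url is a hypothetical helper, not part of the original script.
# It succeeds when its argument starts with one of the schemes from the question.
is_absolute_url() {
    case "$1" in
        http://*|https://*|ftp://*|file://*) return 0 ;;
        *) return 1 ;;
    esac
}

for candidate in "https://example.com/a.pdf" "docs/a.pdf"; do
    if is_absolute_url "$candidate"; then
        echo "absolute: $candidate"
    else
        echo "relative: $candidate"
    fi
done
```

This prints `absolute: https://example.com/a.pdf` and then `relative: docs/a.pdf`. Unlike the `*http*` glob in the script, the pattern only matches at the start of the string, so a relative link that merely contains `http` somewhere in its path is not treated as absolute.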

P.P.S. I also ran this script on URLs beginning with http, and the program gives the desired output. However, it still doesn't pass the tests.

Curl not downloading files correctly

0 answers: