curl does not download files correctly

Time: 2013-11-04 06:06:44

Tags: bash curl

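I have been struggling with this task for a long time and still cannot see what is going wrong. The program does not seem to download any PDFs. I checked the file that stores the final links, and everything in it is stored correctly. I also checked $PDFURL, and it holds the correct values. Any bash fans ready to help?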

    #!/bin/sh

    #create a temporary directory where all the work will be conducted
    TMPDIR=`mktemp -d /tmp/chiheisen.XXXXXXXXXX`
    echo $TMPDIR

    #no arguments given - error
    if [ "$#" == "0" ]; then
          exit 1
    fi

    # argument given, but wrong format
    URL="$1"

    #URL regex 
    URL_REG='(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'

    if [[ ! $URL =~ $URL_REG ]]; then
          exit 1
    fi

    # go to directory created
    cd $TMPDIR

    #download the html page
    curl -s "$1" > htmlfile.html

    #grep only links into temp.txt
    cat htmlfile.html | grep -o -E 'href="([^"#]+)\.pdf"' | cut -d'"' -f2 > temp.txt

    # iterate through lines in the file and try to download
    # the pdf files that are there
    cat  temp.txt | while read PDFURL; do

        #if this is an absolute URL, download the file directly
        if [[ $PDFURL == *http* ]]
        then
            curl  -s -f -O $PDFURL
            err="$?"
            if [ "$err" -ne 0 ]
            then
                echo ERROR "$(basename $PDFURL)">&2
            else
                echo "$(basename $PDFURL)"
            fi
        else
            #update url - it is always relative to the first parameter in script
            PDFURLU="$1""/""$(basename $PDFURL)"
            curl -s -f -O $PDFURLU
            err="$?"
            if [ "$err" -ne 0 ]
            then
                echo ERROR "$(basename $PDFURLU)">&2
            else
                echo "$(basename $PDFURLU)"
            fi
        fi

    done

    #delete the files
    rm htmlfile.html
    rm temp.txt

P.S. Another small problem I just noticed. Maybe the problem is in the if with the regex? I would much rather have something like this there:

    if [[ $PDFURL =~ (https?|ftp|file):// ]]

But this does not work. I have no unnecessary parentheses in there, so why?
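
A possible explanation, offered as an assumption rather than a confirmed diagnosis: `[[ ... =~ ... ]]` is a bash feature, while the script starts with `#!/bin/sh`, and on systems where `sh` is a plain POSIX shell such as dash, `[[` is not available. The sketch below runs under bash, keeps the pattern in a variable, and anchors it with `^` so that a file name that merely contains "http" is not treated as absolute. The names `SCHEME_REG` and `is_absolute_url` are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# Requires bash: the [[ ... =~ ... ]] test is not part of POSIX sh.

# Keep the regex in a variable and leave it unquoted on the right-hand side,
# so bash treats it as a regular expression rather than a literal string.
SCHEME_REG='^(https?|ftp|file)://'

# Hypothetical helper: succeeds when its argument starts with a known scheme.
is_absolute_url() {
    [[ $1 =~ $SCHEME_REG ]]
}

for candidate in "http://example.com/a.pdf" "docs/a.pdf" "see-http-notes.pdf"; do
    if is_absolute_url "$candidate"; then
        echo "absolute: $candidate"
    else
        echo "relative: $candidate"
    fi
done
```

With the `*http*` glob from the script, the third candidate would be classified as absolute; the anchored regex classifies it as relative.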

P.P.S. I also ran this script on URLs that start with http, and the program produced the desired output. However, it still fails the tests.
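
One thing that may matter here, stated as a guess since the tests themselves are not shown: for relative links the script builds the URL as `$1` + `/` + `basename`, which drops any subdirectory in the link and appends to the page name when `$1` ends in something like `index.html`. The sketch below shows one way to resolve a link against the page URL using only POSIX parameter expansion. `resolve_url` is a hypothetical helper, it does not normalize `../` segments, and it assumes the base URL has no `/` inside a query string.

```shell
#!/bin/sh
# resolve_url BASE LINK
# Hypothetical helper (not from the original script): prints LINK resolved
# against BASE. "../" segments are passed through unchanged.
resolve_url() {
    base=$1
    link=$2
    case $link in
        http://*|https://*|ftp://*|file://*)
            # Already absolute: use as-is.
            printf '%s\n' "$link"
            ;;
        /*)
            # Root-relative: keep only scheme://host from the base.
            scheme=${base%%://*}
            rest=${base#*://}
            host=${rest%%/*}
            printf '%s://%s%s\n' "$scheme" "$host" "$link"
            ;;
        *)
            # Relative to the directory that contains the base page.
            rest=${base#*://}
            case $rest in
                */*) dir=${base%/*} ;;  # strip the last path segment
                *)   dir=$base ;;       # bare host with no path
            esac
            printf '%s/%s\n' "$dir" "$link"
            ;;
    esac
}

resolve_url "http://example.com/docs/index.html" "files/a.pdf"
resolve_url "http://example.com/docs/index.html" "/pub/b.pdf"
resolve_url "http://example.com" "c.pdf"
```

Inside the download loop this could be used as `curl -s -f -O "$(resolve_url "$1" "$PDFURL")"`, with the expansion quoted so that URLs containing spaces or glob characters are passed to curl unchanged.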

0 answers:

No answers