Cannot call curl from python3

Time: 2016-09-15 20:09:18

Tags: python-3.x curl

I am trying to call this curl command from python3. From bash it works fine:

curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802

and produces the expected result:

 @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}

In python3, I am doing:

import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")

which gives the error:

<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

What is going wrong here?

1 answer:

Answer 0 (score: 1)

There are 3 problems here:

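  1. Don't quote your arguments for subprocess. It already does that for you when necessary, because you pass a list of arguments rather than an unsplit command line (that is good practice, keep it, but remove the unnecessary quotes).
  2. Then, subprocess.call doesn't let you parse or store the output in Python, which is a problem because of number 3:
  3. And last: your site randomly answers with junk HTML (a Java stack trace). That explains why you can get different output in Python, but you can get it from bash as well.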
    Problem #1

    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
    

    should be

    subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])
    

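    Otherwise the quoting is applied twice: the literal double-quote characters become part of your Accept: xxx argument, which curl does not expect, so the header line it sends is malformed.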
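To see what bash actually hands to curl, Python's shlex.split can stand in for the shell's word splitting (it only approximates bash, but handles this simple case the same way): the double quotes merely group the words and are then stripped, so the list element passed to subprocess must not contain them. A small illustration using the command from the question:

```python
import shlex

# The command line exactly as typed in bash
cmdline = ('curl -LH "Accept: text/bibliography; style=bibtex" '
           'http://dx.doi.org/10.1103/PhysRevLett.117.126802')

# shlex.split mimics shell word splitting: quotes group words and are then removed
args = shlex.split(cmdline)
print(args)
# ['curl', '-LH', 'Accept: text/bibliography; style=bibtex', 'http://dx.doi.org/10.1103/PhysRevLett.117.126802']

# The argument from the question still carries the quote characters,
# so it is not what bash would have passed to curl
broken = '"Accept: text/bibliography; style=bibtex"'
print(args[2] == broken)  # False
```

The list printed above is exactly what belongs in the subprocess call.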

    Demonstration of the non-working quoting:

    import subprocess
    doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
    
    #### this is wrong because of the quoting ####
    p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = p.communicate()  # stderr is merged into stdout, so the second value is None
    print(output)
    
    

    Result:

    b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
    

    Problems #2 and #3

    I have implemented a retry mechanism that parses the output and retries until it gets the correct output:

    import subprocess, sys
    doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
    
    while True:
        p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi], stdout=subprocess.PIPE)
        output, _ = p.communicate()
        # the response is assumed to be UTF-8; decoding it as latin-1 garbles non-ASCII characters
        output = output.decode("utf-8", errors="replace")
        if "java.util.concurrent.FutureTask.run" in output:
            # site crashed when responding: junk HTML output: retry
            sys.stderr.write("Wrong answer: retrying\n")
        else:
            print(output)
            break
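The loop above retries forever if the site never recovers. Below is a bounded variant, sketched with subprocess.run, which (unlike subprocess.call) returns the captured output in a CompletedProcess; capture_output needs Python 3.7+. The attempt limit, the delay, the helper name fetch_bibtex_with_retries and its injectable run parameter are assumptions for illustration, not part of the original answer:

```python
import subprocess
import sys
import time

# Marker the answer uses to recognise the site's junk HTML (a Java stack trace)
JUNK_MARKER = "java.util.concurrent.FutureTask.run"


def fetch_bibtex_with_retries(doi_url, max_attempts=5, delay=1.0, run=subprocess.run):
    """Hypothetical helper: call curl up to max_attempts times.

    Returns the BibTeX text, or None if every attempt returned junk.
    `run` defaults to subprocess.run; it is a parameter only so the
    retry logic can be tried without network access.
    """
    # One list element per argument and no extra quotes (problem #1);
    # -s hides curl's progress meter
    cmd = ["curl", "-sL", "-H", "Accept: text/bibliography; style=bibtex", doi_url]
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True)
        # Assumes the server answers in UTF-8; undecodable bytes become U+FFFD
        text = result.stdout.decode("utf-8", errors="replace")
        if result.returncode == 0 and JUNK_MARKER not in text:
            return text
        sys.stderr.write("Wrong answer (attempt %d of %d)\n" % (attempt, max_attempts))
        if attempt < max_attempts:
            time.sleep(delay)  # short pause before asking the site again
    return None
```

Usage: `print(fetch_bibtex_with_retries("http://dx.doi.org/10.1103/PhysRevLett.117.126802"))`. Returning None instead of looping forever lets the caller fall back, for example to the warning shown in the question.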
    

    Result:

    Wrong answer: retrying   <==== here the site threw a big HTML exception output
     @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}
    

    So it works; it is just a problem on the site's side, but with my Python wrapper you can resubmit the request until it produces the correct answer.
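If calling the curl binary is not a hard requirement, the same content negotiation can be done with the standard library, which removes the quoting question entirely because the header value is passed as a plain string. A minimal sketch with urllib.request; this is a swapped-in approach, not what the answer above does, and the function names are hypothetical:

```python
import urllib.request

ACCEPT_BIBTEX = "text/bibliography; style=bibtex"


def bibtex_request(doi_url):
    # The header value is passed verbatim: no shell involved, so no extra quotes
    return urllib.request.Request(doi_url, headers={"Accept": ACCEPT_BIBTEX})


def fetch_bibtex_urllib(doi_url, timeout=10):
    # urlopen follows redirects (like curl -L) and keeps the Accept header;
    # unlike curl without -f, it raises urllib.error.HTTPError on 4xx/5xx answers
    with urllib.request.urlopen(bibtex_request(doi_url), timeout=timeout) as resp:
        # Assumes the server answers in UTF-8
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `print(fetch_bibtex_urllib("http://dx.doi.org/10.1103/PhysRevLett.117.126802"))`. The junk-output check and retry loop from the answer would still be needed on top of this if the site keeps misbehaving.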