Question

I am trying to call curl from Python 3. The following command works fine from bash:

curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802

It produces the expected result:

 @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}

In Python 3, I am doing:

import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")

This gives the error:

<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

What is going wrong here?

Answer 1

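There are 3 problems here:

1. Don't quote your arguments in the subprocess call. subprocess already does that for you where necessary, because you are passing a list of arguments rather than an unsplit command line (good practice, keep it, but drop the unnecessary quotes).
2. subprocess.call does not let you parse or store the output in Python, which is a problem because of point 3.
3. Lastly, the site randomly answers with junk HTML (a Java stack trace). That explains why you get different output in Python, but you can get it from bash as well.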
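The second point can be illustrated without touching the network. The following is a minimal sketch, not part of the original answer: it runs a harmless child process (the Python interpreter printing `hello`, chosen here purely for illustration) and contrasts `subprocess.call`, which only returns the exit status, with `subprocess.check_output`, which returns what the child wrote to stdout.

```python
import subprocess
import sys

# Illustrative child command: the current interpreter printing one line.
cmd = [sys.executable, "-c", "print('hello')"]

# call() returns only the exit status; the child's output goes straight
# to the terminal and is not available to the Python program.
status = subprocess.call(cmd)

# check_output() returns the child's stdout as bytes, so it can be parsed.
captured = subprocess.check_output(cmd)

print(status, captured.decode().strip())
```

With curl the same distinction applies: to inspect the reply and decide whether to retry, the output has to be captured rather than just printed by the child.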

Problem #1

subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])

should be

subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])

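Otherwise the quoting is applied twice: your Accept: xxx argument reaches curl with the literal quote characters still in it, which is not what curl expects.

Demo of the non-working quoted version: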

import subprocess,os
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"

#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
[output,error] = p.communicate()
print(output)

Result:

b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
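To see why the extra quotes matter, it can help to compare the argument list that a shell builds from the bash command line with the list used in the failing call. This is a small sketch using the standard `shlex` module, which approximates POSIX shell word splitting; the variable names are made up for the illustration.

```python
import shlex

doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
command_line = 'curl -LH "Accept: text/bibliography; style=bibtex" ' + doi

# shlex.split groups the quoted words into one argument and then
# removes the quotes, roughly as bash does before starting curl.
shell_args = shlex.split(command_line)

# The failing call keeps the double quotes inside the argument itself.
wrong_args = ["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi]

print(repr(shell_args[2]))
print(repr(wrong_args[2]))
print(shell_args == wrong_args)
```

The third argument differs only by the embedded double quotes, which in the failing call end up as part of the header text that curl is given.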

Problems #2 and #3

I have implemented a retry mechanism that parses the output and retries until it gets a correct answer:

import subprocess,os,sys
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"

while True:
    p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi],stdout=subprocess.PIPE)
    [output,error] = p.communicate()
    output = output.decode("latin-1")
    if "java.util.concurrent.FutureTask.run" in output:
        # site crashed when responding: junk HTML output: retry
        sys.stderr.write("Wrong answer: retrying\n")
    else:
        print(output)
        break

Result:

Wrong answer: retrying   <==== here the site threw a big HTML exception output
 @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J.âK. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H.âW.}, year={2016}, month={Sep}}

So it works. It is just a problem with the site, but with my Python wrapper you can resubmit the request until it yields the correct answer.
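A variation on the retry idea, as a hedged sketch rather than the code from the answer: the loop is bounded so it cannot spin forever, and instead of looking for the Java stack trace it accepts a reply only if it starts with `@`, as a BibTeX entry does. The helper name `fetch_bibtex`, its parameters, the UTF-8 decoding, and the extra `-s` (silent) curl flag are all assumptions made for this example; the `run` parameter exists only so the function can be exercised without a network connection.

```python
import subprocess
import sys
import time


def fetch_bibtex(doi_url, max_retries=5, delay=1.0, run=subprocess.run):
    """Hypothetical helper: ask curl for BibTeX, retrying on junk replies."""
    cmd = ["curl", "-sLH", "Accept: text/bibliography; style=bibtex", doi_url]
    for attempt in range(1, max_retries + 1):
        result = run(cmd, stdout=subprocess.PIPE)
        # Assumption: the service answers in UTF-8.
        output = result.stdout.decode("utf-8", errors="replace")
        # Accept the reply only if curl succeeded and it looks like BibTeX.
        if result.returncode == 0 and output.lstrip().startswith("@"):
            return output
        sys.stderr.write(f"Wrong answer (attempt {attempt}): retrying\n")
        time.sleep(delay)
    # Every attempt failed; the caller decides what to do next.
    return None
```

Calling `fetch_bibtex(doi)` with the DOI URL from the question would then return either the BibTeX text or `None`, at which point the caller can show its "DOI is not available" message.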
