我正在尝试从curl
拨打此python3
。这来自bash
,工作正常。
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802
产生预期结果:
@article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}
在python3中,我正在做:
import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
print("DOI is not available")
self.Messages.on_warn_clicked("DOI is not given",
"Search google instead")
给出错误:
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
这里出了什么问题?
答案 0 :(得分:1)
这里有3个问题:
subprocess
中引用你的参数,它已经在必要时为你做了,因为你传递了参数而不是unsplitted命令行(好的做法,保持它,但删除不必要的引用) subprocess.call
不允许在python中解析/存储输出,这对于数字3是有问题的:
subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
应该是
subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])
否则,引号会被应用两次而你的Accept: xxx
参数会引用它,这是curl
非工作报价部分的演示:
import subprocess,os
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
[output,error] = p.communicate()
print(output)
结果:
b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
我已经实现了一个重试机制,它解析输出并重试,直到找到正确的输出:
import subprocess,os,sys
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
while True:
p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi],stdout=subprocess.PIPE)
[output,error] = p.communicate()
output = output.decode("latin-1")
if "java.util.concurrent.FutureTask.run" in output:
# site crashed when responding: junk HTML output: retry
sys.stderr.write("Wrong answer: retrying\n")
else:
print(output)
break
结果:
Wrong answer: retrying <==== here the site throwed a big HTML exception output
@article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J.âK. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H.âW.}, year={2016}, month={Sep}}
所以它有效,它只是一个站点问题,但是使用我的python包装器,你可以重新提交请求,直到它产生正确的答案。