We are running into a problem accessing the JPL spectroscopic catalog through its HTTPS form with the Python requests library. The HTTPS form returns the same response for every query:
Zero lines were found for your search criteria.
import requests

r = requests.post('https://spec.jpl.nasa.gov/cgi-bin/catform',
                  data={'Mol': '18003+H2O', 'MinNu': '500', 'MaxNu': '600',
                        'MaxLines': '2000', 'UnitNu': 'GHz', 'StrLim': '-500'})
print(r.text)
Out: 'Zero lines were found for your search criteria.\n'
print(r.status_code)
Out: 200
However, submitting the query form at https://spec.jpl.nasa.gov/ftp/pub/catalog/catform.html from a browser (Firefox 60.0.1) sends the request shown below (redirected to the request URL https://spec.jpl.nasa.gov/cgi-bin/catform):
POST /cgi-bin/catform HTTP/1.1
Host: spec.jpl.nasa.gov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Length: 70
Content-Type: application/x-www-form-urlencoded
Upgrade-Insecure-Requests: 1
Connection: keep-alive
MinNu=500
MaxNu=600
MaxLines=2000
UnitNu=GHz
StrLim=-500
Mol=18003+H2O
The browser request gives the following response:
18003 H2O
503568.5200 0.0200 -4.9916 3 1394.8142 51 -180031404 8 6 3 0 7 7 0 0
504482.6900 0.0500 -5.4671 3 1394.8142 17 -180031404 8 6 2 0 7 7 1 0
525890.1638 0.8432-12.2048 3 5035.1266117 18003140419 514 0 18 811 0
530342.8600 0.2000 -7.1006 3 2533.7932 87 -18003140414 312 0 13 4 9 0
534240.4544 0.3469-11.2954 3 4409.3446 37 18003140418 414 0 17 711 0
556935.9877 0.0003 -0.8189 3 23.7944 9 -180031404 1 1 0 0 1 0 1 0
557985.4794 0.6432-11.6213 3 4833.2084117 18003140419 415 0 18 712 0
558017.0036 12.4193-18.1025 3 7729.4622 49 18003140424 618 0 25 521 0
571913.6860 0.1000 -6.9705 3 2414.7235 75 -18003140412 6 7 0 13 310 0
591693.4339 0.2120 -8.6820 3 3244.6008 87 18003140414 7 8 0 15 411 0
593113.7249 7.4502-18.5975 3 7924.4438 49 18003140424 717 0 231014 0
593227.8163 0.4197-10.8822 3 4201.2514 35 18003140417 612 0 18 315 0
596308.5878 4.5348-15.8345 3 6687.8251 47 18003140423 519 0 22 616 0
Sending the same headers in the requests POST call does not help either.
headers = {
    'Host': 'spec.jpl.nasa.gov',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Upgrade-Insecure-Requests': '1',
    'Connection': 'keep-alive',
}
data = {
    'MaxLines': '2000',
    'MaxNu': '600',
    'MinNu': '500',
    'Mol': '18003+H2O',
    'StrLim': '-500',
    'UnitNu': 'GHz',
}
r = requests.post('https://spec.jpl.nasa.gov/cgi-bin/catform',
                  headers=headers, data=data)
print(r.text)
Out: 'Zero lines were found for your search criteria.\n'
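There is a redirect from the HTTP version of the form to the HTTPS version, but that does not seem to be what causes the problem. Any idea why the code does not give the expected result?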
Answer 0 (score: 1)
After looking at the site: the form posts over HTTP, not HTTPS. Their site does not seem to respond correctly if you post over HTTPS.
This should work:
import requests
data = {
    'MaxLines': '2000',
    'MaxNu': '600',
    'MinNu': '500',
    'Mol': '18003+H2O',
    'StrLim': '-500',
    'UnitNu': 'GHz',
}
r = requests.post('http://spec.jpl.nasa.gov/cgi-bin/catform', data=data)
print(r.status_code)
print(r.text.split('\n')[:10])
Out: ['<OPTION>1001 H-atom ',
'<OPTION>2001 D-atom ',
'<OPTION>3001 HD ',
'<OPTION>4001 H2D+ ',
'<OPTION>7001 Li-6-H ',
'<OPTION>8001 LiH ',
'<OPTION>8002 Li-6-D ',
'<OPTION>9001 LiD ',
'<OPTION>12001 C-atom ',
'<OPTION>13001 C-13-atom ']
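One way to see what actually happened to a POST like this is to look at the redirect chain on the response. requests follows redirects by default, and for 301/302/303 replies it re-issues a POST as a GET without the form body. Whether that is what happens with this server is not established in this thread, so treat the following as a diagnostic sketch only. `redirect_chain` is a hypothetical helper name, and the stand-in objects below use made-up `example.invalid` URLs so the snippet runs without network access; they only mimic the `history`, `url`, `status_code` and `request.method` attributes of a real `requests.Response`.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (method, url, status) for every hop, oldest first.

    Works on a requests.Response, or on any object exposing the same
    .history, .url, .status_code and .request.method attributes.
    """
    hops = list(response.history) + [response]
    return [(hop.request.method, hop.url, hop.status_code) for hop in hops]


# Offline stand-ins for requests.Response objects (hypothetical data):
# a POST answered with a 302, followed by a GET to the new location.
first = SimpleNamespace(
    request=SimpleNamespace(method="POST"),
    url="http://example.invalid/form",
    status_code=302,
    history=[],
)
final = SimpleNamespace(
    request=SimpleNamespace(method="GET"),
    url="https://example.invalid/form",
    status_code=200,
    history=[first],
)

chain = redirect_chain(final)
for hop in chain:
    print(hop)
```

With a real call you would pass the response straight in, e.g. `redirect_chain(requests.post(url, data=data))`. If the last hop shows `GET`, the form data never reached the final URL; passing `allow_redirects=False` to `requests.post` lets you inspect the first reply on its own.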
Answer 1 (score: 1)
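Currently, the data parameters have to be supplied in the same order as in the JPL catalog web form, because the frequency units are converted only after the server-side code has read the UnitNu input. The data therefore has to be provided as a list of tuples or an OrderedDict, in the same order as in the web form. Here is a script that posts the parameters in that order.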
import requests

data = [
    ('MinNu', '500'),
    ('MaxNu', '1000'),
    ('MaxLines', '2000'),
    ('UnitNu', 'GHz'),
    ('StrLim', '-500'),
    ('Mol', '18003+H2O'),
]
r = requests.post('https://spec.jpl.nasa.gov/cgi-bin/catform', data=data)
print(r.text)
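To check the ordering without hitting the server, you can build the form body locally. This is a minimal sketch using only the standard library; as far as I know requests encodes `data` with the same `urllib.parse.urlencode` routine, so the strings below should match what it would send. The variable names are just for illustration, and the values are the ones from the question (MaxNu of 600).

```python
from collections import OrderedDict
from urllib.parse import urlencode

# Parameters in the order the browser sent them in the question.
web_form_order = [
    ("MinNu", "500"),
    ("MaxNu", "600"),
    ("MaxLines", "2000"),
    ("UnitNu", "GHz"),
    ("StrLim", "-500"),
    ("Mol", "18003+H2O"),
]

# The second dict in the question was written in alphabetical key order.
alphabetical = dict(sorted(web_form_order))

body_ordered = urlencode(web_form_order)
body_ordered_dict = urlencode(OrderedDict(web_form_order))
body_alpha = urlencode(alphabetical)

print(body_ordered)  # MinNu first, Mol last, like the browser request
print(body_alpha)    # MaxLines first, UnitNu last
```

A list of tuples and an OrderedDict give the same body. On Python 3.7 and later a plain dict keeps insertion order too, so what matters is the order in which the keys are written, not the container type. Also note that the literal `+` in `18003+H2O` comes out as `%2B`, whereas the browser body contains `Mol=18003+H2O`, where `+` is the form encoding of a space. The thread does not establish whether the server treats the two differently.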