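I'm trying to parse some HTML, but the part I want doesn't show up in the soup at all. The parts before and after it are there, but not the part I need. Am I doing something wrong?
URL: https://coronavirus-portugal-esriportugal.hub.arcgis.com/
My code (with the URL):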
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://coronavirus-portugal-esriportugal.hub.arcgis.com/'
# Download the raw HTML exactly as the server sends it
with urlopen(url) as response:
    page_html = response.read()
# Use a separate name so the BeautifulSoup class is not shadowed
page_soup = BeautifulSoup(page_html, 'html.parser')
body = page_soup.body
print(body.prettify())
I'm looking for the first four numbers (the ones corresponding to "Casos Confirmados", "Casos Suspeitos", "Recuperados" and "Óbitos").
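The data is not in the HTML you download; the page retrieves it dynamically from a back-end database after it loads. If you inspect the network traffic while the page updates (and know some SQL), you can see how to write the query yourself and send it directly to retrieve the Portugal-specific data. The record with OBJECTID 215 corresponds to Portugal.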
import requests

# Query the ArcGIS FeatureServer layer directly; OBJECTID 215 is Portugal
r = requests.get('https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases/FeatureServer/1/query?f=json&where=OBJECTID=215&outFields=*')
print(r.json())
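The JSON returned by an ArcGIS REST `query` request with `f=json` holds the matching records in a `features` list, each with an `attributes` dict. A minimal sketch for pulling values out of that structure follows; the helper names are hypothetical, and the field names in the demo are placeholders, so check the actual keys printed by `r.json()` before relying on them.

```python
from typing import Any, Dict, Iterable, List


def feature_attributes(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the "attributes" dict of every feature in an ArcGIS query response."""
    # ArcGIS usually reports a failed query as JSON with an "error" key
    # instead of "features", so surface that rather than returning [].
    if "error" in payload:
        raise RuntimeError(f"ArcGIS query failed: {payload['error']}")
    return [feature.get("attributes", {}) for feature in payload.get("features", [])]


def pick_fields(attributes: Dict[str, Any], wanted: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested field names; missing fields come back as None."""
    return {name: attributes.get(name) for name in wanted}


if __name__ == "__main__":
    # Made-up sample shaped like an ArcGIS response; real field names may differ.
    sample = {
        "features": [
            {"attributes": {"OBJECTID": 215, "Confirmed": 3, "Deaths": 1, "Recovered": 2}}
        ]
    }
    for attrs in feature_attributes(sample):
        print(pick_fields(attrs, ["Confirmed", "Recovered", "Deaths"]))
```

With the request above, `feature_attributes(r.json())` would give one dict per matching record.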
All of the data (using a different query):
https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases/FeatureServer/1/query?f=json&where=1=1&outFields=*
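The two query URLs in this answer differ only in the `where` clause (`OBJECTID=215` versus `1=1`). A small sketch that builds either one from parameters with `urllib.parse.urlencode`, which percent-encodes characters such as `=` and `*` inside the values; the function name `build_query_url` is hypothetical and its defaults simply mirror the URLs shown here.

```python
from urllib.parse import urlencode

# Layer path taken from the URLs in this answer; {org} is the organization ID.
BASE = "https://services1.arcgis.com/{org}/arcgis/rest/services/ncov_cases/FeatureServer/1/query"


def build_query_url(org_id: str, where: str = "1=1", out_fields: str = "*") -> str:
    """Build an ArcGIS FeatureServer query URL for the ncov_cases layer."""
    params = {"f": "json", "where": where, "outFields": out_fields}
    # urlencode escapes '=' and '*' inside values (OBJECTID=215 -> OBJECTID%3D215).
    return BASE.format(org=org_id) + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("0MSEUqKaxRlEPj5g", where="OBJECTID=215"))
```

If you use `requests`, passing the same dict as `requests.get(base_url, params=...)` achieves the same encoding without building the string yourself.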
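You can also obtain the other identifier used in the query URL (the ArcGIS organization ID after `services1.arcgis.com/`) dynamically from the page itself: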
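import re
import requests

country_id = 215  # OBJECTID of Portugal in the ncov_cases layer
with requests.Session() as s:
    # Fetch the hub page and pull the organization ID out of an embedded services1 URL
    r = s.get('https://coronavirus-portugal-esriportugal.hub.arcgis.com/')
    p = re.compile(r'https://services1.arcgis.com/(.*?)/arcgis')
    code = p.findall(r.text)[0]
    # Use that ID to query the FeatureServer for the chosen country
    r = s.get(f'https://services1.arcgis.com/{code}/arcgis/rest/services/ncov_cases/FeatureServer/1/query?f=json&where=OBJECTID={country_id}&outFields=*')
    print(r.json())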
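`p.findall(r.text)[0]` raises a bare `IndexError` if the page ever stops embedding a `services1.arcgis.com` URL, and the unescaped dots and `(.*?)` are looser than needed. A more defensive sketch, assuming the organization ID is a purely alphanumeric path segment right after the host (an assumption, not something the page guarantees), uses `re.search` and returns `None` when nothing matches; the name `find_org_id` is hypothetical.

```python
import re
from typing import Optional

# Assumption: the org ID is an alphanumeric path segment directly after the host.
ORG_ID_PATTERN = re.compile(r"https://services1\.arcgis\.com/([A-Za-z0-9]+)/arcgis")


def find_org_id(html: str) -> Optional[str]:
    """Return the first ArcGIS organization ID found in the page text, or None."""
    match = ORG_ID_PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Synthetic page snippet for illustration only.
    page = '<script src="https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest"></script>'
    org_id = find_org_id(page)
    if org_id is None:
        raise SystemExit("No services1.arcgis.com URL found in the page")
    print(org_id)
```

In the session code, `code = find_org_id(r.text)` followed by a `None` check would replace the `findall(...)[0]` line.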