BeautifulSoup finds nothing even though the (div = "pendingcasescnts ng-scope") element exists

Time: 2018-10-30 16:31:37

Tags: python html python-3.x web-scraping beautifulsoup

I am trying to scrape text from a "div" with multiple classes on the site Concluded Cases with Details.

The example of the "div" class

Why can't the div element be found?

from bs4 import BeautifulSoup
from requests import get

url = "https://icsid.worldbank.org/en/Pages/cases/ConcludedCases.aspx?status=c"
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Look for every div whose class attribute is "pendingcasescnts ng-scope"
cases_containers = html_soup.find_all('div', class_="pendingcasescnts ng-scope")
print(len(cases_containers))

1 answer:

Answer 0 (score: 1)

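You can see that the page does not carry this information in the HTML you are scraping. The browser fetches it through a separate request, which returns all the data you need as JSON. The .json() method of the requests response converts it into a Python dictionary.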
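One way to confirm this diagnosis is to count the matching elements in the HTML the server sends and compare that with what the browser shows after its scripts have run. The sketch below does this with the standard library only. Both HTML strings are hypothetical stand-ins, not the real ICSID markup, and `ClassCounter` and `count_elements` are illustrative names.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one type whose class attribute holds all wanted names."""

    def __init__(self, tag, wanted_classes):
        super().__init__()
        self.tag = tag
        self.wanted = set(wanted_classes.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A missing or valueless class attribute is treated as no classes.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_elements(html, tag, wanted_classes):
    counter = ClassCounter(tag, wanted_classes)
    counter.feed(html)
    return counter.count


# Hypothetical markup: what a server might send before any script runs...
STATIC_HTML = '<div id="cases"></div><script src="app.js"></script>'
# ...and what the browser might hold once a script has filled the page in.
RENDERED_HTML = (
    '<div id="cases">'
    '<div class="pendingcasescnts ng-scope">CONC/18/1</div>'
    '<div class="pendingcasescnts ng-scope">ARB/17/40</div>'
    '</div>'
)

static_count = count_elements(STATIC_HTML, "div", "pendingcasescnts ng-scope")
rendered_count = count_elements(RENDERED_HTML, "div", "pendingcasescnts ng-scope")
print(static_count, rendered_count)
```

If the count is zero for the text returned by requests but the elements are visible in the browser's inspector, the content is most likely being added by JavaScript after the page loads.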

The following shows how to use the returned JSON to extract the Case No., Subject and Sector fields:

from urllib3.exceptions import InsecureRequestWarning
import requests

# verify=False below skips TLS certificate verification, so silence the warning it triggers
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
r = requests.get('https://wbwcfe.worldbank.org/icsidext/service.svc/getbulkcasesbystatusid/json?id=cd28', verify=False)
data = r.json()  # parse the JSON body into a Python dictionary

# Each entry in the result list is a dictionary describing one case
for case in data['GetBulkCasesByStatusIdResult']:
    print(f"Case No.: {case['caseno']}\nSubject: {case['subject']}\nSector: {case['econsector']}\n")

This gives you output starting as follows:

Case No.: CONC/18/1
Subject: Water services and electric power concession
Sector: Electric Power & Other Energy

Case No.: ARB/17/40
Subject: Hydrocarbon concession
Sector: Oil, Gas & Mining

Case No.: ARB/17/39
Subject: Hydrocarbon concession
Sector: Oil, Gas & Mining

The URL was found with the browser's network tools while loading the URL given in the question.

I suggest you print out data and study all the available fields.
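As a rough sketch of that exploration, the snippet below collects every key used across the case records and pretty-prints one record. It runs on a small hypothetical sample shaped like the output shown earlier, so it works offline. The real response probably carries more keys per case, and `field_names` is an illustrative helper name.

```python
import json

# Hypothetical sample shaped like the output shown earlier; the real
# response is likely to contain more keys per case.
SAMPLE = json.loads("""
{"GetBulkCasesByStatusIdResult": [
  {"caseno": "CONC/18/1",
   "subject": "Water services and electric power concession",
   "econsector": "Electric Power & Other Energy"},
  {"caseno": "ARB/17/40",
   "subject": "Hydrocarbon concession",
   "econsector": "Oil, Gas & Mining"}
]}
""")


def field_names(records):
    """Return the sorted union of keys found across a list of dictionaries."""
    names = set()
    for record in records:
        names.update(record)
    return sorted(names)


cases = SAMPLE["GetBulkCasesByStatusIdResult"]
fields = field_names(cases)
print(fields)
# Pretty-print one record to see the values alongside the keys.
print(json.dumps(cases[0], indent=2))
```

With the live data, pass `data['GetBulkCasesByStatusIdResult']` to `field_names` in place of the sample list.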