Question

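I am trying to extract a figure (59,805) from the URL mentioned below, using BeautifulSoup and Python's requests package.

Below is the code I am trying, but it returns no results. After it is the HTML I am trying to extract from. The expected result is "Confirmed" 59,805.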

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

case_type = []
count = []

url = requests.get('https://www.covid19india.org/')
soup = bs(url.content,'html.parser')

for a in soup.findAll('div',  attrs={'class':'level-item is-cherry fadeInUp'}):
    b = a.find('h1')
    c = a.find('h5')
    case_type.append(c.text)
    count.append(b.text)

df = pd.DataFrame({'Case Type':case_type, 'Count':count})
print(df)

HTML snippet from the page above

 <div class="Level">
      <div class="level-item is-cherry fadeInUp" style="animation-delay: 1s;">
        <h5>Confirmed</h5>
        <h4>[+115]</h4>
        <h1>59,805 </h1>
      </div>
      <div class="level-item is-blue fadeInUp" style="animation-delay: 1.1s;">
        <h5 class="heading">Active</h5>
        <h4>&nbsp;</h4>
        <h1 class="title has-text-info">39,914</h1>
      </div>
      <div class="level-item is-green fadeInUp" style="animation-delay: 1.2s;">
        <h5 class="heading">Recovered</h5>
        <h4>[+14]</h4>
        <h1 class="title has-text-success">17,901 </h1>
      </div>

Answer 1

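This site is built with React, so the HTML you get back from requests does not contain the full page content. The figures are loaded dynamically by JavaScript after the initial page load, which means the elements you are searching for are not in the response.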
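One way to confirm this is to check whether the raw HTML contains the target elements at all. The following is a minimal sketch that uses the standard library's html.parser rather than BeautifulSoup, purely so that it runs without extra dependencies. The helper name count_elements_with_class and the unrendered_shell string are hypothetical. The shell only approximates what a client-rendered page might look like before JavaScript runs, while the rendered string is taken from the question's snippet.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical: an approximation of a client-rendered page before JavaScript runs
unrendered_shell = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)

# From the question: markup the browser shows after rendering
rendered = (
    '<div class="level-item is-cherry fadeInUp">'
    '<h5>Confirmed</h5><h1>59,805 </h1></div>'
)

print(count_elements_with_class(unrendered_shell, "level-item"))  # 0
print(count_elements_with_class(rendered, "level-item"))          # 1
```

Passing the text of the response from the question's code to a check like this should show whether the level-item elements are present in what requests actually received.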

If you look at the network requests the site makes while loading, you will see that the data comes from:

https://api.covid19india.org/data.json

So you could do the following (assuming you are not impacting the site's performance and have permission to use the data):

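import requests

# Fetch the JSON feed that the page itself loads
r = requests.get('https://api.covid19india.org/data.json')
r.raise_for_status()  # fail loudly on an HTTP error
j = r.json()
confirmed = j['statewise'][0]['confirmed']  # first 'statewise' entry
print(confirmed)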
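To get closer to the "Case Type / Count" table the question was after, the same entry could be turned into labelled rows. This is only a sketch under assumptions: the helper summarize_totals is hypothetical, and only the 'confirmed' key is confirmed by the answer above. The 'active' and 'recovered' keys are assumed to exist alongside it, so missing keys are skipped. The sample payload reuses the figures from the question's HTML and is not real API output.

```python
def summarize_totals(payload, fields=("confirmed", "active", "recovered")):
    """Return (case type, count) pairs from the first 'statewise' entry."""
    total = payload["statewise"][0]
    rows = []
    for field in fields:
        raw = total.get(field)
        if raw is None:
            continue  # skip fields the payload does not provide
        # Counts may arrive as strings, possibly with thousands separators
        rows.append((field.capitalize(), int(str(raw).replace(",", ""))))
    return rows


# Illustrative payload built from the question's figures (an assumption,
# not actual output of the API)
sample = {
    "statewise": [
        {"confirmed": "59805", "active": "39914", "recovered": "17901"}
    ]
}

for case_type, count in summarize_totals(sample):
    print(f"{case_type}: {count:,}")
```

With a live response, summarize_totals(r.json()) would take the place of the sample, and the resulting rows can be passed to pd.DataFrame(rows, columns=['Case Type', 'Count']) to build the DataFrame from the question.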
