Question

I'm trying to scrape this website, and the image below shows what I get. url = 'https://www.worldometers.info/world-population/population-by-country/'

I have tried every similar solution I could find on Stack Overflow, but none of them work for me.

table_data=soup.find('table', {"id" : "example2"}, class_='table table-striped table-bordered dataTable no-footer')

headers = []
for i in table_data.find_all('th'):
    title = i.text
    headers.append(title)

Error message
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-129-e8b5de995a9d> in <module>
      1 table_data=soup.find('table', {"id" : "example2"}, class_='table table-striped table-bordered dataTable no-footer')
      2 headers = []
----> 3 for i in total_data.find_all('th'):
      4     title = i.text
      5     headers.append(title)

AttributeError: 'NoneType' object has no attribute 'find_all'
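The AttributeError means that soup.find(...) returned None, i.e. no <table> matched every filter passed to it. One plausible cause, stated here as an assumption: "dataTable no-footer" look like classes that the DataTables JavaScript plugin adds in the browser, so they may be missing from the HTML that is actually downloaded. In Beautiful Soup, a class_ string that contains spaces must match the entire class attribute exactly. The sketch below reproduces this offline with a small hypothetical table (the markup is made up, not copied from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the served HTML carries only the Bootstrap classes,
# without the "dataTable no-footer" classes that JavaScript would add later.
html = """
<table id="example2" class="table table-striped table-bordered">
  <thead><tr><th>#</th><th>Country (or dependency)</th></tr></thead>
  <tbody><tr><td>1</td><td>China</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ value containing spaces is compared with the whole class attribute,
# so the two extra classes make this lookup fail and find() returns None.
strict = soup.find("table", {"id": "example2"},
                   class_="table table-striped table-bordered dataTable no-footer")
print(strict)  # None

# Filtering on the id alone locates the table.
table_data = soup.find("table", id="example2")
headers = [th.text for th in table_data.find_all("th")]
print(headers)  # ['#', 'Country (or dependency)']
```

Checking "if table_data is None" before calling find_all turns this failure into a clear message instead of an AttributeError.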

This is the code I tried for scraping the table rows, but it doesn't work either. Any further help would be appreciated.

for j in table_data.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [tr.text for tr in row_data]
    length = len(df)
    df.loc[length] = row


ValueError: cannot set a frame with no defined columns
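The snippet does not show how df was created, but this ValueError is what pandas raises when df is a DataFrame without columns (presumably df = pd.DataFrame()) and a row is assigned with df.loc[...]. A minimal sketch, assuming headers holds the column names collected earlier (the names and values below are placeholders, not scraped data): either declare the columns up front, or collect the rows in a list and build the frame once.

```python
import pandas as pd

# Placeholder values standing in for the scraped headers and cell texts.
headers = ["Country", "Population"]
scraped_rows = [["Country A", "100"], ["Country B", "50"]]

# Reproduce the error: a DataFrame created without columns cannot take a row via .loc.
err_msg = None
empty = pd.DataFrame()
try:
    empty.loc[0] = scraped_rows[0]
except ValueError as err:
    err_msg = str(err)
print(err_msg)  # cannot set a frame with no defined columns

# Fix 1: declare the columns first, then append rows with .loc as in the question.
df = pd.DataFrame(columns=headers)
for row in scraped_rows:
    df.loc[len(df)] = row

# Fix 2 (usually simpler and faster): collect the rows, then build the frame once.
df2 = pd.DataFrame(scraped_rows, columns=headers)
print(df2)
```

Appending one row at a time with .loc copies data repeatedly, so building the DataFrame from a list of rows is generally the better choice for a table with a few hundred rows.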

Answer 1

"findAll" is a Beautiful Soup method, which means you have to use:

soup.findAll('th')
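For reference, findAll is the older camelCase name that Beautiful Soup 4 keeps as an alias of find_all (recent releases may emit a deprecation warning for it), so both return the same tags. Calling it on soup, rather than on the result of soup.find(...), cannot hit the None error, but it searches the whole document instead of one table. A small sketch on hypothetical markup with two tables:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables, to show the scope of each search.
html = """
<table id="example2"><tr><th>#</th><th>Country</th></tr></table>
<table id="other"><tr><th>Year</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# findAll and find_all are the same method under two names.
print(soup.findAll("th") == soup.find_all("th"))  # True

# Searching from soup collects header cells from every table on the page.
all_headers = [th.text for th in soup.find_all("th")]
print(all_headers)  # ['#', 'Country', 'Year']

# Scoping the search to one table only works if find() actually matched it.
table = soup.find("table", id="example2")
table_headers = []
if table is not None:
    table_headers = [th.text for th in table.find_all("th")]
print(table_headers)  # ['#', 'Country']
```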

Answer 2

I've looked at the page and used:

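import pandas as pd  # soup is the BeautifulSoup object from the question

table_data = soup.find('table', id="example2")
# header texts, skipping the first (rank) column
columns = [x.text for x in table_data.find("thead").find_all("th")][1:]
# one list of cell texts per body row, also skipping the rank cell
rows = [[x.text for x in y.find_all("td")][1:] for y in table_data.find("tbody").find_all("tr")]
dt = pd.DataFrame(rows, columns=columns)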
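To see what this code produces without hitting the site, here is the same logic run against a small hypothetical table shaped the way the answer assumes: a thead with the header cells and a tbody with one row per country, where the first column is a rank. The markup and values are placeholders, not data from worldometers.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder markup mirroring the structure the answer relies on.
html = """
<table id="example2">
  <thead><tr><th>#</th><th>Country</th><th>Population</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Country A</td><td>100</td></tr>
    <tr><td>2</td><td>Country B</td><td>50</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table_data = soup.find("table", id="example2")
# Header texts, skipping the first (rank) column.
columns = [x.text for x in table_data.find("thead").find_all("th")][1:]
# One list per body row, also skipping the rank cell.
rows = [[x.text for x in y.find_all("td")][1:]
        for y in table_data.find("tbody").find_all("tr")]
dt = pd.DataFrame(rows, columns=columns)
print(dt)
```

If the real cells contain surrounding whitespace or newlines, x.get_text(strip=True) can be used in place of x.text to trim it.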

Give it a try ;-)

Python, BeautifulSoup web scraping: "find_all" returns NoneType
