Question

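I don't understand why this doesn't work.

I'm currently working with the financials table at:

https://finance.yahoo.com/quote/ATVI/financials?p=ATVI

What I don't get is the result of the find_all method. When I chain more dot notation onto it, e.g. find_all('td').children, it raises an error. Is my mistake that find_all returns a list of elements rather than a single element object?

I have no idea why the code below doesn't work.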

span_tag1=soup.find_all('td')
for i in span_tag1.children:
    print(i.get_text)

Answer 1

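Because find_all finds all td elements (and returns them as a list), you need to iterate over each element and then get the children of each td element: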

for td in soup.find_all('td'):
    for child in td.children:
        print(child.get_text())
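
To see both the error and the fix without hitting the live page, here is a minimal sketch. The inline HTML is a made-up stand-in for the Yahoo markup (the real page structure is an assumption I have not verified), and the names `html`, `cells` and `texts` are just for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<table>
  <tr><td><span>Total Revenue</span></td><td><span>7,500,000</span></td></tr>
  <tr><td><span>Cost of Revenue</span></td><td><span>2,517,000</span></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result set of Tag objects.
cells = soup.find_all("td")

# The result set itself has no .children, so this raises AttributeError.
try:
    cells.children
except AttributeError as exc:
    print("AttributeError:", exc)

# Looping over the result set gives individual Tags, which do have .children.
texts = []
for td in cells:
    for child in td.children:
        texts.append(child.get_text())

print(texts)
```

The point is only that `.children` belongs to each element, not to the collection that find_all hands back.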

Answer 2

I would go with pandas to get a nicely formatted table, then slice out what you want:

import pandas as pd

tables = pd.read_html('https://finance.yahoo.com/quote/ATVI/financials?p=ATVI')
print(tables[0].fillna(''))
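
If you want to try the idea offline first, here is a rough sketch that feeds read_html an inline HTML string instead of the URL. The table markup is an assumption made up for illustration (the figures are copied from the output shown elsewhere in this thread), and read_html needs an HTML parser such as lxml installed. Whether the live page returns tables that read_html can parse is not something I have checked.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; the real page markup may differ.
html = """
<table>
  <tr><th>Revenue</th><th>12/31/2018</th><th>12/31/2017</th></tr>
  <tr><td>Total Revenue</td><td>7,500,000</td><td>7,017,000</td></tr>
  <tr><td>Cost of Revenue</td><td>2,517,000</td><td>2,501,000</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)

# Slice out what you want, e.g. a single row by its label in the first column.
total_revenue = df[df["Revenue"] == "Total Revenue"]
print(total_revenue)
```

Note that read_html treats commas as thousands separators by default, so the figures come back as numbers rather than strings.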

Answer 3

find_all() returns a list, so you need to loop over it. You can then use children on each element, and call get_text() on each child.

for td in soup.find_all('td'):
    for child in td.children:
        print(child.get_text())

Also note that get_text() is a method, so put parentheses after it.
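
A small sketch of why the parentheses matter, using a one-cell inline snippet as a stand-in for the real page (the markup and the variable names are assumptions for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell table, just enough to get a single <td> Tag.
soup = BeautifulSoup("<table><tr><td>Total Revenue</td></tr></table>", "html.parser")
td = soup.find("td")

without_call = td.get_text    # the bound method object itself, not the text
with_call = td.get_text()     # calling it returns the text as a string

print(callable(without_call))  # True
print(with_call)               # Total Revenue
```

So print(i.get_text) in the question would print a description of the method rather than the cell contents, even once the loop is fixed.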

Answer 4

Iterate over the span_tag1 list to get each element in it:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://finance.yahoo.com/quote/ATVI/financials?p=ATVI")
soup = BeautifulSoup(page.content, 'html.parser')

td = soup.find_all('td')

for et in td:
    for elem in et:
        print(elem.text)

Output:

Revenue
12/31/2018
12/31/2017
12/31/2016
12/31/2015
Total Revenue
7,500,000
7,017,000
6,608,000
4,664,000
Cost of Revenue
2,517,000
2,501,000
.
.

BeautifulSoup code question about data types
