Question

I would like to scrape financial data for each S&P 500 company from its respective page on tiingo.com.

For example, take the following URL:

https://www.tiingo.com/f/b/aapl

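which shows Apple's latest balance sheet data.

I would like to extract the "Property, Plant & Equipment" amount for the most recent quarter, which in this particular case is 25.45B. However, I am having trouble writing the correct Beautiful Soup code to extract this text.

Inspecting the element, I see that the number 25.45B is in an element with class "ng-binding ng-scope", inside an element with class "col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope", which itself is inside an element with class "col-xs-7 col-sm-8 col-md-8 col-lg-9 no-padding-left no-padding-right".

However, I am not sure exactly how to write the Beautiful Soup code to target the right element and then call element.getText() on it.

I was thinking of something like this: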

import os, bs4, requests

res_bal = requests.get("https://www.tiingo.com/f/b/aapl")

res_bal.raise_for_status()

soup_bal = bs4.BeautifulSoup(res_bal.text, "html.parser")

elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")

elems_bal_2 = elems_bal.select(".ng-binding ng-scope")

joe = elems_bal_2.getText()

print(joe)

But so far I have had no success with this code. Any help would be greatly appreciated!

Answer 1

The problem with the selectors

elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")

elems_bal_2 = elems_bal.select(".ng-binding ng-scope")

is that there are multiple elements with the same classes on the page, so your result is not correct.
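Separately from the duplicate-class point, the selector strings themselves are likely part of the problem: in CSS, a space means "descendant", so ".col-xs-6 col-sm-3 ..." looks for tags named col-sm-3, col-md-3 and so on, and select() returns a list, which has no select() or getText() of its own. The following is a minimal sketch of the chained-class form. The HTML here is a hypothetical, simplified stand-in built from the class names quoted in the question (the second value is a placeholder), not the real tiingo markup.

```python
import bs4

# Hypothetical markup modelled on the class names quoted in the question.
# "00.00B" is a placeholder for an older quarter, not real data.
html = """
<div class="col-xs-7 col-sm-8 col-md-8 col-lg-9 no-padding-left no-padding-right">
  <div class="col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope">
    <a class="ng-binding ng-scope">25.45B</a>
  </div>
  <div class="col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope">
    <a class="ng-binding ng-scope">00.00B</a>
  </div>
</div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# As written in the question: only the first token is a class; the rest are
# read as tag names nested inside it, so nothing matches.
broken = soup.select(
    ".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope"
)

# Chain classes with dots (no spaces) to require them on the same element;
# a space then separates the outer element from the inner one.
cells = soup.select(".statement-field-data.ng-scope .ng-binding.ng-scope")

# select() returns a list, so read the text from each element in it.
values = [cell.get_text(strip=True) for cell in cells]
latest = values[0]  # assumes the first column is the most recent quarter

print(broken)
print(values)
print(latest)
```

Because several cells share these classes, the list has one entry per column, and picking the first one assumes the newest quarter is listed first.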

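Note that if you use only Beautiful Soup and requests, the page source does not contain the data you want to collect, because the page fills it in with JavaScript. This can be done with the help of Selenium and Beautiful Soup. If you do not have Selenium installed, install it first: pip install selenium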

Here is working code for the same:

from selenium import webdriver
import bs4
import time

driver = webdriver.Firefox()
driver.get("https://www.tiingo.com/f/b/aapl")
driver.maximize_window()
# Sleep so that the page's JavaScript has time to populate the data
time.sleep(10)
page_source = driver.page_source
# Close the browser once the rendered HTML has been captured
driver.quit()

soup = bs4.BeautifulSoup(page_source, "html.parser")

# Print the label of every indented field whose name contains "Property"
properties = soup.find_all('div', {'class': 'col-xs-5 col-sm-4 col-md-4 col-lg-3 statement-field-name indent-2'})
for p in properties:
    if 'Property' in p.text.strip():
        print(p.text)

# The value link is identified by its ng-click attribute
x = soup.find("a", {"ng-click": "toggleFundData('Property, Plant & Equipment',SDCol.restatedString==='restated',true)"})
print(x.text)

The output of the above is:

Property, Plant & Equipment
25.45B
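The fixed time.sleep(10) above either waits longer than needed or not long enough on a slow connection. One possible alternative is Selenium's explicit wait, sketched below together with a small helper that generalises the ng-click lookup to other field names. The function names fetch_rendered_html and find_field_value, the timeout value, and the idea that other fields use the same ng-click pattern are assumptions made for illustration; only the "Property, Plant & Equipment" case comes from the answer.

```python
import bs4


def fetch_rendered_html(url, ready_selector, timeout=30):
    """Load url in Firefox and return the page source once ready_selector matches."""
    # Selenium is imported here so that find_field_value below can be used
    # (and tested on saved HTML) without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until a matching element exists; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def find_field_value(html, field_name):
    """Return the text of the first link whose ng-click names field_name, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # Assumption: every field uses the same ng-click pattern as the one in the answer.
    ng_click = (
        "toggleFundData('%s',SDCol.restatedString==='restated',true)" % field_name
    )
    link = soup.find("a", attrs={"ng-click": ng_click})
    return link.get_text(strip=True) if link is not None else None


# Possible usage (opens a real browser window, so it is left commented out):
# html = fetch_rendered_html(
#     "https://www.tiingo.com/f/b/aapl", "a[ng-click^='toggleFundData']"
# )
# print(find_field_value(html, "Property, Plant & Equipment"))
```

Waiting for the element to be present does not guarantee its text has been filled in yet, so a short extra wait or a check on the returned value may still be needed.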
