I am trying to scrape the following URL with BeautifulSoup:
https://www.investopedia.com/markets/stocks/aapl/#Financials
I am trying to parse this section, which I found by inspecting the page:
<div class="value">
<div class="marker position" style="left: 89.25%;"></div>
<div class="text position" style="left: 89.25%;">1.43</div>
</div>
My code is as follows:
import bs4 as bs
import requests

def load_ticker_invest(ticker):
    resp = requests.get('https://www.investopedia.com/markets/stocks/{}/#Financials'.format(ticker))
    soup = bs.BeautifulSoup(resp.text, 'html.parser')
    trend = soup.div.find_all('div', attrs={'class':'value'})
    return trend

print(load_ticker_invest('aapl'))
The result I get is an empty list:
[]
How can I fix this?
Answer 0 (score: 1)
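This site fetches this data through an internal API. The API call requires some tokens, which are embedded in JavaScript on the page https://www.investopedia.com/markets/stocks/aapl, so you first need to extract those values with a regular expression and then pass them to the API call: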
title=aapl
IFS=' ' read token token_userid < <(curl -s "https://www.investopedia.com/markets/stocks/$title/" | \
tr -d '\n' | \
sed -rn "s:.*Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'.*:\1 \2:p")
curl -s "https://factsetestimates.xignite.com/xFactSetEstimates.json/GetLatestRecommendationSummaries?IdentifierType=Symbol&Identifiers=$title&UpdatedSince=&_token=$token&_token_userid=$token_userid" | \
jq -r '.[].RecommendationSummarySet | .[].RecommendationScore'
Using Python:
import requests
import re
ticker = 'aapl'
r = requests.get('https://www.investopedia.com/markets/stocks/{}/'.format(ticker))
result = re.search(r".*Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'", r.text)
token = result.group(1)
token_userid = result.group(2)
r = requests.get('https://factsetestimates.xignite.com/xFactSetEstimates.json/GetLatestRecommendationSummaries?IdentifierType=Symbol&Identifiers={}&UpdatedSince=&_token={}&_token_userid={}'
.format(ticker, token, token_userid)
)
print(r.json()[0]['RecommendationSummarySet'][0]['RecommendationScore'])
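The Python snippet above assumes the regular expression always matches; if the page layout changes, `result` is `None` and `result.group(1)` raises an `AttributeError`. A minimal, more defensive sketch of the same two steps follows. The helper names `extract_tokens` and `build_api_url` and the sample string are hypothetical, introduced only for illustration; the pattern, endpoint, and query parameters are taken from the answer above.

```python
import re
from urllib.parse import urlencode

# Same pattern as in the answer above, compiled once.
TOKEN_PATTERN = re.compile(r"Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'")
API_URL = ("https://factsetestimates.xignite.com/xFactSetEstimates.json/"
           "GetLatestRecommendationSummaries")


def extract_tokens(page_html):
    """Return (token, token_userid), or None if the pattern is not found."""
    match = TOKEN_PATTERN.search(page_html)
    if match is None:
        return None
    return match.group(1), match.group(2)


def build_api_url(ticker, token, token_userid):
    """Assemble the API URL with properly encoded query parameters."""
    query = urlencode({
        "IdentifierType": "Symbol",
        "Identifiers": ticker,
        "UpdatedSince": "",
        "_token": token,
        "_token_userid": token_userid,
    })
    return f"{API_URL}?{query}"


# Made-up sample input (not real tokens), so the sketch runs offline.
sample = "<script>var x = new Xignite( 'ABC123', 'DEF456' );</script>"
tokens = extract_tokens(sample)
if tokens is None:
    print("token pattern not found; the page layout may have changed")
else:
    print(build_api_url("aapl", *tokens))
```

In real use, `page_html` would be `r.text` from the first `requests.get` call, and the URL returned by `build_api_url` would be passed to the second one.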
Answer 1 (score: 1)
import bs4 as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Wait for the full page load before driver.get() returns.
# Selenium 4 sets this through Options instead of DesiredCapabilities.
options = Options()
options.page_load_strategy = 'normal'
driver = webdriver.Chrome(options=options)
driver.get('https://www.investopedia.com/markets/stocks/aapl/#Financials')
# Grab the DOM as rendered by the browser, not the raw HTTP response.
resp = driver.execute_script('return document.documentElement.outerHTML')
driver.quit()
soup = bs.BeautifulSoup(resp, 'html.parser')
res = soup.find('div', attrs={'class': 'text position'}).text
print(res)
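To check the parsing step separately from the browser step, the markup quoted in the question can be fed to BeautifulSoup directly. This is only a sketch: the helper name `extract_score` is hypothetical, and it assumes the rendered page still contains markup shaped like the snippet in the question. It also searches from the document root, since `soup.div.find_all(...)` in the question only looks inside the first `<div>` of the page.

```python
import bs4 as bs

# The markup quoted in the question, used here as offline test input.
SNIPPET = """
<div class="value">
<div class="marker position" style="left: 89.25%;"></div>
<div class="text position" style="left: 89.25%;">1.43</div>
</div>
"""


def extract_score(html):
    """Return the text of div.text.position inside div.value, or None."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    # A CSS selector matches both classes regardless of their order.
    node = soup.select_one('div.value div.text.position')
    if node is None:
        return None
    return node.get_text(strip=True)


score = extract_score(SNIPPET)
print(score)
```

Returning `None` instead of raising makes it easy to tell "the element is missing from the HTML" (as with the plain `requests` response) apart from a parsing mistake.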