I was wondering whether anyone could help me with scraping https://finance.yahoo.com/quote/TSCO.l?p=TSCO.L
I am currently using this code to scrape the current price:
currentPriceData = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
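This works fine most of the time, but occasionally I get an error, and I am not sure why, since the link is correct. When that happens, I would like to try a second way of getting the price.
Something like: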
try:
    currentPriceData = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
except Exception:
    currentPriceData = soup.find('span', {'class':'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'})[0].text
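The problem is that I cannot scrape the number with this approach. Any help would be appreciated.
Answer 0 (score: 0)
The data is embedded in the page as a JavaScript variable, but you can parse it with the json module.
For example: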
import re
import json
import requests
url = 'https://finance.yahoo.com/quote/TSCO.l?p=TSCO.L'
html_data = requests.get(url).text
# The next line extracts, from the HTML source, the JavaScript variable
# that holds all the data rendered on the page.
# BeautifulSoup cannot run JavaScript, so we use the
# `json` module to parse the data instead.
# NOTE: When you view the page source in Firefox/Chrome, you can search for
# `root.App.main` to see it.
data = json.loads(re.search(r'root\.App\.main = ({.*?});\n', html_data).group(1))
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# We now have the Javascript variable extracted to standard python
# dict, so now we just print contents of some keys:
price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketPrice']['fmt']
currency_symbol = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['currencySymbol']
print('{} {}'.format(price, currency_symbol))
Prints:
227.30 £
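Since the question mentions intermittent errors, the extraction above could be wrapped in a function that returns None instead of raising when the variable or one of the keys is missing. The following is only a sketch under assumptions: `extract_price` and `SAMPLE_HTML` are hypothetical names introduced here, and the sample string is a hand-written stand-in for the page source, not real Yahoo output. It reuses the same regular expression and key path as the code above and runs without a network connection.

```python
import json
import re

# Same pattern as in the answer above: capture the object assigned to root.App.main.
APP_MAIN_RE = re.compile(r'root\.App\.main = ({.*?});\n')


def extract_price(html_data):
    """Return (formatted price, currency symbol), or None if the data is missing.

    Hypothetical helper, not part of any library.
    """
    match = APP_MAIN_RE.search(html_data)
    if match is None:
        # The JavaScript variable was not found in this response.
        return None
    try:
        data = json.loads(match.group(1))
        price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']
        return price['regularMarketPrice']['fmt'], price['currencySymbol']
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, or the expected keys are absent.
        return None


# Hand-written stand-in for the page source (an assumption, for illustration only).
SAMPLE_HTML = (
    '<script>\n'
    'root.App.main = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": '
    '{"price": {"regularMarketPrice": {"raw": 227.3, "fmt": "227.30"}, '
    '"currencySymbol": "\\u00a3"}}}}}};\n'
    '</script>\n'
)

print(extract_price(SAMPLE_HTML))
print(extract_price('<html></html>'))
```

With real data you would pass `requests.get(url).text` to the function; a None result then signals that the request is worth retrying or that a fallback is needed.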