Question

I was wondering if someone could help me scrape https://finance.yahoo.com/quote/TSCO.l?p=TSCO.L

I am currently using this code to scrape the current price:

currentPriceData = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text

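This works fine, but sometimes I get an error, and I'm not sure why, because the link is correct. When that happens, I would like to try getting the price another way.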

Something like:

try: 
    currentPriceData = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
except Exception:
    currentPriceData = soup.find('span', {'class':'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'})[0].text

The problem is that I can't scrape the number with this approach. Any help would be appreciated.
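The fallback idea described above can be sketched as a small helper that tries several extractors in order and returns the first usable value. Note that `soup.find(...)` returns a single element (or `None`), not a list, so indexing its result with `[0]` as in the `except` branch raises an error instead of returning the span. The helper name `first_successful` and the lambdas in the usage comment are illustrative assumptions, not part of BeautifulSoup.

```python
def first_successful(extractors):
    """Call each zero-argument extractor in turn; return the first truthy result.

    Lookup-style errors (missing element, missing key, None where an
    element was expected) are treated as "this extractor did not match".
    """
    for extract in extractors:
        try:
            value = extract()
        except (IndexError, KeyError, AttributeError, TypeError):
            continue  # this selector did not match; try the next one
        if value:
            return value
    return None  # nothing matched


# Hypothetical usage with a BeautifulSoup object named `soup`:
# currentPriceData = first_successful([
#     lambda: soup.find_all('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text,
#     lambda: soup.find('span', {'class': 'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'}).text,
# ])
```

Returning `None` when nothing matches lets the caller decide whether to retry the request or report the failure.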

Answer 1

The data is embedded in the page as a JavaScript variable, but you can parse it with the json module.

For example:

import re
import json
import requests

url = 'https://finance.yahoo.com/quote/TSCO.l?p=TSCO.L'

html_data = requests.get(url).text

# The next line extracts, from the HTML source, the JavaScript variable
# that holds all the data rendered on the page.
# BeautifulSoup cannot run JavaScript, so we use the `json` module
# to parse the data instead.
# NOTE: When you view the page source in Firefox/Chrome, you can search
#       for `root.App.main` to see it.

data = json.loads(re.search(r'root\.App\.main = ({.*?});\n', html_data).group(1))

# Uncomment this to print all the data:
# print(json.dumps(data, indent=4))

# The JavaScript variable is now a standard Python dict,
# so we can simply read the keys we need:

price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketPrice']['fmt']
currency_symbol = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['currencySymbol']

print('{} {}'.format(price, currency_symbol))

Prints:

227.30 £
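Since the question mentions intermittent errors, here is a minimal sketch of a more defensive version of the same extraction. It assumes the page still embeds its state as `root.App.main = {...};` as in the code above; the function names `extract_app_state`, `dig`, and `price_from_html` are hypothetical helpers introduced for illustration. Instead of raising when the pattern or a key is missing, they return `None`.

```python
import json
import re

# Same pattern as in the answer; assumes the embedded variable is still present.
APP_STATE_RE = re.compile(r'root\.App\.main = ({.*?});\n')


def extract_app_state(html_text):
    """Return the embedded page state as a dict, or None if it is not found."""
    match = APP_STATE_RE.search(html_text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def dig(mapping, *keys, default=None):
    """Walk nested dicts key by key; return `default` if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def price_from_html(html_text):
    """Return the formatted market price string, or None if unavailable."""
    state = extract_app_state(html_text)
    return dig(state, 'context', 'dispatcher', 'stores',
               'QuoteSummaryStore', 'price', 'regularMarketPrice', 'fmt')
```

With this, `price_from_html(requests.get(url).text)` yields either the price string or `None`, so the caller can retry the request or fall back to another method without a bare `except`.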
