I am trying to extract the price from the <title> element of https://finance.yahoo.com/quote/GOOG?ltr=1:
<title>GOOG 989.68 1.85 0.19% : Alphabet Inc. - Yahoo Finance</title>
But my output does not contain the price 989.68. Instead, I get this:
['GOOG : Summary for Alphabet Inc. - Yahoo Finance']
Here is my code:
import urllib.request
import re

htmlfile = urllib.request.urlopen("http://finance.yahoo.com/q?s=GOOG")
htmltext = htmlfile.read()
pattern = re.compile('<title>(.*?)</title>')
price = pattern.findall(str(htmltext))
print(price)
Answer 0 (score: 2)
I did not see any stock information in <title></title>, but I was able to get it working with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://finance.yahoo.com/quote/GOOG?ltr=1')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.select_one('div#quote-header-info')
print(container.find('h1').text)
for ele in container.find_all('span'):
    print(ele.text)

which outputs

GOOG - Alphabet Inc.
NasdaqGS - NasdaqGS Delayed Price. Currency in USD
989.68
+1.85 (+0.19%)
At close: 4:00PM EDT

I strongly recommend not using data-reactid to find your elements, because it is very likely to change once a new version of the site is released. It is an internal ID used by the React framework. Also, in some browsers React does not even emit react-id as an attribute and uses data-reactid instead.
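To make the advice about data-reactid concrete, here is a minimal sketch that locates the header spans by the container's id alone and never looks at data-reactid. It uses only the standard library so it runs without extra installs. The SAMPLE_HTML string and the HeaderSpanCollector class are hypothetical, modeled on the output shown in this answer; the real page is much larger and its markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup modeled on the output shown in this answer.
SAMPLE_HTML = """
<div id="quote-header-info">
  <h1 data-reactid="7">GOOG - Alphabet Inc.</h1>
  <span data-reactid="9">NasdaqGS - NasdaqGS Delayed Price. Currency in USD</span>
  <span data-reactid="14">989.68</span>
  <span data-reactid="16">+1.85 (+0.19%)</span>
</div>
<div id="other"><span data-reactid="14">not the price</span></div>
"""


class HeaderSpanCollector(HTMLParser):
    """Collect the text of <span> elements inside div#quote-header-info."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # 0 means we are outside the target container
        self.in_span = False
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the container
            elif dict(attrs).get("id") == "quote-header-info":
                self.div_depth = 1           # entering the container
        elif tag == "span" and self.div_depth > 0:
            self.in_span = True
            self.spans.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        if self.in_span:
            self.spans[-1] += data


collector = HeaderSpanCollector()
collector.feed(SAMPLE_HTML)
header_spans = collector.spans
print(header_spans)
```

Because the lookup depends only on the container id, the span in the second div is ignored even though it carries the same data-reactid value as the price span. The sketch does not handle spans nested inside other spans.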
Answer 1 (score: 1)
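The title does not actually contain the price. Go to the page source and see for yourself. It would be much simpler if you just used BeautifulSoup instead of re:

import requests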
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/quote/GOOG'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

# Use this to look at the source code
# print(soup.prettify())

# Here is the exact class of the span containing the price;
# not sure if it will be the same every time
price = None
for span in soup.find_all('span', attrs={'class': 'Trsdu(0.3s) Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(b)'}):
    price = span.text
    break
print(price)  # 989.68

# Here is a more generic attribute for the span. Its value can change as well,
# but it is a simpler change. The price is in the first span that matches,
# so the break makes sure you get the correct one
price = None
for span in soup.find_all('span', attrs={'data-reactid': '14'}):
    price = span.text
    break
print(price)  # 989.68
Answer 2 (score: 0)
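I went through the HTML source of the URL you mentioned. The price you see in the title is loaded there by JavaScript. If you inspect the HTML source, you can see that script just before the title tag. Whenever you request a site from a script, you get back only the HTML code as the response. A Python script does not execute JavaScript, so the price is never loaded into the title.

I suggest using the requests library to make the request, since it has more advanced features (see the requests docs). Like the others, I would use BeautifulSoup to parse the HTML; it is easy to understand (see the BeautifulSoup docs). Use the lxml parser. If you follow these suggestions, your code should be: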
import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/GOOG?ltr=1"
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")
price = soup.find("span", {"data-reactid": "35"}).text
print(price)
This should return the price as expected.
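The point about JavaScript can be illustrated offline. The sketch below compares a title as it might appear in the static HTML with the title quoted in the question as seen in a browser. STATIC_HTML is a hypothetical, heavily shortened stand-in for the real response, and extract_title and contains_price are illustrative helper names, not part of any library.

```python
import re

# Hypothetical static HTML, as a client that does not run JavaScript might receive it.
STATIC_HTML = (
    "<html><head>"
    "<script>/* in a browser, a script like this would rewrite the title */</script>"
    "<title>GOOG : Summary for Alphabet Inc. - Yahoo Finance</title>"
    "</head><body></body></html>"
)

# Title quoted in the question, as shown by a browser after scripts have run.
RENDERED_TITLE = "GOOG 989.68 1.85 0.19% : Alphabet Inc. - Yahoo Finance"


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = re.search(r"<title>(.*?)</title>", html, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def contains_price(text):
    """Heuristic: does the text contain a number with two decimal places?"""
    return re.search(r"\d+\.\d{2}", text) is not None


static_title = extract_title(STATIC_HTML)
print(static_title)
print(contains_price(static_title))    # the static title has no price
print(contains_price(RENDERED_TITLE))  # the browser-rendered title does
```

Running the same check against the text returned by requests.get is a quick way to tell whether a value is present in the static response or only appears after scripts run.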
Answer 3 (score: 0)
You can get the required items with regular expressions. Here is the code.

import urllib.request
import re

htmlfile = urllib.request.urlopen("http://finance.yahoo.com/q?s=GOOG")
htmltext = htmlfile.read().decode('utf-8')

# for the title
pattern = re.compile(r'<title>(.*?)</title>')
title = pattern.findall(htmltext)
print('title:', title[0])

# regularMarketPrice
pattern = re.compile(r'"regularMarketPrice":\{"raw":(.*?),')
regularMarketPrice = pattern.findall(htmltext)
print('regularMarketPrice:', regularMarketPrice[0])

# regularMarketChange
pattern = re.compile(r'"regularMarketChange":\{"raw":(.*?),')
regularMarketChange = pattern.findall(htmltext)
print('regularMarketChange:', regularMarketChange[0])

# regularMarketChangePercent
pattern = re.compile(r'"regularMarketChangePercent":\{"raw":(.*?),')
regularMarketChangePercent = pattern.findall(htmltext)
print('regularMarketChangePercent:', regularMarketChangePercent[0])  # x100 to get percent

# for close time
pattern = re.compile(r'<span data-reactid="21">At close:(.*?)</span>')
at_close = pattern.findall(htmltext)
print('At close:', at_close[0].strip())

Output:

title: GOOG : Summary for Alphabet Inc. - Yahoo Finance
regularMarketPrice: 989.68
regularMarketChange: 1.8499756
regularMarketChangePercent: 0.0018727671
At close: 4:00PM EDT
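The regular expressions in this answer return strings. A small sketch of turning them into numbers, including the x100 step for the percentage, could look like the following. SAMPLE_TEXT is a hypothetical fragment that mimics the embedded data the patterns target (the "fmt" keys are an assumption), and extract_raw is an illustrative helper, not part of any library.

```python
import re

# Hypothetical fragment mimicking the embedded data matched by the patterns above.
SAMPLE_TEXT = (
    '"regularMarketPrice":{"raw":989.68,"fmt":"989.68"},'
    '"regularMarketChange":{"raw":1.8499756,"fmt":"1.85"},'
    '"regularMarketChangePercent":{"raw":0.0018727671,"fmt":"0.19%"}'
)


def extract_raw(field, text):
    """Return the numeric "raw" value for a field as a float, or None if absent."""
    pattern = re.compile(r'"' + re.escape(field) + r'":\{"raw":(-?[0-9.eE+-]+)')
    match = pattern.search(text)
    return float(match.group(1)) if match else None


price = extract_raw("regularMarketPrice", SAMPLE_TEXT)
change = extract_raw("regularMarketChange", SAMPLE_TEXT)
change_fraction = extract_raw("regularMarketChangePercent", SAMPLE_TEXT)

# The raw value is a fraction, so multiply by 100 to get a percentage.
change_percent = round(change_fraction * 100, 2)

print(price, round(change, 2), change_percent)
```

Returning None for a missing field avoids the IndexError that indexing an empty findall result with [0] would raise if the page layout changes.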
Answer 4 (score: 0)
You can do it like this to get the desired output without using regular expressions:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://finance.yahoo.com/quote/GOOG?ltr=1').text, 'lxml')
for item in soup.select("div#quote-header-info"):
    title = item.select("h1")[0].text
    price = [elem.text for elem in item.select("span")[1:3]]
    print("Name: {}\nClosing Status: {}".format(title, ' '.join(price)))

Result:

Name: GOOG - Alphabet Inc.
Closing Status: 989.68 +1.85 (+0.19%)