Question

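I am following this tutorial to scrape stock prices: https://www.youtube.com/watch?v=f2h41uEi0xU

There are a few similar questions, but their answers only work around the problem, and I would like to know how to fix the current code (for learning purposes):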

Web scraping information other than price from Yahoo Finance in Python 3

Using Regex to get multiple data on single line by scraping stocks from yahoo

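I know there are better ways to do this, but these videos are helpful for learning.

Everything runs, but it does not retrieve the price from the website! I have his exact code, too. I am running the program with Python Launcher (Mac) 2.7 (I also tried 3.4).

Here is my code: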

import urllib
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]
i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id ="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "the price of" , symbolslist[i], " is " ,price
    i+=1

Answer 1

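There is an extra space after id in your regular expression. The corrected regex is shown in the sample code below.
price is a list, so to get the price itself you need to use price[0].

Sample code: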

>>> regex = '<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
>>> pattern = re.compile(regex)
>>> price = re.findall(pattern, htmltext)
>>> price
[u'568.77']
>>> price[0]
u'568.77'
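
To see why that one space matters without hitting the network, here is a minimal offline sketch in Python 3. The HTML fragment is made up to stand in for the downloaded page (the real Yahoo markup may differ), and the names sample_html, broken and fixed are only for illustration.

```python
import re

# Hypothetical fragment standing in for the downloaded page;
# the real Yahoo markup may differ.
sample_html = '<div><span id="yfs_l84_aapl">568.77</span></div>'
symbol = "aapl"

# Pattern from the question: note the space between `id` and `=`.
broken = re.compile('<span id ="yfs_l84_' + symbol + '">(.+?)</span>')
# Same pattern with the space removed.
fixed = re.compile('<span id="yfs_l84_' + symbol + '">(.+?)</span>')

broken_matches = broken.findall(sample_html)
fixed_matches = fixed.findall(sample_html)

print(broken_matches)  # empty list: the literal space never matches
print(fixed_matches)   # list with one captured group

# findall always returns a list, so take the first element,
# and guard against the empty case instead of indexing blindly.
price = fixed_matches[0] if fixed_matches else None
print(price)
```

A regex matches the text character by character, so any difference in whitespace between the pattern and the page produces an empty list rather than an error, which is why the original script ran but printed no price.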

Answer 2

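It is never a good idea to parse HTML using regular expressions. I suggest using a parser such as BeautifulSoup or lxml to do the parsing for you. The other change I would make is to drop the while loop and use a for loop, as I did. I see you have defined i and are incrementing it by hand, so a for loop makes more sense in this case.

As for what is wrong with your regex, Tamim is right: there is an extra space in the id= part of your expression.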

import urllib
from bs4 import BeautifulSoup

symbolslist = ["aapl", "spy", "goog", "nflx"]
for i in range(0, len(symbolslist)):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
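    # Name the parser explicitly so bs4 does not pick one by guesswork
    bs = BeautifulSoup(htmltext, "html.parser")
    # The price lives in the span whose id is yfs_l84_<symbol>
    idTag = 'yfs_l84_' + symbolslist[i]
    price = bs.find('span', {'id': idTag}).text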
    print "the price of" , symbolslist[i], " is " ,price
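
If you want to try the parser approach offline first, here is a rough Python 3 sketch. To keep it dependency-free it swaps in the standard library's html.parser instead of BeautifulSoup, so it is not the code above, only the same idea. The sample pages and prices are made up, SpanTextById and find_price are hypothetical helper names, and the sketch assumes the target span has no nested spans.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id.

    Assumes the target span contains no nested <span> elements.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False  # True while we are inside the target span
        self.text = None     # stays None if the span is never found

    def handle_starttag(self, tag, attrs):
        if tag == "span" and self.text is None:
            if dict(attrs).get("id") == self.target_id:
                self.inside = True

    def handle_data(self, data):
        if self.inside:
            self.text = (self.text or "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False


def find_price(html, symbol):
    """Return the text of the yfs_l84_<symbol> span, or None if absent."""
    parser = SpanTextById("yfs_l84_" + symbol)
    parser.feed(html)
    parser.close()
    return parser.text


# Made-up pages standing in for the downloaded HTML of each symbol.
sample_pages = {
    "aapl": '<p>Last: <span id="yfs_l84_aapl">568.77</span></p>',
    "spy": '<p>Last: <span id="yfs_l84_spy">187.40</span></p>',
    "goog": "<p>no quote on this page</p>",
}

# Iterate over the symbols directly; no index variable is needed.
prices = {symbol: find_price(html, symbol) for symbol, html in sample_pages.items()}
for symbol, price in prices.items():
    print("the price of", symbol, "is", price)
```

Note the None for the page without a matching span. BeautifulSoup's find() also returns None in that situation, so calling .text on the result directly raises an AttributeError; checking for None first is a cheap safeguard if the page layout changes.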

Stock prices not scraping with Python
