Question

This is code I wrote following the Christophers Reeves tutorial; it is his third video on this topic on YouTube.

import urllib
import re

symbolslist = ["aapl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_l84_'+symbolslist[i] +'">(.?+)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "The price of", symbolslist[i]," is", price
    i+=1

When I run this code in Python 2.7.5 I get the following error:

Traceback <most recent call last>:
File "fundamentalism)stocks.py, line 12, in <module>
pattern = re.compile(regex)
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py, line 242, in compile
raise error, v # invalid expression
sre_constant.error: multiple repeat

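I don't know whether the problem is my libraries, the way they were installed, my Python version, or something else. I'd appreciate your help.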

Answer 1

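The problem is the use of two repeat characters in a row: ? followed by +. In (.?+) the + tries to repeat the already-repeated .?, which is what "multiple repeat" means.

You probably meant the non-greedy match instead, (.+?). As the Python re documentation puts it:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.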
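To see the difference on markup shaped like the one in the question, here is a minimal sketch. The HTML string is a made-up fragment for illustration (the real Yahoo page may look different), and the code uses only re.findall, so it should behave the same on Python 2.7 and Python 3.

```python
import re

# Hypothetical page fragment, invented for this example;
# the real Yahoo markup may differ.
html = '<span id="yfs_l84_aapl">123.45</span><span id="other">x</span>'

# Greedy: (.+) grabs as much as it can, so it runs to the LAST </span>.
greedy = re.findall(r'<span id="yfs_l84_aapl">(.+)</span>', html)

# Non-greedy: (.+?) grabs as little as it can, so it stops at the FIRST </span>.
lazy = re.findall(r'<span id="yfs_l84_aapl">(.+?)</span>', html)

print(greedy)
print(lazy)
```

The greedy pattern swallows the second span as well, while the non-greedy one returns just the text between the first pair of tags, which is what you want for the price.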

Answer 2

Others have already answered the greedy-match part, but on an unrelated note, you'd want to write it more like this:

for symbol in symbolslist:
    url = "http://finance.yahoo.com/q?s=%s&q1=1" % symbol
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_l84_%s">(.+?)</span>' % symbol
    price = re.findall(regex, htmltext)[0]
    print "The price of", symbol," is", price

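The standard Python idiom is to iterate over the values in a list directly rather than picking them out by index.
String interpolation is easier to manage than string concatenation, especially once you are mixing in more than one value (for example, you may later want to specify the value of q1 as well).
re.findall accepts a plain string as its first argument. Explicitly compiling a pattern and then throwing it away on the next loop iteration gains you nothing.
re.findall returns a list, and you only want its first element.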
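One caveat with taking [0] directly: if the page has no matching span, re.findall returns an empty list and the index raises IndexError. A minimal sketch of a guarded version follows. extract_price and sample_pages are hypothetical names introduced here, the HTML strings are invented stand-ins for the downloaded pages, and re.escape is an extra precaution that is not part of the code above.

```python
import re

def extract_price(symbol, htmltext):
    """Return the first captured price for symbol, or None if nothing matches.

    Hypothetical helper for illustration; not part of the original code.
    """
    # re.escape guards against symbols containing regex metacharacters.
    regex = r'<span id="yfs_l84_%s">(.+?)</span>' % re.escape(symbol)
    matches = re.findall(regex, htmltext)
    return matches[0] if matches else None

# Invented stand-ins for the pages fetched with urlopen.
sample_pages = {
    "aapl": '<span id="yfs_l84_aapl">123.45</span>',
    "goog": '<div>no price here</div>',
}

for symbol in ["aapl", "goog"]:
    price = extract_price(symbol, sample_pages[symbol])
    print("The price of %s is %s" % (symbol, price))
```

Returning None instead of raising lets the loop carry on to the remaining symbols when one page changes its markup.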

Python Regex Compile 2.7.5
