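I've been working on a simple script that reads stock symbols from a .txt file in the project's main directory, but I can't get it to return pricing data. It works if I type the symbols into a list of strings by hand, but when they are read from the file, no prices come back.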
import urllib
import re

symbolfile = open("symbols.txt")
symbolslist = symbolfile.read()
newsymbolslist = symbolslist.split("\n")

i = 0
while i<len(newsymbollist):
    url = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s=" +symbolslist[i] +""
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_184_' +newsymbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "The price of", newsymbolslist[i] ," is ", price
    i+=1
I could really use some help, since the shell doesn't report any error.
Thanks in advance for your help!
Answer 0 (score: 0)
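With the change @Linus Gustav Larsson Thiel suggested in the comments, plus one more change to the regex, your code returns the correct results. Note the lower() call in the regex: the page source contains the symbols in lowercase.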
i = 0
while i < len(newsymbolslist):
    url = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s=" + newsymbolslist[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_l84_' + newsymbolslist[i].lower() + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = pattern.findall(htmltext)
    print "The price of", newsymbolslist[i] ," is ", price
    i += 1
Testing with the static list ['AAPL', 'GOOGL', 'MSFT'], I get the following output:
The price of AAPL is ['98.53']
The price of GOOGL is ['733.07']
The price of MSFT is ['52.30']
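The test above uses a static list. When the symbols come from symbols.txt instead, a trailing newline or stray whitespace in the file can leave empty or padded entries after split("\n"), and those entries will never match the page. A minimal sketch of a more forgiving reader, assuming one symbol per line; the helper name load_symbols and the sample file contents are hypothetical, and the snippet is written for Python 3:

```python
import os
import tempfile


def load_symbols(path):
    """Return the non-empty, whitespace-stripped lines of a symbols file."""
    with open(path) as symbolfile:
        return [line.strip() for line in symbolfile if line.strip()]


# Demonstration with a throwaway file (hypothetical contents):
# note the trailing space after GOOGL and the blank line at the end.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AAPL\nGOOGL \nMSFT\n\n")

symbols = load_symbols(tmp.name)
os.remove(tmp.name)
print(symbols)  # ['AAPL', 'GOOGL', 'MSFT']
```

The resulting list can be used in place of newsymbolslist in the loop.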
If you like, you can also simplify the code:
baseurl = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s="
for symbol in newsymbolslist:
    url = baseurl + symbol
    source = urllib.urlopen(url).read()
    regex = re.compile('<span id="yfs_l84_' + symbol.lower() + '">(.+?)</span>')
    price = regex.findall(source)[0]
    print "The price of", symbol, "is", price
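The for ... in ... loop removes the need for a counter variable, and since findall() returns a list of matches (and you expect only one), you can append [0] to get the matched string itself rather than a list with a single element.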
This returns the following:
The price of AAPL is 98.53
The price of GOOGL is 733.07
The price of MSFT is 52.30
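One caveat with indexing findall() by [0]: if the page has no matching span (an unknown symbol, an empty line from the file, or changed markup), the list is empty and [0] raises IndexError. A small sketch that returns None in that case instead, run against a hard-coded string so it works offline. The function name extract_price and the sample HTML fragment are assumptions for illustration, modeled on the span the regex above targets; the snippet is written for Python 3.

```python
import re


def extract_price(html, symbol):
    """Return the first price found for symbol in html, or None if absent."""
    pattern = re.compile(
        '<span id="yfs_l84_' + re.escape(symbol.lower()) + '">(.+?)</span>'
    )
    matches = pattern.findall(html)
    return matches[0] if matches else None


# Hypothetical page fragment, not real Yahoo Finance output.
sample_html = '<div><span id="yfs_l84_aapl">98.53</span></div>'

print(extract_price(sample_html, "AAPL"))  # 98.53
print(extract_price(sample_html, "MSFT"))  # None
```

re.escape() is used here so that a symbol containing regex metacharacters is matched literally.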