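I'm having trouble with the following code. It is supposed to print stock prices by scraping Yahoo Finance, but I can't figure out why it returns empty strings.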
import urllib
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]

i = 0
while i < len(symbolslist):
    # Build the quote page URL for the current symbol (Python 2 code)
    url = "http://finance.yahoo.com/q?s=" + symbolslist[i] + "&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    # The price is expected inside a span whose id ends with the symbol
    regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)
    print price
    i += 1
Edit: It works now; it was a syntax error. I have edited the code above.
Answer 0 (score: 1)
Here are just a few useful tips for Python development (and scraping):
The Python requests library is great for simplifying the request process.
Instead of a while loop, a for loop is very useful here.
symbolslist = ["aapl", "spy", "goog", "nflx"]
for symbol in symbolslist:
    # Do logic here...
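As a minimal sketch of the for-loop tip, the snippet below iterates over the symbols directly and builds one URL per symbol. It assumes Python 3; the helper name quote_url is hypothetical, and the URL and query parameters are simply the ones from the question. Nothing is fetched, so it runs offline.

```python
from urllib.parse import urlencode

BASE_URL = "http://finance.yahoo.com/q"  # taken from the question, not verified


def quote_url(symbol):
    """Build the quote-page URL for one symbol, escaping the query string."""
    return BASE_URL + "?" + urlencode({"s": symbol, "q1": "1"})


symbolslist = ["aapl", "spy", "goog", "nflx"]

# Iterate over the symbols directly; there is no index variable to maintain.
for symbol in symbolslist:
    print(quote_url(symbol))

# enumerate() supplies a position for the cases where one is actually needed.
for position, symbol in enumerate(symbolslist, start=1):
    print(position, symbol)
```

Compared with the while loop, there is no counter to initialise or increment, so forgetting i += 1 can no longer cause an infinite loop.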
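import requests
import lxml.html  # "import lxml" alone does not load the lxml.html submodule

# symbol comes from the for loop above
url = "http://www.google.co.uk/finance?q=" + symbol + "&q1=1"
r = requests.get(url)
xpath = '//your/xpath'
root = lxml.html.fromstring(r.content)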
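If lxml is not installed, a similar extraction can be sketched with the html.parser module from the Python 3 standard library. This is a swapped-in technique, not the lxml approach, and the class SpanTextById, the function price_from_html, and the sample markup are all made up for illustration. The span id format is the one used in the question.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.text = None  # stays None if no matching span is seen

    def handle_starttag(self, tag, attrs):
        is_target = dict(attrs).get("id") == self.target_id
        if tag == "span" and self.text is None and is_target:
            self._inside = True
            self.text = ""

    def handle_data(self, data):
        if self._inside:
            self.text += data

    def handle_endtag(self, tag):
        # Assumes the price span has no nested <span> children.
        if tag == "span" and self._inside:
            self._inside = False


def price_from_html(html_text, symbol):
    """Return the price text for symbol, or None when the span is absent."""
    parser = SpanTextById("yfs_l84_" + symbol)
    parser.feed(html_text)
    parser.close()
    return parser.text


# Hypothetical page fragment, used so the sketch runs without a network call.
sample_html = '<div><span id="yfs_l84_aapl">123.45</span><span id="x">n/a</span></div>'
print(price_from_html(sample_html, "aapl"))
print(price_from_html(sample_html, "goog"))
```

Returning None for a missing span makes it easy to tell "the page layout changed" apart from "the price is an empty string".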
Compiling a regular expression takes time and effort, so you can hoist it out of the loop.
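# Compile once, before the loop. The symbol is captured by a group, so the
# pattern no longer depends on the loop variable.
regex = '<span id="yfs_l84_([^"]+)">(.+?)</span>'
pattern = re.compile(regex)
for symbol in symbolslist:
    # do logic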
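Because the span id contains the symbol, the pattern can only be compiled once if the symbol is not baked into it. One possible approach, sketched here under that assumption, is to capture the symbol with a group and look prices up afterwards. The names PRICE_PATTERN and prices_by_symbol and the sample markup are hypothetical.

```python
import re

# Compiled once at import time. Group 1 captures the symbol, group 2 the price.
PRICE_PATTERN = re.compile(r'<span id="yfs_l84_([^"]+)">(.+?)</span>')


def prices_by_symbol(html):
    """Map every symbol found in html to its price text."""
    return {symbol: price for symbol, price in PRICE_PATTERN.findall(html)}


# Hypothetical markup, used so the sketch runs without a network call.
sample_html = (
    '<span id="yfs_l84_aapl">123.45</span>'
    '<span id="yfs_l84_spy">210.10</span>'
)

prices = prices_by_symbol(sample_html)
for symbol in ["aapl", "spy", "goog"]:
    # .get() returns None for symbols that did not appear in the markup.
    print(symbol, prices.get(symbol))
```

Note that the re module also caches recently compiled patterns internally, so the saving from hoisting is usually small; the main benefit is readability.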
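As mentioned in drewk's comment, both Pandas and Matplotlib have native functions for fetching Yahoo quotes, or you can use the ystockquote library to scrape Yahoo. It is used like this: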
#!/usr/bin/env python
import ystockquote

symbolslist = ["aapl", "spy", "goog", "nflx"]
for symbol in symbolslist:
    print(ystockquote.get_price(symbol))
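A lookup over the network can fail for a single symbol without the others being affected. The sketch below is one way to keep the loop going in that case; collect_prices and fake_fetch are hypothetical names, and the prices are made-up values so the example runs offline. It assumes only that the fetcher is a callable taking a symbol, which is how ystockquote.get_price is used above.

```python
def collect_prices(symbols, fetch):
    """Call fetch(symbol) for each symbol; record None when a lookup fails."""
    prices = {}
    for symbol in symbols:
        try:
            prices[symbol] = fetch(symbol)
        except Exception as exc:  # broad on purpose: failure modes are unknown here
            print("could not fetch %s: %s" % (symbol, exc))
            prices[symbol] = None
    return prices


def fake_fetch(symbol):
    """Stand-in fetcher with made-up values; raises KeyError if unknown."""
    table = {"aapl": "123.45", "spy": "210.10"}
    return table[symbol]


result = collect_prices(["aapl", "spy", "goog"], fake_fetch)
print(result)
```

With the real library installed, the same helper could presumably be called as collect_prices(symbolslist, ystockquote.get_price).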