Question

I've tried this before, and I'm completely lost.

On this page, a dialog box can pull up quotes: http://www.schwab.com/public/schwab/non_navigable/marketing/email/get_quote.html?

I used SPY, XLV, IBM, MSFT.

The output is the table above.

If you have an account, the quotes are real-time (via a cookie).

How can I get that table into Python 2.6, as a list or dictionary?

Answer 1

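Use something like Beautiful Soup to parse the HTML response from the site and load it into a dictionary. Use the symbol as the key and a tuple of whatever data you're interested in as the value. Iterate over all the symbols returned and add an entry for each one.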
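A minimal sketch of the symbol-to-tuple dictionary described above. To keep it runnable without third-party packages it swaps Beautiful Soup for Python 3's standard-library `html.parser`, and it assumes a hypothetical table layout (a header row first, the symbol in the first column). The class name, function name, and the placeholder numbers in `SAMPLE` are made up for illustration; the real Schwab markup is not shown in the question.

```python
from html.parser import HTMLParser


class QuoteTableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def quotes_by_symbol(html):
    """Return {symbol: (other columns...)}, skipping the assumed header row."""
    parser = QuoteTableParser()
    parser.feed(html)
    parser.close()
    return {row[0]: tuple(row[1:]) for row in parser.rows[1:] if row}


# Hypothetical page fragment with placeholder values.
SAMPLE = """
<table>
  <tr><th>Symbol</th><th>Last</th><th>Change</th></tr>
  <tr><td>SPY</td><td>1.00</td><td>+0.01</td></tr>
  <tr><td>IBM</td><td>2.00</td><td>-0.02</td></tr>
</table>
"""

print(quotes_by_symbol(SAMPLE))
# {'SPY': ('1.00', '+0.01'), 'IBM': ('2.00', '-0.02')}
```

The values stay as strings here; converting them (for example with `float`) is left to the caller, since which columns are numeric depends on the real page.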

You can see examples of how to do this in Toby Segaran's "Programming Collective Intelligence". The samples are all in Python.

Answer 2

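First problem: the data is actually in an iframe inside a frame; you need to look at https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=APC (substituting the appropriate symbol at the end of the URL).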
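Substituting the symbol can be done with a small URL builder. This is a sketch in Python 3 using `urllib.parse.urlencode` (in Python 2 the same function is `urllib.urlencode`); `summary_url` is a hypothetical helper name, and whether the iframe URL still serves data is not something this answer can confirm.

```python
from urllib.parse import urlencode

# Base URL of the iframe that holds the quote data, as given in the answer.
SUMMARY_URL = "https://www.schwab.wallst.com/public/research/stocks/summary.asp"


def summary_url(symbol):
    """Build the iframe URL for one ticker symbol."""
    query = urlencode({"user_id": "schwabpublic", "symbol": symbol})
    return SUMMARY_URL + "?" + query


for sym in ["SPY", "XLV", "IBM", "MSFT"]:
    print(summary_url(sym))
```

Using `urlencode` rather than string concatenation means a symbol containing characters such as `&` or spaces is escaped correctly.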

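Second problem: extracting the data from the page. I personally like lxml and XPath, but there are plenty of packages that can do the job. I'd expect code something like this: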

import urllib2
import lxml.html
import re

# Optional "$", optional whitespace, then a price with exactly two decimals.
re_dollars = r'\$?\s*(\d+\.\d{2})'

def urlExtractData(url, defs):
    """
    Get html from url, parse according to defs, return as dictionary

    defs is a list of tuples ("name", "xpath", "regex", fn )
      name becomes the key in the returned dictionary
      xpath is used to extract a string from the page
      regex further processes the string (skipped if None)
      fn casts the string to the desired type (skipped if None)
    """

    page = urllib2.urlopen(url) # can modify this to include your cookies
    tree = lxml.html.parse(page)

    res = {}
    for name, path, reg, fn in defs:
        found = tree.xpath(path)
        if not found:
            # The page layout changed, or differs for this symbol.
            raise ValueError("%s: xpath %r matched nothing" % (name, path))
        txt = found[0]

        if reg is not None:
            match = re.search(reg, txt)
            if match is None:
                raise ValueError("%s: regex %r did not match %r" % (name, reg, txt))
            txt = match.group(1)

        if fn is not None:
            txt = fn(txt)

        res[name] = txt

    return res

def getStockData(code):
    url = 'https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=' + code
    defs = [
        ("stock_name", '//span[@class="header1"]/text()', None, str),
        ("stock_symbol", '//span[@class="header2"]/text()', None, str),
        ("last_price", '//span[@class="neu"]/text()', re_dollars, float),
        # etc
    ]
    return urlExtractData(url, defs)
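The comment in `urlExtractData` about including your cookies could be handled by passing a `Request` object instead of a bare URL. A sketch in Python 3 (`urllib.request.Request`; Python 2's `urllib2.Request` accepts the same `headers` argument): `make_request` and the cookie string are hypothetical, and it is an assumption that replaying a logged-in browser's cookie is enough for Schwab to return real-time quotes.

```python
from urllib.request import Request


def make_request(url, cookie_header):
    """Build a GET request that carries an existing session's cookies."""
    return Request(url, headers={"Cookie": cookie_header})


# Hypothetical cookie string; copy the real one from a logged-in browser session.
req = make_request(
    "https://www.schwab.wallst.com/public/research/stocks/summary.asp"
    "?user_id=schwabpublic&symbol=MSFT",
    "SESSIONID=example-value",
)
# urllib.request.urlopen(req) would send it; nothing is fetched here.
```

For a longer session where the server sets or refreshes cookies, `http.cookiejar.CookieJar` with `urllib.request.HTTPCookieProcessor` keeps them up to date automatically.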

When called as

print repr(getStockData('MSFT'))

it returns

{'stock_name': 'Microsoft Corp', 'last_price': 25.690000000000001, 'stock_symbol': 'MSFT:NASDAQ'}

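Third problem: the markup on this page is presentational rather than structural, which tells me that code based on it is likely to be fragile; any change to the page structure (or variation between pages) will mean reworking the XPaths.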

Hope that helps!

Answer 3

Have you considered using Yahoo's quotes API?
See: http://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys#h=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22

You can generate requests to the service dynamically, for example:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
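Generating that URL for any symbol is mostly a matter of percent-encoding the YQL statement. A sketch in Python 3.5+ (`quote_via` is not available in Python 2's `urllib.urlencode`); `yql_quote_url` is a hypothetical helper. Yahoo has since retired the public YQL service, so this only illustrates how the request URL is built.

```python
from urllib.parse import quote, urlencode

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_quote_url(symbol):
    """Build a YQL request URL for the yahoo.finance.quotes table."""
    params = {
        "q": 'select * from yahoo.finance.quotes where symbol = "%s"' % symbol,
        "diagnostics": "true",
        "env": "store://datatables.org/alltableswithkeys",
    }
    # quote_via=quote encodes spaces as %20 rather than '+';
    # safe="*" leaves the '*' unescaped, as in the hand-written URL above.
    return YQL_ENDPOINT + "?" + urlencode(params, safe="*", quote_via=quote)


print(yql_quote_url("YHOO"))
```

For `"YHOO"` this reproduces the example URL above character for character.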

Just poll it with a standard HTTP GET request. The response is in XML.
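Turning an XML response into a dictionary can be done with the standard library's `xml.etree.ElementTree`. This Python 3 sketch assumes a response shaped like `SAMPLE_XML` below: one `<quote>` element per symbol, with a `symbol` attribute and one child element per field. That shape, the field names, and the placeholder value are illustrative assumptions, not a verbatim YQL response.

```python
import xml.etree.ElementTree as ET

# Illustrative response shape with a placeholder value.
SAMPLE_XML = """<query>
  <results>
    <quote symbol="YHOO">
      <Name>Yahoo! Inc.</Name>
      <LastTradePriceOnly>1.00</LastTradePriceOnly>
    </quote>
  </results>
</query>"""


def parse_quotes(xml_text):
    """Return {symbol: {field_tag: text}} for every <quote> element."""
    root = ET.fromstring(xml_text)
    quotes = {}
    for elem in root.iter("quote"):
        quotes[elem.get("symbol")] = {child.tag: child.text for child in elem}
    return quotes


print(parse_quotes(SAMPLE_XML))
```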

Answer 4

matplotlib has a module for fetching historical quotes from Yahoo:

>>> from matplotlib.finance import quotes_historical_yahoo
>>> from datetime import date
>>> from pprint import pprint
>>> pprint(quotes_historical_yahoo('IBM', date(2010, 11, 12), date(2010, 11, 18)))
[(734088.0,
  144.59,
  143.74000000000001,
  145.77000000000001,
  143.55000000000001,
  4731500.0),
 (734091.0,
  143.88999999999999,
  143.63999999999999,
  144.75,
  143.27000000000001,
  3827700.0),
 (734092.0,
  142.93000000000001,
  142.24000000000001,
  143.38,
  141.18000000000001,
  6342100.0),
 (734093.0,
  142.49000000000001,
  141.94999999999999,
  142.49000000000001,
  141.38999999999999,
  4785900.0)]
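Each tuple above starts with the date as a day ordinal (734088 is 2010-11-12). A Python 3 sketch that turns such rows into a dictionary keyed by `datetime.date`, assuming the (date, open, close, high, low, volume) order that `matplotlib.finance` documented for `quotes_historical_yahoo`; the high and low columns in the output above are consistent with that order. `Quote` and `quotes_by_date` are hypothetical names, and the two rows are copied from the session above.

```python
from collections import namedtuple
from datetime import date

# Assumed column order after the date: open, close, high, low, volume.
Quote = namedtuple("Quote", "open close high low volume")


def quotes_by_date(rows):
    """Turn (ordinal, open, close, high, low, volume) tuples into {date: Quote}."""
    return {date.fromordinal(int(d)): Quote(*rest) for d, *rest in rows}


ROWS = [
    (734088.0, 144.59, 143.74, 145.77, 143.55, 4731500.0),
    (734091.0, 143.89, 143.64, 144.75, 143.27, 3827700.0),
]

by_date = quotes_by_date(ROWS)
print(by_date[date(2010, 11, 12)].close)  # 143.74
```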

Downloading prices with Python
