Using Beautiful Soup in Python: the 'contents' attribute

Date: 2012-09-01 02:00:49

Tags: python beautifulsoup

I am using the following code from the book "Hello! Python":

import urllib2
from bs4 import BeautifulSoup
import os

def get_stock_html(ticker_name):
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(),urllib2.HTTPHandler(debuglevel=0),)
    opener.addhaders = [('User-agent', "Mozilla/4.0 (compatible; MSIE 7.0; " "Windows NT 5.1; .NET CLR 2.0.50727; " ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)")]
    url = "http://finance.yahoo.com/q?s=" + ticker_name
    response = opener.open(url)
    return ''.join(response.readlines())

def find_quote_section(html):
    soup = BeautifulSoup(html)
    # quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary_rt_top'})
    quote = soup.find('div', attrs={'class': 'yfi_quote_summary'})
    return quote

def parse_stock_html(html, ticker_name):
    quote = find_quote_section(html)
    result = {}
    tick = ticker_name.lower()

    result['stock_name'] = quote.find('h2').contents[0]

if __name__ == '__main__':
    os.system("clear")
    html = get_stock_html('GOOG')
    # print find_quote_section(html)
    print parse_stock_html(html, 'GOOG')

I get the following error:

Traceback (most recent call last):
  File "dwlod.py", line 33, in <module>
    print parse_stock_html(html, 'GOOG')
  File "dwlod.py", line 25, in parse_stock_html
    result['stock_name'] = quote.find('h2').contents[0]
AttributeError: 'NoneType' object has no attribute 'contents'

I'm a beginner and don't know what to do. Is the book wrong?

ADDED

I just replaced result['stock_name'] = quote.find('h2').contents[0] with:

x = BeautifulSoup(html).find('h2').contents[0]
return x

Now nothing is returned, but the error no longer appears. So is something wrong with the original Python syntax?

1 answer:

Answer 0 (score: 2)

Although Yahoo Finance has not really changed its layout for a while, it seems they have tweaked it slightly since the book was published. The information you need, such as the h2 containing the ticker, can be found in the yfi_rt_quote_summary container at the top; yfi_quote_summary seems to be empty.

def find_quote_section(html):
    soup = BeautifulSoup(html)
    quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary'})
    return quote

Also note that we need to return result if we want to print something sensible rather than None.

def parse_stock_html(html, ticker_name):
    quote = find_quote_section(html)
    result = {}
    tick = ticker_name.lower()
    result['stock_name'] = quote.find('h2').contents[0]
    return result

>>> print parse_stock_html(html, 'GOOG')
{'stock_name': u'Google Inc. (GOOG)'}
>>>

By the way, find only finds the first match.

>>> help(BeautifulSoup(html).find)
find(self, name=None, attrs={}, recursive=True, text=None, **kwargs) method of BeautifulSoup.BeautifulSoup instance
    Return only the first child of this Tag matching the given criteria.

BeautifulSoup also has findAll, which returns all matches.

>>> BeautifulSoup(html).findAll('h2')[3].contents[0]
u'Google Inc. (GOOG)'

It seems the fourth value is the one we are looking for... However, and I'm sure you weren't doing this, please don't parse the whole document every time; that can be very expensive.
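
As a rough illustration of the two points above (find returning None when nothing matches, and parsing the document only once), here is a minimal sketch. It assumes Python 3 with bs4 and the built-in "html.parser" backend, so it uses print() and find_all (the bs4 spelling of findAll). SAMPLE_HTML is a made-up stand-in, not Yahoo's real markup, and the class_name parameter is added purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; NOT Yahoo's real markup.
SAMPLE_HTML = """
<html><body>
  <h2>Market summary</h2>
  <div class="yfi_quote_summary"></div>
  <div class="yfi_rt_quote_summary">
    <h2>Google Inc. (GOOG)</h2>
  </div>
</body></html>
"""


def parse_stock_html(soup, class_name="yfi_rt_quote_summary"):
    """Return {'stock_name': ...}, or None if the expected markup is missing."""
    quote = soup.find("div", attrs={"class": class_name})
    if quote is None:  # find() returns None when nothing matches
        return None
    heading = quote.find("h2")
    if heading is None or not heading.contents:
        return None
    return {"stock_name": heading.contents[0]}


# Parse the document once and reuse the same soup for every lookup.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

result = parse_stock_html(soup)
# The empty container exists but holds no h2, so this lookup yields None
# instead of raising AttributeError.
empty = parse_stock_html(soup, class_name="yfi_quote_summary")
# find_all returns every match, not just the first one.
all_headings = [h2.get_text() for h2 in soup.find_all("h2")]

print(result)
print(empty)
print(all_headings)
```

Checking for None before touching .contents turns a layout change on the site into a None result that the caller can handle, rather than a traceback deep inside the parser.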