Beautiful Soup can't handle source code longer than 2700 lines?

Time: 2014-02-09 10:49:41

Tags: python beautifulsoup urllib2

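I am currently trying to scrape the ATP (Association of Tennis Professionals) website, and I have run into a problem I can't solve.

When I try to scrape content located after line 2700 of the page source, I get an error.

Is there a way around this?

Here is my code (the same code works for content earlier in the page):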

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
from urllib2 import urlopen
import sys

BASE_URL = "http://www.atpworldtour.com/Share/Event-Draws.aspx?e=540&y=2012"

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html, "lxml")

def get_player_name_third_round_winner(section_url):
    soup = make_soup(section_url)
    colonne4 = soup.find("td", "col_4")
    playerWrap = colonne4.findAll("div", "playerWrap")
    for name in playerWrap:
        print name.find("a").string

def get_player_score_third_round_winner(section_url):
    soup = make_soup(section_url)
    colonne4 = soup.find("td", "col_4")
    scores = colonne4.findAll("div", "scores")
    for score in scores:
        print score.find("a").string

get_player_name_third_round_winner(BASE_URL)
get_player_score_third_round_winner(BASE_URL)

Here is the error that is displayed:

Traceback (most recent call last):
  File "/Users/Me/Desktop/ATP/atp_col4", line 27, in <module>
    get_player_name_third_round_winner(BASE_URL)
  File "/Users/Me/Desktop/ATP/atp_col4", line 16, in get_player_name_third_round_winner
    playerWrap = colonne4.findAll("div", "playerWrap")
AttributeError: 'NoneType' object has no attribute 'findAll'
[Finished in 1.6s with exit code 1]

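1 answer:

Answer 0 (score: 0)

Well, I got the same error as you, but I managed to get the code to print the results. I don't know whether this is the best solution, but at least it is one.

I changed the code to this: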

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html, "html.parser")
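For context, the traceback in the question means that `soup.find("td", "col_4")` returned `None`: the tree built by the parser contained no matching cell. Whichever parser is used, it can help to check for that case explicitly. The following is only a minimal sketch; the helper name `extract_names` is hypothetical and not part of the original code, and `soup` is assumed to be a `BeautifulSoup` object such as the one returned by `make_soup`.

```python
def extract_names(soup, cell_class="col_4"):
    """Return the player names found in one draw column.

    Hypothetical helper: returns an empty list instead of raising
    AttributeError when the column is missing from the parsed tree.
    """
    cell = soup.find("td", class_=cell_class)
    if cell is None:
        # The parser did not produce a <td class="col_4"> element.
        return []

    names = []
    for wrap in cell.find_all("div", class_="playerWrap"):
        link = wrap.find("a")
        # Skip wrappers without a link or without a single text child.
        if link is not None and link.string:
            names.append(link.string.strip())
    return names
```

Calling `extract_names(make_soup(BASE_URL))` would then return a list that can be inspected; an empty list suggests that the parser in use did not build the expected cell.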

Then I also included this part:

import sys
sys.setrecursionlimit(30000)
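Raising the limit globally affects the whole program. As a possible alternative, the change can be limited to the parsing step and undone afterwards. This is only a sketch using the standard library; the name `recursion_limit` is a hypothetical helper, not something from the code above.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring it on exit."""
    previous = sys.getrecursionlimit()
    # Never lower the limit below its current value.
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)
```

It could be used as `with recursion_limit(30000): soup = make_soup(BASE_URL)`. Keep in mind that a very high limit can crash the interpreter with a stack overflow instead of raising an exception, so the value should be no larger than needed.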