Iteration in BS4 fails during web scraping

Time: 2016-01-24 11:26:41

Tags: python web-scraping bs4

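I am using Beautiful Soup 4 to scrape data from complete greyhound race meetings (UK). Here is an example URL: http://www.gbgb.org.uk/resultsMeeting.aspx?id=135549  Each meeting usually has between 9 and 14 races. The code below iterates over every race (event) on the card and prints the data to the screen (PyCharm, Python v3). The problem is that BS does not complete the iteration: it usually fails around race (event) 7 or 8 on the card, and in some cases it breaks down partway through a race, having retrieved data for only half of the runners. In some cases I get the standard message "Process finished with exit code 0". I did think it might be related to the URL being temporarily unavailable, but it seems unusual that the program always stops at around race 7 or 8. I have searched the source of various pages and cannot see any inconsistency in the markup (admittedly I am not very familiar with HTML). Any suggestions are appreciated.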

# Python 3: urlopen lives in urllib.request, not in urllib itself
from urllib.request import urlopen
from bs4 import BeautifulSoup

baseURL = 'http://www.gbgb.org.uk/resultsMeeting.aspx?id=135549'
html = urlopen(baseURL)
bsObj = BeautifulSoup(html, 'lxml')

# Race headers: track, date, time, grade, distance, prizes
nameList = bsObj.findAll("div", {"class": "resultsBlockHeader"})
for i in nameList:
    nameList1 = i.findAll("div", {"class": "track"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "date"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "datetime"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "grade"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "distance"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "prizes"})
    for j in nameList1:
        print(j.get_text())

# Race results: one set of fields per runner
nameList = bsObj.findAll("div", {"class": "resultsBlock"})
for i in nameList:
    nameList2 = i.findAll("li", {"class": "trap"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "first essential fin"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "essential greyhound"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "sp"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "timeSec"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "timeDistance"})
    for j in nameList2:
        print(j.get_text())
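
One way to narrow down where the races go missing might be to count the race blocks in the raw HTML without BeautifulSoup or lxml, and compare that number with the length of nameList. The sketch below uses only the standard library. ClassCounter, count_blocks, and the sample markup are hypothetical names and data made up for illustration; the real page is not reproduced here.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one type whose class attribute contains a given class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute is a space-separated list of class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_blocks(html_text, tag, class_name):
    """Return how many <tag> elements carry class_name in html_text."""
    parser = ClassCounter(tag, class_name)
    parser.feed(html_text)
    parser.close()
    return parser.count


# Hypothetical stand-in for a downloaded results page with two races.
sample = """
<div class="resultsBlockHeader"><div class="track">A</div></div>
<div class="resultsBlock"><ul><li class="trap">1</li></ul></div>
<div class="resultsBlockHeader"><div class="track">B</div></div>
<div class="resultsBlock"><ul><li class="trap">2</li></ul></div>
"""

print(count_blocks(sample, "div", "resultsBlockHeader"))
print(count_blocks(sample, "div", "resultsBlock"))
```

In the script above, the text to feed in would be the decoded response body. If the raw count matches the number of races on the card but findAll returns fewer blocks, the parser rather than the download is the more likely place to look, and passing a different parser name such as 'html.parser' to BeautifulSoup would be a reasonable thing to try. If the raw count is already short, the response itself is probably incomplete.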

0 answers:

No answers yet