Looping through URLs and displaying the data with Beautiful Soup

Time: 2016-01-22 12:55:46

Tags: python web web-scraping beautifulsoup

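I am using Beautiful Soup to scrape this URL, http://www.gbgb.org.uk/resultsRace.aspx?id=1839041, and it displays all the fields as required. However, it only shows one race from the fixture's results card, and I would like to extract the whole race meeting, which varies between 9 and 14 races on a card. Here is the URL for the whole meeting: http://www.gbgb.org.uk/resultsMeeting.aspx?id=135488. Is there any way to loop through the complete race card and display the contents for all the races on the card? Below is the code for a single race.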

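from urllib.request import urlopen  # Python 3; on Python 2: from urllib import urlopen

from bs4 import BeautifulSoup

html = urlopen("http://www.gbgb.org.uk/resultsRace.aspx?id=1839041")

# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(html, "html.parser")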
# (tag, class) pairs for every field on the race page, in display order
fields = [
    ("div", "track"),
    ("div", "date"),
    ("div", "datetime"),
    ("div", "grade"),
    ("div", "distance"),
    ("div", "prizes"),
    ("li", "first essential fin"),
    ("li", "essential greyhound"),
    ("li", "trap"),
    ("li", "sp"),
    ("li", "timeSec"),
    ("li", "timeDistance"),
    ("li", "essential trainer"),
    ("li", "first essential comment"),
    ("div", "resultsBlockFooter"),
]

# Print the text of every matching element, one field type at a time
for tag, css_class in fields:
    nameList = bsObj.findAll(tag, {"class": css_class})
    for name in nameList:
        print(name.get_text())

1 Answer:

Answer 0: (score: 0)

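You just need to iterate over the result blocks. The tags are slightly different, but it is basically the same thing. I used the Inspect Element feature in Chrome, which makes HTML scraping easy.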

from urllib.request import urlopen  # Python 3; on Python 2: from urllib import urlopen

from bs4 import BeautifulSoup

baseURL = 'http://www.gbgb.org.uk/resultsMeeting.aspx?id=135488'
html = urlopen(baseURL)
bsObj = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# Each race on the meeting page sits in its own "resultsBlock" div
resultsBlocks = bsObj.findAll("div", {"class": "resultsBlock"})
for block in resultsBlocks:
    # just the trap info, the rest is similar
    for trap in block.findAll("li", {"class": "trap"}):
        print(trap.get_text())
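
To make "the rest is similar" a bit more concrete, here is a rough sketch of how several fields could be collected per race block. It runs against a small made-up HTML string so it works offline. `SAMPLE_HTML`, `FIELDS` and `parse_meeting` are hypothetical names of mine, and the sample markup only assumes that each race's fields sit inside its own `resultsBlock` div; the real page may be laid out differently, so check it with Inspect Element first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the meeting page. Only the class
# names are taken from the code above; the values and nesting are made up.
SAMPLE_HTML = """
<div class="resultsBlock">
  <div class="track">Track A</div>
  <div class="grade">A3</div>
  <ul><li class="trap">1</li><li class="trap">4</li></ul>
</div>
<div class="resultsBlock">
  <div class="track">Track A</div>
  <div class="grade">A5</div>
  <ul><li class="trap">6</li><li class="trap">2</li></ul>
</div>
"""

# (tag, class) pairs to pull out of each race block; extend as needed
FIELDS = [("div", "track"), ("div", "grade"), ("li", "trap")]


def parse_meeting(html):
    """Return one dict per resultsBlock, mapping class name -> list of texts."""
    soup = BeautifulSoup(html, "html.parser")
    races = []
    for block in soup.find_all("div", {"class": "resultsBlock"}):
        race = {}
        for tag, css_class in FIELDS:
            # Search only inside this block so races do not get mixed together
            race[css_class] = [
                el.get_text(strip=True)
                for el in block.find_all(tag, {"class": css_class})
            ]
        races.append(race)
    return races


races = parse_meeting(SAMPLE_HTML)
for number, race in enumerate(races, start=1):
    print("Race", number, race)
```

To try it on the live page, pass the `html` object returned by `urlopen(baseURL)` to `parse_meeting` instead of `SAMPLE_HTML`, and add the remaining (tag, class) pairs from the question to `FIELDS`.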