XPath only extracts some of the data from a site

Date: 2016-01-21 17:19:50

Tags: python python-3.x xpath web web-scraping

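I'm using XPath and Python to get data from the site in the code below. After a while I managed to download most of the data, but I can't extract the Greyhound field or DogDetail. The Greyhound data is actually the href path of an a tag, and after trying many variations of the XPath I still can't get it. The overall plan is to download greyhound race results into a database (or a spreadsheet). Any help is appreciated.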

 from lxml import html
 import requests


 page = requests.get('http://www.gbgb.org.uk/resultsRace.aspx?id=1838526')
 tree = html.fromstring(page.content)

 track = tree.xpath('//div[@class="track"]/text()')
 print('Track', track)

 date = tree.xpath('//div[@class="date"]/text()')
 print('date', date)

 datetime = tree.xpath('//div[@class="datetime"]/text()')
 print('datetime', datetime)

 # Greyhound names: this query returns an empty list
 essentialgreyhound = tree.xpath('//a[@href="essential greyhound"]/text()')
 print('Greyhound', essentialgreyhound)

 firstessentialfin = tree.xpath('//li[@class="first essential fin"]//text()')
 print('Position:', firstessentialfin)
 sp = tree.xpath('//li[@class="sp"]/text()')
 print('StartingPrice:', sp)
 trap = tree.xpath('//li[@class="trap"]/text()')
 print('Trap:', trap)
 trainer = tree.xpath('//li[@class="essential trainer"]/text()')
 print('Trainer:', trainer)
 timeSec = tree.xpath('//li[@class="timeSec"]/text()')
 print('TimeSec', timeSec)
 timeDistance = tree.xpath('//li[@class="timeDistance"]/text()')
 print('TimeDistance', timeDistance)

 firstessentialcomment = tree.xpath('//li[@class="first essential comment"]/text()')
 print('Comment', firstessentialcomment)
 # DogDetail: this query does not return the wanted data either
 firstessential = tree.xpath('//li[@class="first essential"]/text()')
 print('DogDetail', firstessential)
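The stated end goal is a database or spreadsheet. Since each query in the code above returns one list per column, a minimal sketch, assuming those lists line up runner by runner, can transpose them into CSV rows with the standard library. columns_to_csv is a hypothetical helper, not part of the original code:

```python
import csv
import io


def columns_to_csv(columns):
    """Turn parallel per-column lists into CSV text, one runner per row.

    columns: dict mapping a header (e.g. 'Trap') to the list that one XPath
    query returned for that column. Dict insertion order sets column order.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Separate queries only line up if every runner has every field;
        # one missing cell would silently shift all later rows.
        raise ValueError('column lengths differ: %r' % lengths)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(list(columns))            # header row
    writer.writerows(zip(*columns.values()))  # transpose columns into rows
    return buf.getvalue()
```

With the variables from the question's code this could be called as columns_to_csv({'Trap': trap, 'Trainer': trainer, 'TimeSec': timeSec}). The csv module writes \r\n line endings, so save the text to a file opened with newline='' to avoid doubled line breaks on Windows. The length check is deliberate: text() results sometimes include stray whitespace entries, and a mismatch is safer as an error than as misaligned rows.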

1 answer:

Answer 0 (score: 0)

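You should fix the XPath for the Greyhound column. The dog's name is the text of the a element inside the li whose class is "essential greyhound"; the original query instead looks for an a element whose href attribute is literally "essential greyhound", so it matches nothing: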

//li[@class="essential greyhound"]/a/text()

For me this prints:

Greyhound ['Ultimate Bundle', 'Powerfast Raven', 'Upagumtree', 'Buglys Causeway', 'Group Vespa', 'Winword Jacko']
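A minimal sketch of the difference between the two queries, run against a hand-written HTML fragment instead of the live page. The fragment is an assumption: only the class name essential greyhound and the two dog names come from this thread, the hrefs are placeholders, and the real gbgb.org.uk markup may differ. Ending the answer's path in @href instead of text() returns the link targets, which is the "href path" the question mentions:

```python
from lxml import html

# Hypothetical fragment: only the class name and the dog names come from the
# thread; the hrefs are placeholders and the real page markup may differ.
SNIPPET = """
<html><body>
  <ul>
    <li class="essential greyhound"><a href="dog-1.html">Ultimate Bundle</a></li>
  </ul>
  <ul>
    <li class="essential greyhound"><a href="dog-2.html">Powerfast Raven</a></li>
  </ul>
</body></html>
"""

tree = html.fromstring(SNIPPET)

# Question's query: looks for an <a> whose href is literally
# "essential greyhound", so nothing matches.
broken = tree.xpath('//a[@href="essential greyhound"]/text()')

# Answer's query: find the <li> by its class, then take its <a> child's text.
names = tree.xpath('//li[@class="essential greyhound"]/a/text()')

# Same path ending in @href: the link targets instead of the names.
links = tree.xpath('//li[@class="essential greyhound"]/a/@href')

print('broken:', broken)
print('names:', names)
print('links:', links)
```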