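I am using Scrapy to parse an HTML page and extract certain values from it. I'm stuck on the last part of my code, where I try to pull out the item's price. The specific page I'm looking at is http://www.rogerssportinggoods.com/decoys-duck-decoys-c-16_71.html. Here is the section of the page I'm working with, which includes the price of $119.99: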
<td align="center" class="productListing-data" width="25%">
<a href="http://www.rogerssportinggoods.com/dakota-decoys-dakota-decoy-full-body-mallard-decoys-6pack-p-3036.html"><img src="images/DAK-12160-125.png" border="0" alt="Dakota Decoy Full Body Mallard Decoys, 6-Pack" title=" Dakota Decoy Full Body Mallard Decoys, 6-Pack "></a> <br>
<a href="http://www.rogerssportinggoods.com/dakota-decoys-dakota-decoy-full-body-mallard-decoys-6pack-p-3036.html">Dakota Decoy Full Body Mallard Decoys, 6-Pack</a> <br> DAK-12160 <br>
<a href="http://www.rogerssportinggoods.com/dakota-decoys-m-200.html">Dakota Decoys</a> <br> $119.99 <br>
<a href="http://www.rogerssportinggoods.com/-s-.html"></a> </td>
Here is the code I'm currently using for this project:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import HtmlXPathSelector
from test_scraper.items import TestScraperItem

class MySpider(CrawlSpider):
    name = "test"
    allowed_domains = ["rogerssportinggoods.com"]
    start_urls = ["http://www.rogerssportinggoods.com/decoys-duck-decoys-c-16_71.html"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//html/body/div/div/table/tr/td/table/tr/td/table/tr/td")
        for titles in titles:
            title = titles.select("a/text()").extract()
            link = titles.select("a/@href").extract()
            price = titles.select("/a[text()='${nbsp}']").extract()
            print title, link, price
Answer 0 (score: 0)
Try this:
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector

class MySpider(CrawlSpider):
    name = "rogers"
    allowed_domains = ["rogerssportinggoods.com"]
    start_urls = ["http://www.rogerssportinggoods.com/decoys-duck-decoys-c-16_71.html"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Each product sits in its own <td> of the productListing table.
        for box in hxs.select('descendant::table[@class="productListing"]//td'):
            title = box.select("a[2]/text()").extract()
            link = box.select("a[3]/@href").extract()
            # The price is a bare text node after the third <a>, not inside a tag,
            # so walk that link's following siblings and keep the ones containing '$'.
            price = [p.strip() for p in box.select("a[3]/following-sibling::node()").extract() if '$' in p]
            print title, link, price
This yields:
[u'Dakota Decoy Full Body Mallard Decoys, 6-Pack']
[u'http://www.rogerssportinggoods.com/dakota-decoys-m-200.html']
[u'$119.99']
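The reason the original `a[text()=...]` query finds nothing is that the price is not inside any `<a>` element; it is loose text sitting between two `<br>` tags. The following is a minimal sketch of that same idea without Scrapy or XPath. It swaps in the standard library's `html.parser` and works on a trimmed copy of the product cell. The names `CELL`, `PriceCollector`, and `extract_prices` are made up for this illustration and are not part of Scrapy or of the answer above.

```python
from html.parser import HTMLParser

# Trimmed copy of the product cell from the question (assumption: the real
# page has the same shape, with the price as loose text between <br> tags).
CELL = """
<td align="center" class="productListing-data" width="25%">
<a href="http://www.rogerssportinggoods.com/dakota-decoys-m-200.html">Dakota Decoys</a>
<br> $119.99 <br>
<a href="http://www.rogerssportinggoods.com/-s-.html"></a> </td>
"""


class PriceCollector(HTMLParser):
    """Collect text nodes that lie outside any <a> element and contain '$'."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # greater than 0 while inside an <a> element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Link text is skipped; only loose text containing '$' is kept.
        if self.anchor_depth == 0 and "$" in text:
            self.prices.append(text)


def extract_prices(html):
    """Return every '$'-bearing text node found outside of links."""
    parser = PriceCollector()
    parser.feed(html)
    parser.close()
    return parser.prices


print(extract_prices(CELL))
```

Run on the trimmed cell, this prints `['$119.99']`. Inside a spider, the XPath `following-sibling::node()` approach from the answer is the more natural fit; the sketch only shows why filtering text nodes works where matching on `<a>` does not.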