I have written Scrapy code before and it worked as expected, but for some reason my logic is not working in this particular case. Please help me:
My items.py:
import scrapy


class KohlItem(scrapy.Item):
    images = scrapy.Field()
    links = scrapy.Field()
    name = scrapy.Field()
    title = scrapy.Field()
My first.py:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from kohl.items import KohlItem


class FirstSpider(scrapy.Spider):
    name = "first"
    allowed_domains = ["kohls.com"]
    start_urls = [
        "http://www.kohls.com/catalog/mens-t-shirts-tops-tees-tops-clothing.jsp?CN=Gender:Mens+Silhouette:T-Shirts+Product:Tops%20%26%20Tees+Category:Tops+Department:Clothing&cc=mens-TN3.0-S-tshirts",
        "http://www.kohls.com/catalog/mens-t-shirts-tops-tees-tops-clothing.jsp?CN=Gender:Mens+Silhouette:T-Shirts+Product:Tops%20%26%20Tees+Category:Tops+Department:Clothing&cc=mens-TN3.0-S-tshirts&PPP=60&WS=240",
    ]

    def parse(self, response):
        # print(response.url)
        kk = ItemLoader(item=KohlItem(), response=response)
        kk.add_xpath('title', '//title/text()')
        kk.add_value('links', response.url)
        kk.add_xpath('images', '//*[@class="pmp-hero-img"]/@src')
        print('images ******************')
        kk.add_xpath('name', '//*[@class="prod_nameBlock"]/p/text()')
        print('******************')
        return kk.load_item()
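The output gives me the title as expected, but the remaining values (images and name) do not appear.
The XPaths used above work fine in the browser console and give me the expected output there.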
Answer 0: (score: 0)
The first thing I would check is whether the page is rendered using JavaScript. If it is, a webdriver will help.
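One rough way to do that check, sketched here with only the Python standard library: parse the HTML exactly as the server returns it (which is what Scrapy sees, before any JavaScript runs) and count the elements that carry the class names used in the XPaths. The helper name `count_class_in_html` and the two sample HTML strings are assumptions made up for illustration; in practice you would pass in `response.text` from the spider or from `scrapy shell`. Note that this sketch matches a class name among space-separated classes, which is slightly looser than the exact `@class="..."` comparison in the XPaths.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class_in_html(html, class_name):
    """Hypothetical helper: number of elements with class_name in raw HTML."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up sample: markup present in the server response itself.
static_html = '<div class="prod_nameBlock"><p>Tee</p></div>'

# Made-up sample: the class name only appears inside a script,
# so the element would exist only after JavaScript runs in a browser.
js_html = '<div id="root"></div><script>render("prod_nameBlock")</script>'

print(count_class_in_html(static_html, "prod_nameBlock"))
print(count_class_in_html(js_html, "prod_nameBlock"))
```

If the count is zero for the raw response while the same XPath matches in the browser console, the elements are most likely being added by JavaScript after the page loads, which would explain why only the title is extracted.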