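Sorry if my question seems too vague. I am new to this site and to forums in general.
Basically, I wrote a Scrapy spider for a web page that sells shoes. I ran it and got the results I expected, namely the available sizes of a particular shoe.
About 10 minutes later the page source changed: the shoe sizes had gone out of stock.
I ran the spider again, but the output was the same as before. Why?
Does this make sense? Please let me know. Any replies are much appreciated.
PS. In my settings I only have BOT_NAME, SPIDER_MODULES and NEWSPIDER_MODULE.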
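For reference, with only those three settings the file would look roughly like the sketch below. The values are assumptions: the project name "tutorial" is inferred from the `tutorial.items` import in the spider, and the module path follows the default Scrapy project layout.

```python
# settings.py (sketch only; the values are assumed, not copied from the
# real project). "tutorial" is inferred from the `tutorial.items` import,
# and "tutorial.spiders" is the default location Scrapy generates.
BOT_NAME = "tutorial"

SPIDER_MODULES = ["tutorial.spiders"]
NEWSPIDER_MODULE = "tutorial.spiders"
```

Nothing else is set, so every other Scrapy option is left at its default.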
The spider code is as follows:
from scrapy.spider import Spider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import DropItem

class DropSpider(Spider):
    name = "Drop"
    allowed_domains = [domain name]
    start_urls = [ list of url's
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        info = hxs.select('//html')
        items = []
        for unit in info:
            item = DropItem()
            item['title'] = unit.select('head/title/text()').extract()
            item['colour01_sizes'] = unit.select('body/div/div/div/div/div/div/div/form/div/div/div/dl/dd/select/option[2]/text() | body/div/div/div/div/div/div/div/form/div/div/div[2]/dl/dd/ul/li/span/label/text()').extract()
            item['colour02_sizes'] = unit.select('body/div/div/div/div/div/div/div/form/div/div/div/dl/dd/select/option[3]/text() | body/div/div/div/div/div/div/div/form/div/div/div[3]/dl/dd/ul/li/span/label/text()').extract()
            item['colour03_sizes'] = unit.select('body/div/div/div/div/div/div/div/form/div/div/div/dl/dd/select/option[4]/text() | body/div/div/div/div/div/div/div/form/div/div/div[4]/dl/dd/ul/li/span/label/text()').extract()
            item['colour04_sizes'] = unit.select('body/div/div/div/div/div/div/div/form/div/div/div/dl/dd/select/option[5]/text() | body/div/div/div/div/div/div/div/form/div/div/div[5]/dl/dd/ul/li/span/label/text()').extract()
            items.append(item)
        return items
Command used to run the spider: scrapy crawl Drop
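
One check that might help narrow this down, sketched here under assumptions: fingerprint the raw page body on each run and compare the values, to see whether the spider is really receiving a changed page. The helper names `body_fingerprint` and `page_changed` are hypothetical and not part of Scrapy; the sample byte strings are made up for illustration.

```python
import hashlib


def body_fingerprint(body: bytes) -> str:
    """Return a short SHA-256 hex digest of the raw page body."""
    return hashlib.sha256(body).hexdigest()[:16]


def page_changed(previous: bytes, current: bytes) -> bool:
    """Return True if the two downloaded bodies have different fingerprints."""
    return body_fingerprint(previous) != body_fingerprint(current)


if __name__ == "__main__":
    # Made-up sample bodies standing in for two downloads of the same page.
    first = b"<option>UK 8</option>"
    second = b"<option>UK 8 - out of stock</option>"
    print(page_changed(first, first))
    print(page_changed(first, second))
```

Inside `parse`, logging `body_fingerprint(response.body)` on each crawl would show whether the downloaded HTML differs between runs. If the fingerprint changes but the items do not, the XPath expressions are the more likely place to look; if it stays the same, the sizes may not be part of the HTML that Scrapy downloads.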