I'm getting started with Scrapy. My items.py contains:
class ParkerItem(scrapy.Item):
    account = scrapy.Field()
    m = scrapy.Field()
I then generate requests for the website:
for i in range(max_id):
    yield Request('first_url', method="POST", headers=headers,
                  body=payload, callback=self.parse_get_account)
def parse_get_account(self, response):
    j = json.loads(response.body_as_unicode())
    if j['d'][0] != "":
        item = ParkerItem()
        item['account'] = j['d'][0]
        return self.parse_second_request(item)
    # leftover debug output
    print("back here" + str(item))
    print("hello")
If an account exists, I store it in the item and call parse_second_request:
def parse_second_request(self, item):
    yield Request(url + '?' + urllib.urlencode(querystring), method="GET",
                  headers=headers, callback=self.parse_third_request,
                  meta={'item': item})
This calls parse_third_request (which actually parses the second page):
def parse_third_request(self, response):
    item = response.meta['item']  # {'account': u'11'}
    m = response.selector.xpath('/table//td[3]/text()').extract()
    item["m"] = m[0]
    print("hi" + str(item))
    return item
This code works, and the item is passed on to the pipeline for storage, but it seems like a lot of functions just to scrape two pages. Is there a way to simplify the code following best practices?
Answer 0 (score: 2)
You can avoid this intermediate method:
def parse_second_request(self, item):
Other than that, since your item fields are populated from data coming from different pages, you are doing this correctly.
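For illustration, here is a minimal sketch of that simplification, assuming url, headers, querystring, and ParkerItem are defined as in the question: parse_get_account yields the follow-up Request itself, so the intermediate method disappears.

def parse_get_account(self, response):
    j = json.loads(response.body_as_unicode())
    if j['d'][0] != "":
        item = ParkerItem()
        item['account'] = j['d'][0]
        # Yield the follow-up request directly; the partially filled item
        # rides along in meta and is completed in parse_third_request.
        yield Request(url + '?' + urllib.urlencode(querystring),
                      method="GET", headers=headers,
                      callback=self.parse_third_request,
                      meta={'item': item})

Scrapy iterates whatever a callback returns or yields, so two callbacks are enough here: the first extracts the account and schedules the second page, the second fills in m and returns the item.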