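After working through the official tutorial, I decided to try building my own spider in the same project. I created parker_spider.py in the spiders directory. It contains: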
start_urls = [
    "myurl"
]

class Parker_Spider(scrapy.Spider):
    name = "parker"

    def start_requests(self):
        for i in range(self.max_id):
            yield Request('myurl', method="post", headers=headers, body=payload, callback=self.parse_method)

    def parse_method(self, response):
        j = json.loads(response.body_as_unicode())
        print(j['d'][0])
I can see the correct output printed while the spider runs, so I know the request and parsing work. Now I want to store the output as JSON. When I run:
$ scrapy crawl parker -o items.json
............
2016-05-31 16:53:55 [scrapy] INFO: Closing spider (finished)
2016-05-31 16:53:55 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16112,
'downloader/request_count': 26,
'downloader/request_method_count/POST': 26,
'downloader/response_bytes': 12484,
'downloader/response_count': 26,
'downloader/response_status_count/200': 26,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 5, 31, 20, 53, 55, 192000),
'log_count/DEBUG': 27,
'log_count/INFO': 7,
'response_received_count': 26,
'scheduler/dequeued': 26,
'scheduler/dequeued/memory': 26,
'scheduler/enqueued': 26,
'scheduler/enqueued/memory': 26,
'start_time': datetime.datetime(2016, 5, 31, 20, 53, 54, 31000)}
2016-05-31 16:53:55 [scrapy] INFO: Spider closed (finished)
items.json is created in the project directory, but it is empty. What am I doing wrong?
Edit: I changed the spider code as follows:
    def parse_method(self, response):
        j = json.loads(response.body_as_unicode())
        ParkerItem.account=j['d'][0]
        print(j['d'][0])
        return ParkerItem.account
items.py:
class ParkerItem(scrapy.Item):
    account = scrapy.Field()
Now, when I run it, I get:
ERROR: Spider must return Request, BaseItem, dict or None, got 'unicode' in <POST myurl
What now?
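Answer 0 (score: 1)
Your parse_method needs to return an instance of scrapy.item.Item or of a subclass. As originally written, it returns None, which Scrapy interprets as no items having been extracted from the received response.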
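To see why the first version leaves items.json empty, it may help to compare what each kind of callback hands back to its caller. The sketch below is plain Python with no Scrapy involved; the function names and the sample body are made up for illustration, and the {"d": [...]} shape is only assumed from the j['d'][0] lookup in the question. It yields a plain dict, which the error message in the question lists as an accepted return type.

```python
import json

# Hypothetical response body; the real payload is not shown in the question.
SAMPLE_BODY = '{"d": ["account-1", "account-2"]}'


def parse_print_only(body):
    """Mimics the original callback: prints the value but returns nothing."""
    j = json.loads(body)
    print(j['d'][0])


def parse_yielding(body):
    """Mimics a corrected callback: hands the value back to the caller."""
    j = json.loads(body)
    yield {'account': j['d'][0]}


# A function without a return statement returns None, so nothing is exported.
printed_result = parse_print_only(SAMPLE_BODY)

# A generator callback produces the items that an exporter could write out.
yielded_result = list(parse_yielding(SAMPLE_BODY))
```

Printing only writes to the console; the feed exporter behind -o can only see what the callback returns or yields.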
Answer 1 (score: 1)
    def parse_method(self, response):
        j = json.loads(response.body_as_unicode())
        item = ParkerItem()
        item['account'] = j['d'][0]
        yield item