I have two kinds of items:
class MovieItem(scrapy.Item):
    id = scrapy.Field()
    image_urls = scrapy.Field()
    image_paths = scrapy.Field()
    torrents = scrapy.Field()
    # ...other fields

class TorrentItem(scrapy.Item):
    id = scrapy.Field()
    movie_id = scrapy.Field()
    image_urls = scrapy.Field()
    image_paths = scrapy.Field()
I want to use ImagePipeline to download images for both movies and torrents. How should I yield both items in the parse method? And how should I define the corresponding pipelines?
Answer 0 (score: 1)
Yes, you can. Here is an example of how to do it. This is an example.py spider:
# -*- coding: utf-8 -*-
import scrapy


class MovieItem(scrapy.Item):
    id = scrapy.Field()
    image_urls = scrapy.Field()  # read by ImagesPipeline
    images = scrapy.Field()      # filled in by ImagesPipeline
    torrents = scrapy.Field()
    itemtype = scrapy.Field()


class TorrentItem(scrapy.Item):
    id = scrapy.Field()
    movie_id = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        image_urls = [
            "http://...-Miles.jpg",
            "https:/.../58832_300x300",
            "http://...-Circuit-Tests.png"
        ]
        torrent_ids = []
        # Yield one TorrentItem per image; range works on Python 2 and 3.
        for i in range(3):
            t = TorrentItem()
            t["id"] = "#id%d" % i
            t["movie_id"] = 143
            t["image_urls"] = [image_urls[i]]
            # ...
            torrent_ids.append(t["id"])
            yield t

        # Yield the MovieItem from the same callback, referencing the torrents by id.
        m = MovieItem()
        m['id'] = 143
        m['image_urls'] = ['http://...test.png']
        m['torrents'] = torrent_ids
        m['itemtype'] = ['movie']
        # ...
        yield m
Add the following two lines to settings.py:
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '.'
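The items in the question store download results in image_paths, while the items in this answer use images, which is the field ImagesPipeline fills by default. If the original field name should be kept, Scrapy's IMAGES_RESULT_FIELD setting can point the pipeline at it. A minimal sketch, assuming a Scrapy version that provides scrapy.pipelines.images (as the two lines above already do):

```python
# settings.py (sketch)
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '.'

# Field the pipeline reads image URLs from (this is already the default).
IMAGES_URLS_FIELD = 'image_urls'
# Field the pipeline writes download results to (the default is 'images').
IMAGES_RESULT_FIELD = 'image_paths'
```

Note that the result field receives a list of dicts with url, path and checksum keys rather than bare paths.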
Run the spider:
scrapy crawl example -o test.jl
Your test.jl file will contain (after some formatting):
{
    "images": [
        {
            "url": "http://.../Stuart-Miles.jpg",
            "path": "full/27c8d5099f8785e8fbc2370249a0260e216ee2dd.jpg",
            "checksum": "dba2fc121610b328448dc37084f31dac"
        }
    ],
    "movie_id": 143,
    "id": "#id0",
    "image_urls": [
        "http://...ter-Key-by-Stuart-Miles.jpg"
    ]
}
{
    "images": [
        {
            "url": "https://i....t/58832_300x300",
            "path": "full/b11276eb5b64b5ec7f40eedf4c6fcc6d6d9072ac.jpg",
            "checksum": "a9b47ecbb2de9dcb6a61a159120f1bd2"
        }
    ],
    "movie_id": 143,
    "id": "#id1",
    "image_urls": [
        "https://i.vi..._300x300"
    ]
}
{
    "images": [
        {
            "url": "http://www.ej...rt-Circuit-Tests.png",
            "path": "full/a68282eb533d35a0aa8732a872277933db8951c5.jpg",
            "checksum": "24c0907e3ef610dc355e930f2535c0c4"
        }
    ],
    "movie_id": 143,
    "id": "#id2",
    "image_urls": [
        "http://www.ejob...nsformer-Open-and-Short-Circuit-Tests.png"
    ]
}
{
    "images": [
        {
            "url": "http://...est.png",
            "path": "full/1e3e0f775cd40aaa5ea081278957f4d49e39f610.jpg",
            "checksum": "50a57a6263b9640ee47e913deadaff7c"
        }
    ],
    "torrents": [
        "#id0",
        "#id1",
        "#id2"
    ],
    "itemtype": [
        "movie"
    ],
    "image_urls": [
        "http://xi.../10/test.png"
    ],
    "id": 143
}
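To use this output, each line of the file can be parsed as one JSON object and the torrents grouped under their movie. A minimal sketch, assuming the unformatted file holds one item per line and that movie items are the ones carrying a torrents key; the helper name group_by_movie and the sample lines are illustrative and not part of the original answer:

```python
import json


def group_by_movie(lines):
    """Return {movie_id: {"movie": dict or None, "torrents": [dict, ...]}}."""
    grouped = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        if "torrents" in item:
            # Movie items carry the list of torrent ids.
            entry = grouped.setdefault(item["id"], {"movie": None, "torrents": []})
            entry["movie"] = item
        else:
            # Torrent items point back at their movie through movie_id.
            entry = grouped.setdefault(item["movie_id"], {"movie": None, "torrents": []})
            entry["torrents"].append(item)
    return grouped


# Made-up sample lines shaped like the output above (image fields omitted).
sample = [
    '{"movie_id": 143, "id": "#id0"}',
    '{"movie_id": 143, "id": "#id1"}',
    '{"id": 143, "torrents": ["#id0", "#id1"]}',
]
result = group_by_movie(sample)
print(sorted(t["id"] for t in result[143]["torrents"]))
```

For the real file, an open file object can be passed directly, since iterating over it yields one line at a time: group_by_movie(open("test.jl")).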
This works with .jl files as output. It is not compatible with .csv, but that should not be a problem in your case.
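As for defining pipelines: the stock ImagesPipeline treats both item types the same way, so no custom class is needed for the downloads. If a later stage has to handle movies and torrents differently, one common approach is to branch on the item's class inside process_item. A minimal sketch: the class name ItemTypePipeline and its counters are hypothetical, and MovieItem and TorrentItem are plain dict stand-ins here so the block runs without Scrapy installed (in a project you would import the real item classes from the spider module instead):

```python
class MovieItem(dict):
    """Stand-in for the spider's MovieItem (a scrapy.Item in the real project)."""


class TorrentItem(dict):
    """Stand-in for the spider's TorrentItem."""


class ItemTypePipeline:
    """Hypothetical pipeline that branches on the item's class."""

    def __init__(self):
        self.movies_seen = 0
        self.torrents_seen = 0

    def process_item(self, item, spider):
        # Scrapy calls process_item once for every item the spider yields.
        if isinstance(item, MovieItem):
            self.movies_seen += 1
        elif isinstance(item, TorrentItem):
            self.torrents_seen += 1
        # Always return the item so later pipelines still receive it.
        return item


# Feed the pipeline by hand, the way Scrapy would during a crawl.
pipeline = ItemTypePipeline()
for item in (TorrentItem(id="#id0", movie_id=143), MovieItem(id=143)):
    pipeline.process_item(item, spider=None)
print(pipeline.movies_seen, pipeline.torrents_seen)
```

Such a class would be registered in ITEM_PIPELINES next to ImagesPipeline, with a larger number so that it runs after the images have been downloaded.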