I am trying to do the following: call these spiders from another Python script, which resides in a different directory. The first spider is currently called correctly without any problem; the issue is with the second spider.

The source code of the second spider is:
import scrapy
from dateutil.parser import parse
import requests
from scrapy.http import Request
from project-name.items import Project-nameItem

# read the list of ids from file.txt, stripping the trailing newlines
url_list = []
with open("file.txt", "r") as f:
    for line in f:
        url_list.append(line)

for i in range(0, len(url_list)):
    url_list[i] = url_list[i].replace('\n', '')

indexList = []
URL = "http://www.example.com/id=%s"
number = 0

class AnotherSpider(scrapy.Spider):
    name = "another"
    allowed_domains = ['example.com']
    start_urls = [URL % number]

    def start_requests(self):
        # issue one request per id read from file.txt
        for i in url_list:
            yield Request(url=URL % i, callback=self.parse)

    def parse(self, response):
        # scrape the page for the required information
        pass
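For reference, the item class that the failing import points at lives in the project's items.py. It is defined along these lines (the field below is a placeholder I made up, and the hyphenated project-name / Project-nameItem are just the redacted names, not literal identifiers):

# project-name/items.py -- "project-name" / "Project-nameItem" are redacted placeholders
import scrapy

class Project-nameItem(scrapy.Item):
    # illustrative field only; the real item defines its own fields
    some_field = scrapy.Field()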
When I call the second spider, the error I get is:
runspider: error: Unable to load '/home/project-name/project-name/spiders/anotherspider.py': No module named project-name.items
EDIT:
Since the calling Python script is located in a different directory, I use the runspider command to execute the spiders. The problem with this command is that it is a global command, meaning it does not take the project settings into account. This is most likely why the script fails to recognize the items.py file.
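For clarity, the project follows the standard layout that scrapy startproject generates; apart from the spiders directory visible in the error path, the exact file names below are my assumption:

/home/project-name/            <- project root, contains scrapy.cfg
    scrapy.cfg
    project-name/              <- the Python package the failing import refers to
        __init__.py
        items.py
        settings.py
        spiders/
            __init__.py
            spider1.py
            anotherspider.py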
The commands used to execute the spiders are:
scrapy runspider spider1.py
scrapy runspider spider2.py
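The calling script runs these commands from Python, roughly like the sketch below (the use of subprocess and the hard-coded spiders path are my assumptions; the actual script differs):

# rough sketch of the external caller, not the actual script
import subprocess

SPIDERS_DIR = "/home/project-name/project-name/spiders"  # path taken from the error message

for script in ("spider1.py", "spider2.py"):
    # each spider is run standalone, outside the Scrapy project context
    subprocess.run(["scrapy", "runspider", script], cwd=SPIDERS_DIR, check=True)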
Is there a workaround for this?