I am using RabbitMQ with my spiders; each spider sends its data to a receiver.
I start receiver.py as a daemon with this command:
daemon python /receiver.py
When I start multiple spider instances, it seems the "expired" queue needs more receiver.py instances to keep up.
What is wrong with my code?
The sender works like this (a Scrapy spider):
import pika
from scrapy.linkextractors import LxmlLinkExtractor

# module-level connection shared by the spider callback below
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='expired')

def parse_items(self, response):
    for link in LxmlLinkExtractor(allow=(), deny=self.allowed_domains, canonicalize=False).extract_links(response):
        domain = link.url  # assumption: the domain is taken from the extracted link
        # self.logger.info('[{}] Added to the List'.format(domain))
        channel.basic_publish(exchange='', routing_key='expired', body=domain)
        self.domains.append(domain)
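For comparison, here is a minimal sketch of the same sender with the connection owned by the spider itself rather than the module, so each spider process opens and closes its own channel. The class name ExpiredSpider and taking the domain from link.url are assumptions, not the original code:

import pika
import scrapy
from scrapy.linkextractors import LxmlLinkExtractor

class ExpiredSpider(scrapy.Spider):
    name = 'expired'  # hypothetical spider name

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.domains = []
        # one connection and channel per spider process
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='expired')

    def parse_items(self, response):
        for link in LxmlLinkExtractor(allow=(), deny=self.allowed_domains, canonicalize=False).extract_links(response):
            domain = link.url  # assumption: domain derived from the link
            self.channel.basic_publish(exchange='', routing_key='expired', body=domain)
            self.domains.append(domain)

    def closed(self, reason):
        # release the broker connection when the spider finishes
        self.connection.close()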
The receiver does this:
import logging
import threading
import time

import pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, domain):
        # parse_checkdomain is defined elsewhere in this class
        url = 'http://www.checkdomain.com/cgi-bin/checkdomain.pl?domain=' + domain
        self.parse_checkdomain(url, domain)
        time.sleep(domain.count('.'))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    def __init__(self):
        threading.Thread.__init__(self)
        # each worker thread opens its own connection and channel
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='expired')
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.callback, queue='expired')  # pika < 1.0 signature

    def run(self):
        logging.warning('Worker Start !')
        self.channel.start_consuming()

for _ in range(15):
    td = Threaded_worker()
    td.setDaemon(False)
    td.start()
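Note that the basic_consume call above uses the pre-1.0 pika signature (callback first, queue second). On pika 1.0 and later the arguments changed, so a receiver running against a newer pika fails at startup. A minimal single-threaded sketch of the same consumer under pika >= 1.0, with the domain-checking step elided:

import pika

def callback(ch, method, properties, body):
    domain = body.decode()  # message bodies arrive as bytes
    # ... check the domain here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='expired')
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='expired', on_message_callback=callback)
channel.start_consuming()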
By the way, I have a small side question: if receiver.py is not running, is all the data still kept in the queue?
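On that side question: messages already published stay in the "expired" queue while the RabbitMQ broker is running, even when no receiver.py is attached; they are only lost if the broker itself restarts, because neither the queue nor the messages above are declared durable. A minimal sketch of the durable variant, assuming the queue can be recreated (redeclaring an existing queue with a different durable flag raises a channel error, and sender and receiver must declare it identically):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='expired', durable=True)  # queue survives broker restarts
channel.basic_publish(
    exchange='',
    routing_key='expired',
    body='example.com',  # hypothetical payload
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)
connection.close()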