Scrapy keeps starting new HTTP connections after the crawl finishes

Asked: 2018-01-10 07:21:59

Tags: python-3.x scrapy web-crawler scrapy-splash

After my spider has crawled all of its URLs, Scrapy does not stop. How can I make it stop once the crawl is finished?

The start URL is http://192.168.139.28/dvwa

After my spider finishes, the log keeps showing Starting new HTTP connection (1): 192.168.139.28, and I don't know how to make it stop on its own. Can you help me?

Here is the output:

 'retry/reason_count/504 Gateway Time-out': 2,
 'scheduler/dequeued': 82,
 'scheduler/dequeued/memory': 82,
 'scheduler/enqueued': 82,
 'scheduler/enqueued/memory': 82,
 'splash/execute/request_count': 40,
 'splash/execute/response_count/200': 38,
 'splash/execute/response_count/400': 1,
 'splash/execute/response_count/504': 3,
 'start_time': datetime.datetime(2018, 1, 10, 6, 36, 4, 298146)}
  2018-01-10 14:37:48 [scrapy.core.engine] INFO: Spider closed (finished)
  2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  ...

I am using scrapy-splash with Scrapy. The Splash server was returning 504 errors, as described here, so I tried starting the Splash server with docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 3600, but it did not help: Scrapy still keeps logging Starting new HTTP connection (1): 192.168.139.28.
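
For reference, --max-timeout only raises the ceiling that Splash will accept; the render timeout each request actually uses is passed through SplashRequest's args. Below is a minimal sketch of the usual scrapy-splash wiring from its README (the class name, start URL handling, Splash address, and concrete timeout values here are illustrative assumptions, not the exact spider in question):

import scrapy
from scrapy_splash import SplashRequest

class Exp10itSpider(scrapy.Spider):
    name = 'exp10it'

    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',  # assumed address of the Splash container
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        # ask Splash to render the page; 'timeout' must stay below --max-timeout
        yield SplashRequest('http://192.168.139.28/dvwa', self.parse,
                            args={'wait': 0.5, 'timeout': 300})

    def parse(self, response):
        self.logger.info('rendered %s (%d bytes)', response.url, len(response.body))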

The code I use to launch my spider is:

import os
from scrapy import cmdline

os.chdir("./crawler")  # switch into the Scrapy project directory
cmdline.execute('scrapy crawl exp10it'.split())
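
For comparison, scrapy.cmdline.execute is the entry point of the scrapy console command, while in-process launches are usually done with CrawlerProcess, whose start() call blocks until the crawl finishes and then stops the Twisted reactor. A minimal sketch under the same directory layout (an illustration, not a confirmed fix for the behaviour above):

import os

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

os.chdir("./crawler")                 # same project directory as above
process = CrawlerProcess(get_project_settings())
process.crawl('exp10it')              # spider name, as in the cmdline call
process.start()                       # blocks until the crawl is finished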

Later, when I tried running the spider from the command line with scrapy crawl exploit, the problem did not occur: Scrapy stopped normally after the crawl finished. But I don't know why the cmdline.execute version above does not stop.

0 Answers:

No answers.