I'm new to Python and web scraping, so please excuse my ignorance. I want to run my Scrapy spider on a schedule. I'm using Python 3.7 on macOS. I wrote a cron job with crontab that calls a shell script to run the spider, but it only executed once, ending with the line "INFO: Closing spider (finished)"; it never repeated on schedule. To test the schedule itself I ran a simple Python script from cron, and that worked fine, so the problem seems specific to the spider. How can I fix this? Any help would be appreciated. Thanks.
import csv
import os
import random
from time import sleep

import scrapy


class spider1(scrapy.Spider):
    name = "amspider"

    # Empty data.csv at class-definition time if it already has content
    with open("data.csv", "a") as filee:
        if os.stat("data.csv").st_size != 0:
            filee.truncate(0)
            filee.close()

    def start_requests(self):
        urls = ["https://www.example.com/item1",
                "https://www.example.com/item2",
                "https://www.example.com/item3",
                "https://www.example.com/item4",
                "https://www.example.com/item5"
                ]
        for i in urls:
            yield scrapy.Request(i, callback=self.parse)
            # Note: time.sleep() blocks Scrapy's event loop; it is kept here
            # as in the original code.
            sleep(random.randint(0, 5))

    def parse(self, response):
        product_name = response.css('#pd-h1-cartridge::text')[0].extract()
        product_price = response.css(
            '.product-price .is-current, .product-price_total .is-current, .product-price_total ins, .product-price ins').css(
            '::text')[3].extract()
        print(product_name)
        print(product_price)
        # Append the scraped row to data.csv
        with open('data.csv', 'a') as file:
            itemwriter = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            itemwriter.writerow([str(product_name).strip(), str(product_price).strip()])
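As an aside, the manual CSV handling above could also be delegated to Scrapy's built-in feed exports, if `parse()` were changed to yield item dicts instead of printing. A minimal sketch of the relevant `settings.py` fragment, assuming Scrapy 2.1+ and the same `data.csv` target (this is a suggestion, not part of the original setup):

```python
# settings.py fragment (assumption: parse() yields dicts like
# {"name": ..., "price": ...} instead of writing CSV by hand)
FEEDS = {
    "data.csv": {
        "format": "csv",
        "overwrite": True,  # replaces manually truncating the file at startup
    },
}
```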
amsp.sh
#!/bin/sh
cd /Users/amal/PycharmProjects/AmProj2/amazonspider
PATH=$PATH:/usr/local/bin/
export PATH
scrapy crawl amspider
crontab
I tried both of the following, but the spider still only executed once.
*/2 * * * * /Users/amal/Documents/amsp.sh
*/2 * * * * cd /Users/amal/PycharmProjects/AmProj2/amazonspider && scrapy crawl amspider
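One way to diagnose this is to redirect the cron job's output to a log file, so any error (for example `scrapy: command not found` when cron's minimal PATH lacks `/usr/local/bin`, or a traceback from the spider) becomes visible on the runs that appear to do nothing. A sketch of the crontab entry, with `/tmp/amspider.log` as an assumed log path:

```shell
# Append stdout and stderr of every run to a log file for inspection
*/2 * * * * /Users/amal/Documents/amsp.sh >> /tmp/amspider.log 2>&1
```

After a few scheduled intervals, `tail /tmp/amspider.log` should show whether cron is invoking the script at all and what each invocation printed.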