Looping a scraping script

Date: 2019-01-08 14:47:00

Tags: python scrapy

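Scenario: I have a scraping script that scrapes a website. If it finds the required keyword in the scraped details, it sends an email. The site changes its data every 30 minutes, so I need to scrape it again and send an email whenever the specified keyword is found. How can I run this in a loop every 30 minutes with Scrapy in Python?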

Code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request
import smtplib
import threading
from email.mime.text import MIMEText
import time

class NewFilmSpiderSpider(scrapy.Spider):
    name = 'new_film_spider'
    allowed_domains = ['www.xxx.in']
    start_urls = ['https://www.xxx.in/xxx/now-showing']

    def parse(self, response):
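        # Pass the method and its argument separately; threading.Thread(self.getDetails(response))
        # would run getDetails immediately in the current thread.
        t = threading.Thread(target=self.getDetails, args=(response,))
        t.start()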

    def getDetails(self, response):
        FROM_ADDRESS = 'xxx@gmail.com'
        PASSWORD = 'xxx'
        TO_ADDRESS = 'xxx@gmail.com'
        HOST = 'smtp.gmail.com'
        PORT = 587
        records = response.xpath('//section[@class="main-section"]/section[2]/section[@class="movie__listing now-showing"]/ul/li/div/dl/dt/a/text()').extract()
        if 'KEYWORD' in str(records):
            receivers = [TO_ADDRESS]
            # Build a message with Subject/From/To headers instead of sending a bare string
            msg = MIMEText("Booking Opened")
            msg['Subject'] = "Booking Opened"
            msg['From'] = FROM_ADDRESS
            msg['To'] = TO_ADDRESS
            try:
                smtpObj = smtplib.SMTP(HOST, PORT)
                smtpObj.set_debuglevel(1)
                smtpObj.ehlo()
                smtpObj.starttls()
                smtpObj.login(FROM_ADDRESS, PASSWORD)
                smtpObj.sendmail(FROM_ADDRESS, receivers, msg.as_string())
                smtpObj.quit()
                print("Successfully sent email")
            except Exception as e:
                print("Error: unable to send email:", e)
        time.sleep(60)  # check every minute
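
The keyword check and the email construction can be pulled out of the spider into a plain function, which makes them testable without running a crawl or contacting an SMTP server. This is a minimal sketch; build_alert is a hypothetical helper name, and it assumes records is the list of title strings returned by the XPath query above.

```python
from email.mime.text import MIMEText
from typing import Iterable, Optional


def build_alert(records: Iterable[str], keyword: str,
                from_addr: str, to_addr: str) -> Optional[MIMEText]:
    """Return an alert email if any scraped title contains `keyword`, else None.

    build_alert is a hypothetical helper, not part of Scrapy or smtplib.
    """
    # Check each title individually (rather than str(records)) so the
    # email can list exactly which titles matched.
    matches = [r.strip() for r in records if keyword in r]
    if not matches:
        return None
    msg = MIMEText("Booking opened for:\n" + "\n".join(matches))
    msg["Subject"] = "Booking Opened"
    msg["From"] = from_addr
    msg["To"] = to_addr
    return msg
```

Inside getDetails, the result could then be sent with smtpObj.send_message(msg) whenever it is not None.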

This code runs the spider and sends the email, but I don't know how to make it loop. Any hints would help. Thanks.

Update #1: I tried threading as suggested in the answer, but the program stops after two iterations.

Update #2: I had forgotten to add the while loop. It works now.

1 answer:

Answer 0 (score: 2)

You can spawn a thread that runs every 30 minutes, like this:

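import threading
import time

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # ... rest of the spider's initialisation ...
    # Pass the method itself as target; calling self.every_thirty_min() here
    # would run the infinite loop inside __init__ and block the spider
    t = threading.Thread(target=self.every_thirty_min)
    t.start()

def every_thirty_min(self):
    while True:
        print('up')
        # do stuff
        time.sleep(1800)  # 30 min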
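
A slightly more controllable variant of the same idea, as a minimal standard-library sketch: run_every and the check_site job mentioned below are hypothetical names, and the job is assumed to do its own fetching and keyword check. A threading.Event replaces time.sleep so the loop can be stopped cleanly instead of only by killing the process, and one failed run does not end the loop.

```python
import threading
from typing import Callable


def run_every(interval: float, job: Callable[[], None],
              stop: threading.Event) -> threading.Thread:
    """Start a background thread that calls `job` every `interval` seconds
    until `stop` is set. run_every is a hypothetical helper name."""
    def loop() -> None:
        while not stop.is_set():
            try:
                job()
            except Exception as exc:  # keep looping if one run fails
                print(f"job failed: {exc!r}")
            # Event.wait sleeps like time.sleep but returns early once stop is set
            stop.wait(interval)

    # Pass the function object, not loop(): calling it here would block forever.
    t = threading.Thread(target=loop, name="run-every")
    t.start()
    return t
```

For example, run_every(1800, check_site, stop) would call the hypothetical check_site every 30 minutes until stop.set() is called. If the goal is instead to rerun the Scrapy spider itself, note that the Twisted reactor Scrapy runs on cannot be restarted once stopped, so repeated crawls are usually scheduled inside a single running reactor (for example with CrawlerRunner) rather than by calling CrawlerProcess.start() repeatedly in the same Python process.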