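Sending an email with scraped data using an extension

Time: 2019-05-14 17:16:16

Tags: scrapy

Need help!

After crawling the website and returning the processed data through the pipeline, I need to send the scraped data by email. I have tried and read everything I could find, but I can't seem to figure it out. In the pipeline, I tried the following: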

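import pprint
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class EmailPipeline(object):
    def close_spider(self, spider):
        from_email = "myemail@email.com"
        to_email = "anotheremail@email.com"

        msg = MIMEMultipart()
        msg['From'] = from_email
        msg['To'] = to_email
        msg['Subject'] = 'Scraper Results'

        intro = "Summary stats from Scrapy spider: \n\n"

        # Collect the crawl statistics and format them as readable text
        body = spider.crawler.stats.get_stats()
        body = pprint.pformat(body)
        body = intro + body
        msg.attach(MIMEText(body, 'plain'))

        # Port 465 expects TLS from the first byte, so connect with SMTP_SSL.
        # smtplib has no startssl() method; starttls() is meant for port 587.
        server = smtplib.SMTP_SSL("mailserver", 465)
        server.login("user", "password")
        text = msg.as_string()
        server.sendmail(from_email, to_email, text)
        server.quit()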

Should I send the email from the pipeline or from an extension, or is it a matter of preference? How would I implement it?

Thanks!

2 answers:

Answer 0: (score: 2)

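Scrapy provides the MailSender class in scrapy.mail (it is implemented with Twisted's non-blocking I/O rather than smtplib, so it does not block the crawler):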

from scrapy.mail import MailSender
mailer = MailSender()
mailer.send(to=["someone@example.com"], subject="Some subject", body="Some body", cc=["another@example.com"])
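A `MailSender()` created without arguments uses its defaults (localhost, port 25). To point it at a real server, the connection details can go in the project settings and be picked up with `MailSender.from_settings(settings)`. The fragment below is a minimal sketch of such a `settings.py`; the host, addresses, and credentials are placeholders, not values from the question. Setting `STATSMAILER_RCPTS` also turns on Scrapy's built-in StatsMailer extension, which emails the crawl stats when a spider finishes, and that may already cover the question's use case.

```python
# settings.py (sketch; every value below is a placeholder)

# Recipients for the built-in StatsMailer extension, which stays
# disabled while this list is empty.
STATSMAILER_RCPTS = ["someone@example.com"]

# SMTP connection details read by MailSender.from_settings(settings)
MAIL_FROM = "scrapy@example.com"
MAIL_HOST = "smtp.example.com"
MAIL_PORT = 587
MAIL_USER = "user"
MAIL_PASS = "password"
MAIL_TLS = True   # upgrade the connection with STARTTLS (typical for port 587)
MAIL_SSL = False  # set True instead for an SSL-only port such as 465
```

A custom extension would follow the same pattern as StatsMailer: connect a handler to the `spider_closed` signal in `from_crawler` and call `mailer.send(...)` from that handler.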

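Answer 1: (score: 0)

Here is a file you can use; import the send_mail function from it. You will need to make a few changes to fit your case. Calling it from the pipeline, as you are doing, is the right approach.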

import os
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def send_mail(filename):
    sender = 'sender@email.com'
    receiver = 'receiver@email.com'
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject text here'
    msg['From'] = sender
    msg['To'] = receiver
    # Read the file; MIMEApplication base64-encodes the payload
    with open(filename, "rb") as fo:
        att = MIMEApplication(fo.read(), _subtype="pdf")
    # Name the attachment so mail clients show it under its file name
    att.add_header('Content-Disposition', 'attachment',
                   filename=os.path.basename(filename))
    msg.attach(att)
    try:
        # Used as a context manager, SMTP sends QUIT and closes the connection
        with smtplib.SMTP(host='smtp.host.com', port=587) as smtp_obj:
            smtp_obj.ehlo()
            smtp_obj.starttls()
            smtp_obj.login(sender, 'your password')
            smtp_obj.sendmail(sender, receiver, msg.as_string())
        print('SUCCESSFULLY SENT EMAIL')
    except Exception as e:
        print("SEND E-MAIL FAILED WITH EXCEPTION: {}".format(e))
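The function above hard-codes the `pdf` subtype, so it only suits PDF output. If the scraped data is exported as CSV or JSON instead, the MIME type can be guessed from the file name. Here is a rough sketch using the standard library's `email.message.EmailMessage` and `mimetypes`; the function name `build_message` and its parameters are made up for illustration and are not part of the answer.

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def build_message(path, sender, recipient, subject="Scraped data"):
    """Return an EmailMessage with the file at `path` attached."""
    path = Path(path)
    # Guess the MIME type from the file name; fall back to a generic
    # binary type when the type is unknown or the file is compressed.
    ctype, encoding = mimetypes.guess_type(path.name)
    if ctype is None or encoding is not None:
        ctype = "application/octet-stream"
    maintype, subtype = ctype.split("/", 1)

    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The scraped data is attached.")
    # add_attachment base64-encodes the bytes and sets the file name
    msg.add_attachment(path.read_bytes(), maintype=maintype,
                       subtype=subtype, filename=path.name)
    return msg
```

The returned message can then be handed to `send_message()` on an `smtplib.SMTP` or `smtplib.SMTP_SSL` connection in place of `sendmail(sender, receiver, msg.as_string())`.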

Another piece finds the most recently modified file in the output directory:

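import glob
import os

download_dir = "/full/path/to/files/"

def get_newest_file():
    print("Finding latest pdf file")
    file_list = glob.glob(os.path.join(download_dir, '*.pdf'))
    # max() raises ValueError on an empty list, so handle that case first
    if not file_list:
        print("No pdf files found")
        return None
    # getmtime is the last-modification time; getctime is the creation time
    # on Windows and the last metadata change on Unix
    latest_file = max(file_list, key=os.path.getmtime)
    print("Latest file: {}".format(latest_file))
    return latest_file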