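Need help!
After crawling a website and running the data through a pipeline, I need to send the scraped data by email. I have tried and read everything I could find, but I can't pin down the problem. In the pipeline, I tried the following: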
class EmailPipeline(object):
    def close_spider(self, spider):
        from_email = "myemail@email.com"
        to_email = "anotheremail@email.com"
        msg = MIMEMultipart()
        msg['From'] = from_email
        msg['To'] = to_email
        msg['Subject'] = 'Scrapper Results'
        intro = "Summary stats from Scrapy spider: \n\n"
        body = spider.crawler.stats.get_stats()
        body = pprint.pformat(body)
        body = intro + body
        msg.attach(MIMEText(body, 'plain'))
        server = smtplib.SMTP("mailserver", 465)
        server.startssl()
        server.login("user", "password")
        text = msg.as_string()
        server.sendmail(from_email, to_email, text)
        server.quit()
Should I send the email from a pipeline or from an extension, or is that just a matter of preference? How would I implement it?
Thanks!
Answer 0: (score: 2)
Scrapy provides its own MailSender class in scrapy.mail (implemented with Twisted's non-blocking I/O rather than smtplib):
from scrapy.mail import MailSender
mailer = MailSender()
mailer.send(to=["someone@example.com"], subject="Some subject", body="Some body", cc=["another@example.com"])
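To connect this to the close_spider hook from the question, here is a minimal sketch of a pipeline that mails the crawl stats through a MailSender. The class name StatsMailPipeline and the STATS_MAIL_TO setting are hypothetical names introduced for illustration; the sketch also assumes the usual MAIL_HOST, MAIL_FROM, MAIL_USER and MAIL_PASS settings are configured so that MailSender.from_settings can pick them up.

```python
import pprint


class StatsMailPipeline:
    """Hypothetical pipeline that emails the crawl stats when the spider closes."""

    def __init__(self, mailer, recipients):
        self.mailer = mailer          # any object with a send(to=, subject=, body=) method
        self.recipients = recipients  # list of recipient addresses

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class itself can be exercised without Scrapy installed.
        from scrapy.mail import MailSender

        mailer = MailSender.from_settings(crawler.settings)
        # STATS_MAIL_TO is an assumed custom setting, not a built-in Scrapy one.
        recipients = crawler.settings.getlist("STATS_MAIL_TO")
        return cls(mailer, recipients)

    def process_item(self, item, spider):
        # Items pass through unchanged; this pipeline only acts at shutdown.
        return item

    def close_spider(self, spider):
        stats = spider.crawler.stats.get_stats()
        body = "Summary stats from Scrapy spider:\n\n" + pprint.pformat(stats)
        # Hand back whatever send() returns (a Deferred with the real MailSender)
        # so the caller can wait for the mail to go out before shutting down.
        return self.mailer.send(
            to=self.recipients, subject="Scraper results", body=body
        )
```

The pipeline would still need to be enabled in ITEM_PIPELINES like any other. The same logic could live in an extension connected to the spider_closed signal; which one to use is largely a matter of preference.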
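Answer 1: (score: 0)
Here is a file you can use: import the send_mail function from it. You will need to make some changes to fit your case. Calling it from the pipeline, as you are doing, is the right way to include it.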
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication


def send_mail(filename):
    sender = 'sender@email.com'
    receiver = 'receiver@email.com'
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject text here'
    msg['From'] = sender
    msg['To'] = receiver
    # Read the file and attach it; MIMEApplication base64-encodes the payload
    with open(filename, "rb") as fo:
        att = MIMEApplication(fo.read(), _subtype="pdf")
    msg.attach(att)
    try:
        # Port 587 expects a plain connection upgraded with STARTTLS
        smtpObj = smtplib.SMTP(host='smtp.host.com', port=587)
        smtpObj.ehlo()
        smtpObj.starttls()
        smtpObj.login(sender, 'your password')
        smtpObj.sendmail(sender, receiver, msg.as_string())
        smtpObj.quit()
        print('SUCCESSFULLY SENT EMAIL')
    except Exception as e:
        print("SEND E-MAIL FAILED WITH EXCEPTION: {}".format(e))
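For the case in the question, where the body is the spider's stats rather than a PDF attachment, the same smtplib approach might look like the sketch below. Two things stand out in the question's snippet: smtplib has no startssl method (the method is starttls), and port 465 usually expects TLS from the first byte, which is what smtplib.SMTP_SSL is for. The function names build_stats_message and send_message are hypothetical, and the port-based switch is an assumption about how the mail server is set up.

```python
import pprint
import smtplib
from email.message import EmailMessage


def build_stats_message(stats, from_email, to_email):
    """Build a plain-text message whose body is the pretty-printed stats dict."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = "Scraper results"
    msg.set_content("Summary stats from Scrapy spider:\n\n" + pprint.pformat(stats))
    return msg


def send_message(msg, host, port, user, password):
    """Send msg, choosing implicit TLS for port 465 and STARTTLS otherwise."""
    implicit_tls = port == 465  # assumption: 465 means SMTP over SSL on this server
    server = smtplib.SMTP_SSL(host, port) if implicit_tls else smtplib.SMTP(host, port)
    with server:  # the context manager issues QUIT on exit
        if not implicit_tls:
            server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Inside close_spider this could be called as send_message(build_stats_message(spider.crawler.stats.get_stats(), from_email, to_email), "mailserver", 465, "user", "password"), using the placeholder values from the question.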
Another piece of code, which finds the most recently modified file in the output directory:
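import os
import glob

download_dir = "/full/path/to/files/"


def get_newest_file():
    print("Finding latest pdf file")
    file_list = glob.glob(os.path.join(download_dir, '*.pdf'))
    # max() raises ValueError on an empty list, so check before calling it
    if not file_list:
        return None
    # getmtime is the last-modified time; getctime is creation or metadata-change time
    latest_file = max(file_list, key=os.path.getmtime)
    print("Latest file: {}".format(latest_file))
    return latest_file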
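The same lookup can also be written with pathlib, taking the directory as an argument instead of a module-level constant. This is only a sketch of an alternative; the function name newest_file and its pattern parameter are hypothetical.

```python
from pathlib import Path


def newest_file(directory, pattern="*.pdf"):
    """Return the most recently modified file matching pattern, or None if there is none."""
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Compare by modification time, so the file written last wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The result is a Path object, so it can be passed straight to open() or converted with str() before handing it to send_mail.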