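I wrote a script that scrapes http://sf.eater.com/the-shutter for the dates articles are posted, then emails me the date of the most recent article. This works (and I'm especially happy that I got it working!), but ideally it would send an email only when a previously unseen (i.e. new) article has been posted.
Here's what I wrote: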
```python
# Import requests (to download the page)
import requests
# Import BeautifulSoup (to parse what we download)
from bs4 import BeautifulSoup
# Import time (to add a delay between the times the scrape runs)
import time
# Import smtplib (to allow us to email)
import smtplib
# email.mime.* is the module path in both Python 2.5+ and Python 3
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
#-----------------------------------------------------------
# scrape the page
url = "http://sf.eater.com/the-shutter"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
# parse the HTML
soup = BeautifulSoup(response.text, "html.parser")
# create an empty list
dates = []
# iterate through the parsed HTML and extract dates
for span_tag in soup.find_all("span"):
    span_text = span_tag.text
    # keep spans that mention am or pm; "or" avoids appending the same span twice
    if 'am' in span_text or 'pm' in span_text:
        dates.append(span_text)
# nothing to report if no dates were found (dates[0] would raise IndexError)
if not dates:
    raise SystemExit("No article dates found on " + url)
# email the results
# sending address
fromaddr = "<insert from add here>"
# to address
toaddr = "<insert to add here>"
msg = MIMEMultipart()
msg['From'] = fromaddr
msg['To'] = toaddr
# subject of the email
msg['Subject'] = "Shutter Article Check"
# body of the email (note the spaces around the inserted date)
body = "The most recent Shutter article for SF Eater was posted on " + dates[0] + ". Here is a link: " + url
msg.attach(MIMEText(body, 'plain'))
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login(fromaddr, "<insert password here>")  # username/password
text = msg.as_string()
server.sendmail(fromaddr, toaddr, text)
server.quit()
```
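Basically, it checks whether each parsed `<span>` string contains "am" or "pm"; if it does, the string goes into a list, and the first item of that list is emailed to the address of my choosing.
Since variables don't keep their values between runs of the script, how can I make it send the email only when a new date has been added?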
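One caveat with the plain substring test: it also matches any span where "am" or "pm" appears inside an ordinary word (e.g. "camera" or "Sam"). A minimal sketch of a stricter filter, using Python's `re` module and *assuming* the timestamps look like `9:30am` or `10:05 PM` (the page's real format is not confirmed here); `looks_like_timestamp` and `filter_dates` are hypothetical helper names:

```python
import re

# Assumed timestamp shape: 1-2 digit hour, colon, two-digit minutes,
# optional whitespace, then am/pm in any letter case.
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\s*[ap]m\b", re.IGNORECASE)


def looks_like_timestamp(text):
    """Return True if `text` contains something shaped like '9:30am' or '10:05 PM'."""
    return TIME_PATTERN.search(text) is not None


def filter_dates(span_texts):
    """Keep only the span texts that contain a timestamp, preserving page order."""
    return [t.strip() for t in span_texts if looks_like_timestamp(t)]


if __name__ == "__main__":
    sample = ["camera roll", "March 3, 2015, 9:30am PST", "Sam's pick", "10:05 PM"]
    print(filter_dates(sample))
```

With BeautifulSoup the input would be `[s.text for s in soup.find_all("span")]`; the rest of the script (emailing `dates[0]`) stays the same.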
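Because every variable disappears when the process exits, the "last date I emailed about" has to live outside the script: a file, a database row, and so on. A minimal sketch, assuming a plain text file is acceptable as storage; the helper names and the state-file path are hypothetical, and `notify` stands in for the existing smtplib code. Like the original script, it assumes the page lists the newest article first, so only `dates[0]` is compared.

```python
import os
import tempfile


def load_last_seen(state_path):
    """Return the date string saved by the previous run, or None on the first run."""
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        return f.read().strip() or None


def save_last_seen(value, state_path):
    """Overwrite the state file with the most recent date seen."""
    with open(state_path, "w") as f:
        f.write(value)


def notify_if_new(latest, state_path, notify):
    """Call notify(latest) only if `latest` differs from the saved value.

    The value is saved after notify() returns, so if sending fails
    (notify raises), the old value stays and the next run retries.
    Returns True if a notification was sent.
    """
    if latest == load_last_seen(state_path):
        return False
    notify(latest)
    save_last_seen(latest, state_path)
    return True


if __name__ == "__main__":
    # Demo with a throwaway state file; `print` plays the role of the email sender.
    state = os.path.join(tempfile.mkdtemp(), "last_seen_date.txt")
    print(notify_if_new("9:30am", state, print))
    print(notify_if_new("9:30am", state, print))
```

In the scraper, the email lines could be moved into a function such as `send_email(date)` (hypothetical) and called as `notify_if_new(dates[0], "last_seen_date.txt", send_email)`. Saving only after a successful send is the design choice here: recording the date first would silently drop an article whenever the SMTP step failed.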