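I've been trying to build a web scraping script that monitors a website's HTML for any changes and, when it sees one, notifies me by email and SMS. I've run into a problem: the script never sees a change and simply restarts after 60 seconds. There are no errors at all. Idk if I missed something in the code that keeps it from searching; it just moves on and restarts.
Here is the code: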
import time
print('>>> Time Imported')
time.sleep(1)
from bs4 import BeautifulSoup as soup
print('>>> BeautifulSoup Imported')
time.sleep(1)
import requests
print('>>> Requests Imported')
time.sleep(1)
import ssl
print('>>> SSL Imported')
time.sleep(1)
import smtplib
print('>>> smtplib Imported')
time.sleep(1)
from lxml import html
print('>>> LMXL and HTML Imported')
time.sleep(1)
from twilio.rest import Client
print('Twilio Imported')
time.sleep(1)
# End Imports
#start Script
while True:
    url = 'http://A****.com'
    print('>>> We have connected to ' +url)
    time.sleep(1)
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    print('>>> Headers Initiating')
    time.sleep(1)
    page_response = requests.get(url, timeout=5)
    print('>>> We got a response from ' +url)
    time.sleep(1)
    page_content = soup(page_response.content, "html.parser") # Takes 1 Min 48 Seconds to run
    print('>>> Content Imported')
    time.sleep(2)
    print('>>> To prove i have connected, here is ' +url+ ' headers')
    time.sleep(2)
    print(' ')
    print(page_content.title)
    #tree = html.fromstring(page_response.content)
    #price = tree.xpath('//span[@class="bid-price-val current-bid"]/text()')
    #print(price)
    time.sleep(2)
    print(' ')
    time.sleep(1)
    print('>>> Initiating WebMonitor, If a change is found. That will be the next line')
    time.sleep(7)
    if str(soup).find('["330000"]') == -1:
        time.sleep(60) #The script restarts here
        #never sees the change
        #Even tho there was one
        continue
    else:
        print('>>> Theres been a change in '+url)
        from twilio.rest import TwilioRestClient
        accountSID = 'A*******'
        authToken = 'a********'
        twilioCli = TwilioRestClient(accountSID, authToken)
        myTwilioNumber = '1******'
        myCellPhone = '7*****'
        message = client.messages.create(
            body = "There has been a change at "+url,
            from_= "+14955551234",
            to = "7862199047",
        )
        print(message.sid)
        msg = 'Subject: This is the script talking, Check '+url
        fromaddr = 'r****'
        toaddrs = ['m****','2','3']
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login("r****", 'r****')
        print('From: ' + fromaddr)
        print('To: ' + str(toaddrs))
        print('Message: ' + msg)
        server.sendmail(fromaddr, toaddrs, msg)
        server.quit()
        break
#def monitor():
Answer 0 (score: 0)
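It looks like your problem is in this line:
if str(soup).find('["330000"]') == -1:
When you write str(soup), you are converting the Beautiful Soup class itself to a string, because soup is the name you imported BeautifulSoup under. That won't work: it just produces a string like "<class 'bs4.BeautifulSoup'>". Calling find() on that string will never find a match, so the result is always -1 whether or not anything changed. The parsed page is stored in page_content, so that is the object you most likely meant to convert and search, i.e. str(page_content).find('["330000"]').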
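To make the difference concrete, here is a minimal sketch that runs the same search against the class and against a parsed document. The sample_html string is made up for illustration and stands in for page_response.content. The names class_hit and doc_hit are hypothetical as well. It assumes bs4 is installed, as in the question.

```python
from bs4 import BeautifulSoup as soup

# Made-up page standing in for page_response.content (an assumption for this demo).
sample_html = '<html><head><title>Demo</title></head><body><span>["330000"]</span></body></html>'
marker = '["330000"]'

# Parse the page, as the question's script does.
page_content = soup(sample_html, "html.parser")

# Bug: `soup` is the BeautifulSoup class, so this searches the class's
# string representation, not the page.
class_hit = str(soup).find(marker)

# Fix: search the string form of the parsed document instead.
doc_hit = str(page_content).find(marker)

print(repr(str(soup)))    # the class representation, never the page markup
print(class_hit)          # -1, regardless of what the page contains
print(doc_hit != -1)      # True, because the marker is in the sample page
```

If only a yes/no answer is needed, marker in str(page_content) reads a little more naturally than comparing find() against -1.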