Question

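I have been trying to build a web scraping script that monitors a website's HTML for changes and, when it sees one, notifies me by email and SMS. The problem is that the script never detects a change and simply restarts after 60 seconds. There are no errors at all. I don't know whether I missed something in the code that keeps it from searching; it just moves on and restarts.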

Here is the code:

import time
print('>>> Time Imported')
time.sleep(1)
from bs4 import BeautifulSoup as soup
print('>>> BeautifulSoup Imported')
time.sleep(1)
import requests
print('>>> Requests Imported')
time.sleep(1)
import ssl
print('>>> SSL Imported')
time.sleep(1)
import smtplib
print('>>> smtplib Imported')
time.sleep(1)
from lxml import html
print('>>> LMXL and HTML Imported')
time.sleep(1)
from twilio.rest import Client
print('Twilio Imported')
time.sleep(1)
# End Imports

#start Script
while True:
    url = 'http://A****.com'
    print('>>> We have connected to ' +url)
    time.sleep(1)

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    print('>>> Headers Initiating')
    time.sleep(1)

    page_response = requests.get(url, headers=headers, timeout=5)
    print('>>> We got a response from ' +url)
    time.sleep(1)

    page_content = soup(page_response.content, "html.parser") # Takes 1 Min 48 Seconds to run
    print('>>> Content Imported')
    time.sleep(2)

    print('>>> To prove i have connected, here is ' +url+ ' headers')
    time.sleep(2)
    print(' ')
    print(page_content.title)
    #tree = html.fromstring(page_response.content)
    #price = tree.xpath('//span[@class="bid-price-val current-bid"]/text()')
    #print(price)
    time.sleep(2)
    print(' ')
    time.sleep(1)
    print('>>> Initiating WebMonitor, If a change is found. That will be the next line')
    time.sleep(7)

    if str(soup).find('["330000"]') == -1:
        time.sleep(60)                       #The script restarts here 
                                             #never sees the change
                                             #Even tho there was one
        continue
    else:
        print('>>> Theres been a change in '+url)
        # Reuse the Client class imported at the top of the script
        accountSID = 'A*******'
        authToken = 'a********'
        twilioCli = Client(accountSID, authToken)
        myTwilioNumber = '1******'
        myCellPhone = '7*****'
        message = twilioCli.messages.create(
            body = "There has been a change at "+url,
            from_ = myTwilioNumber,
            to = myCellPhone,
            )

        print(message.sid)

        msg = 'Subject: This is the script talking, Check '+url
        fromaddr = 'r****'
        toaddrs = ['m****','2','3']

        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login("r****", 'r****')

        print('From: ' + fromaddr)
        print('To: ' + str(toaddrs))
        print('Message: ' + msg)
        server.sendmail(fromaddr, toaddrs, msg)
        server.quit()
        break
    #def monitor():

Answer 1

It looks like your problem is in this line:

 if str(soup).find('["330000"]') == -1:

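When you write str(soup), you are converting the BeautifulSoup class itself to a string, because soup is the alias the script gives the class in its import (from bs4 import BeautifulSoup as soup), not the parsed page. That does not work: it just produces a string like "<class 'bs4.BeautifulSoup'>". Calling the string's find() method on that will never find a match, so the result is always -1 whether or not the page has changed. The parsed document is stored in page_content, so the check should search str(page_content) instead.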
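The difference between converting a class and converting an instance can be seen in a small sketch. To keep it runnable without third-party packages, a hypothetical FakePage class stands in for bs4.BeautifulSoup, and the markup is made up for illustration; only the marker string comes from the question.

```python
class FakePage:
    """Hypothetical stand-in for bs4.BeautifulSoup: str(instance) returns the markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


MARKER = '["330000"]'  # the text the script in the question searches for

# Stands in for: page_content = soup(page_response.content, "html.parser")
# The markup here is invented sample data, not the real page.
page_content = FakePage('<html><body><span>["330000"]</span></body></html>')

# str() of the class is only its repr ("<class '...FakePage'>"),
# so the marker can never appear in it and find() returns -1.
class_result = str(FakePage).find(MARKER)

# str() of the page object is the markup itself, so the marker can be found.
instance_result = str(page_content).find(MARKER)

print(class_result)            # -1
print(instance_result != -1)   # True
```

In the script itself, the equivalent change would be to test str(page_content).find('["330000"]') == -1, or, skipping the parser for a plain substring check, '["330000"]' not in page_response.text.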
