So I want to check a website so it notifies me whenever a new item is posted. It doesn't update very often, so I'm fairly certain that when it does, the new item will be one of interest. I want to accomplish this by picking a "starting number" (the count of links on the page), then comparing that number against the current link count every 10 minutes until the number of links is greater than the starting number.
First I run this to get the "starting number" of links:
import time
import urllib.request
from bs4 import BeautifulSoup as bs

page = urllib.request.urlopen('web/site/url')
soup = bs(page, "lxml")
links = [link.get('href') for link in soup.findAll('a')]
start_num = len(links)
Then I compare that number against the number of links now, and again every 5 minutes:
notify = True
while notify:
    try:
        page = urllib.request.urlopen('web/site/url')
        soup = bs(page, "lxml")
        links = [link.get('href') for link in soup.findAll('a')]
        if len(links) > start_num:
            # 'client' is an already-configured Twilio REST client
            message = client.messages.create(to="", from_="", body="")
            print('notified')
            notify = False
        else:
            print('keep going')
            time.sleep(60 * 5)   # check again in 5 minutes
    except Exception:
        print("Going to sleep")
        time.sleep(60 * 10)  # back off for 10 minutes on errors
How do I combine all of this into one function, so that the starting number of links is stored without being overwritten every time the current number of links is checked?
Answer 0 (score: 0)
You can do it in at least two ways: with a decorator or with a generator.
Decorator:
def hang_on(func):
    # soup should be in a visible scope
    def count_links():
        # refresh the page here if needed
        return len(soup.findAll('a'))

    start_num = count_links()

    def wrapper(*args, **kwargs):
        nonlocal start_num  # rebind the enclosing variable instead of shadowing it
        while True:
            try:
                new_links = count_links()
                if new_links > start_num:
                    start_num = new_links
                    return func(*args, **kwargs)  # 'fund' in the original was a typo
                print('keep going')
                time.sleep(60 * 5)
            except Exception:
                print("Going to sleep")
                time.sleep(60 * 10)
    return wrapper
@hang_on
def notify():
    message = client.messages.create(to="", from_="", body="")
    print('notified')

# somewhere in your code, simply:
notify()
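The crucial detail is the `nonlocal` declaration: `start_num` lives in `hang_on`'s scope, and `wrapper` can update it between calls because the closure keeps that scope alive. A minimal sketch of the same pattern in isolation (`make_counter` and `bump` are illustrative names, not part of the answer):

# State created once in the outer function survives across calls.
def make_counter(start):
    count = start
    def bump():
        nonlocal count  # without this, the assignment would create a new local
        count += 1
        return count
    return bump

counter = make_counter(10)
print(counter())  # 11
print(counter())  # 12 -- the state persists; nothing gets overwritten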
Generator:
def gen_example(soup):
    # initialize soup (perhaps from a url)
    # soup should be in a visible scope
    def count_links():
        # refresh the page here if needed
        return len(soup.findAll('a'))

    start_num = count_links()
    while True:
        try:
            new_links = count_links()
            if new_links > start_num:
                start_num = new_links
                message = client.messages.create(to="", from_="", body="")
                print('notified')
                yield True  # this is what makes this func a generator
            print('keep going')
            time.sleep(60 * 5)
        except Exception:
            print("Going to sleep")
            time.sleep(60 * 10)
# somewhere in your code:
gen = gen_example(soup)  # initialize
next(gen)  # will wait and notify (gen.next() was the Python 2 spelling)
# calling next(gen) again will wait for the next increase
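Because the generator keeps `start_num` alive across `yield`s, you can also drive it in a loop to be told about every future increase rather than only the first. A sketch, assuming `gen_example` and `soup` are set up as above:

# Each iteration blocks inside the generator until the link
# count grows again, then resumes here.
for _ in gen_example(soup):
    print('link count increased; watching for the next change')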
Answer 1 (score: 0)
I would implement it as a class, since that keeps the code readable and easy to maintain. Enjoy:
class Notifier:
    url = 'web/site/url'
    timeout = 60 * 10

    def __links_count(self):
        page = urllib.request.urlopen(self.url)
        soup = bs(page, "lxml")
        links = [link.get('href') for link in soup.findAll('a')]
        return len(links)

    def __notify(self):
        client.messages.create(to="", from_="", body="")
        print('notified')

    def run(self):
        current_count = self.__links_count()
        while True:
            try:
                new_count = self.__links_count()
                if new_count > current_count:
                    self.__notify()
                    break
                time.sleep(self.timeout)
            except Exception:
                print('Keep going')
                time.sleep(self.timeout)
notifier = Notifier()
notifier.run()
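If you want to reuse the class for more than one site, one possible refinement (not part of the original answer) is to take the URL and timeout as constructor arguments, so each instance carries its own settings:

class ConfigurableNotifier(Notifier):
    """Same polling loop; URL and timeout are chosen per instance."""
    def __init__(self, url, timeout=60 * 10):
        self.url = url          # instance attributes shadow the class defaults
        self.timeout = timeout

notifier = ConfigurableNotifier('web/site/url', timeout=60 * 5)
notifier.run()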