I'm building an email scraper. The pseudo-flow is as follows:
Stage 1. Fetch all links from the URL
Stage 2. Scrape emails
Stage 3. Scrape links
Stage 4. If all links are processed, go to end_scene (which just asks me where I want to save them, etc.)
Stage 4.1. If an interruption has happened, go to end_scene
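The main work happens in stage 2, under while len(unprocessed_urls). There I have my own logic for building the URL, plus a try/except around the request for the URL's response; this is where the magic happens. Here I can simply add an except KeyboardInterrupt and send it to my function.
The problem arises in stage 3, where I'm scraping emails. That part is not inside any try/except block, so I can't really implement an interrupt handler there, or I'm not sure how to stop abruptly.
The core issue is that at certain moments, if I press Ctrl+C, Python raises the default exception and my own code never runs.
Here is the logic: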
# process urls one by one from unprocessed_url queue until queue is empty
while len(unprocessed_urls):
    ...  # URL processing
    try:
        # here the request is made
        response = requests.get(url, timeout=3)
        done = True
    except requests.exceptions.ConnectionError as e:
        print("\n[ERROR]Connection Error:")
        print(e)
        continue
    except requests.Timeout as e:
        print("\n[ERROR]Connection Timeout:")
        print(e)
        continue
    except requests.HTTPError as e:
        print("\n[ERROR]HTTP Error:")
        print(e)
        continue
    except requests.RequestException as e:
        print("\n[ERROR]General Error:")
        print(e)
        continue
    # ...this works...
    # Check for CTRL+C interruption
    except KeyboardInterrupt:
        end_scene()
    # extract all email addresses and add them into the resulting set
    ...  # email extraction logic
    if len(new_emails) == 0:
        ...  # print no emails
    else:
        ...  # print emails found
    # create a beautiful soup for the html document
    soup = BeautifulSoup(response.text, 'lxml')
    # Once this document is parsed and processed, now find and process all the anchors i.e. linked urls in this document
    for anchor in soup.find_all("a"):
        # extract link url from the anchor
        link = anchor.attrs.get("href", "")
        # resolve relative links (starting with /)
        if link.startswith('/'):
            link = base_url + link
        elif not link.startswith('http'):
            link = path + link
        # add the new url to the queue if it was not in unprocessed list nor in processed list yet
        if link not in unprocessed_urls and link not in processed_urls:
            unprocessed_urls.append(link)
So the question is: how do I structure the code so that I can be sure my own code runs whenever a keyboard interrupt is raised?
Answer 0 (score: 0)
I feel this may not be the right way to do it, but you could try using a contextmanager:
import time
from contextlib import contextmanager

# build your keyboard interrupt listener
@contextmanager
def kb_listener(func):
    print('Hey want to listen to KeyboardInterrupt?')
    try:
        yield func
    except KeyboardInterrupt:
        print("Who's there?")
        interrupt()  # <--- what you actually want to do when KeyboardInterrupt
    # This might not be necessary for your code
    finally:
        print('Keyboa^C')

# sample KeyboardInterrupt event
def interrupt():
    print("KeyboardInterrupt.")

# sample layered function
def do_thing():
    while True:
        print('Knock Knock')
        time.sleep(1)

with kb_listener(do_thing) as f:
    f()
Test output:
Hey want to listen to KeyboardInterrupt?
Knock Knock
Knock Knock
Knock Knock
Who's there?
KeyboardInterrupt.
Keyboa^C
At least this way you don't need to wrap the whole function in a try... except block.
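Applied to the scraper in the question, the same idea could look roughly like the sketch below. The names finish_with, crawl and process_url are made up for illustration; only end_scene comes from the question. It assumes that all per-URL work (request, email extraction, link extraction) lives in one callable, so a Ctrl+C at any stage lands in the same place.

```python
from contextlib import contextmanager


@contextmanager
def finish_with(handler):
    """Run handler() when the with-block ends: normally, on Ctrl+C,
    or on any other exception (which still propagates afterwards)."""
    try:
        yield
    except KeyboardInterrupt:
        # Swallow the interrupt so the program can wind down cleanly.
        print("\nInterrupted, finishing up...")
    finally:
        handler()


def crawl(unprocessed_urls, process_url, end_scene):
    """Drain the queue; end_scene runs once, with or without Ctrl+C."""
    with finish_with(end_scene):
        while unprocessed_urls:
            url = unprocessed_urls.pop(0)
            process_url(url)  # hypothetical: request + emails + links


if __name__ == "__main__":
    # Stand-ins for the real scraper pieces.
    crawl(["https://example.com/"], print, lambda: print("Saving results..."))
```

Because the with-block surrounds the whole loop, it does not matter which stage is running when Ctrl+C arrives; the per-request try/except for the requests exceptions can stay inside process_url unchanged.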
Answer 1 (score: 0)
#!/usr/bin/env python
import signal
import sys

def signal_handler(sig, frame):
    print('You pressed Ctrl+C!')
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()
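Snippet from How do I capture SIGINT in Python?
I'd suggest registering a proper signal handler, at least if your main goal is simply to catch any user/system interruption.
It's a good way to clean up on any exit or interruption.
It also works if you run your application as a service and need to handle shutdown events and the like.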
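For the scraper in the question, a variation on this might be to have the handler only set a flag instead of calling sys.exit, and let the main loop check that flag between URLs. The sketch below is just one way to arrange it; request_stop, run_queue and process_item are hypothetical names, and end_scene stands in for the function from the question. Note that once a custom SIGINT handler is installed, Ctrl+C no longer raises KeyboardInterrupt, so the URL currently being processed finishes before the loop stops.

```python
import signal

# Set by the handler, read by the loop.
stop_requested = False


def request_stop(sig, frame):
    """SIGINT handler: only record the request; the loop decides when to stop."""
    global stop_requested
    stop_requested = True


def run_queue(queue, process_item, end_scene):
    """Process items until the queue is empty or a stop was requested,
    then always run end_scene."""
    while queue and not stop_requested:
        process_item(queue.pop(0))
    end_scene()


if __name__ == "__main__":
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, request_stop)
    run_queue(
        ["https://example.com/a", "https://example.com/b"],
        print,  # stand-in for the real per-URL work
        lambda: print("Saving results..."),
    )
```

The trade-off is latency: the stop only takes effect once the current item is done, which with requests.get(url, timeout=3) should normally be a matter of seconds.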