Question

I am building an email scraper. The pseudo-system looks like this:

Stage 1.
1. Fetch all links from the URL
Stage 2.
2. Scrape emails
Stage 3.
3. Scrape links
Stage 4.
4. If all links are processed, go to end_scene (which just asks me where I want to save them, etc.)
4.1 If an interruption has happened, go to end_scene

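The main action happens in stage 2, under while len(unprocessed_urls). There I have my own logic for building the URL, plus a try/except around the request for the URL's response, and that is where the magic happens. In that block I can simply add an except KeyboardInterrupt and send it to my function.

The problem shows up in stage 3, where I scrape the emails. That part is not inside any try/except block, so I can't really implement an interrupt handler there, and I'm not sure how to stop abruptly.

The core problem is that at certain moments, if I press Ctrl+C, Python raises the default exception and my code never runs.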
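To make the failure mode concrete, here is a minimal sketch. The functions stage_two and stage_three are hypothetical stand-ins for my real stages, and raise KeyboardInterrupt simulates pressing Ctrl+C at that point.

```python
def stage_two():
    """Request stage: wrapped in try/except, so the interrupt is handled."""
    try:
        raise KeyboardInterrupt  # stands in for Ctrl+C during requests.get
    except KeyboardInterrupt:
        return "handled in stage 2"


def stage_three():
    """Extraction stage: not wrapped, so the interrupt propagates upward."""
    raise KeyboardInterrupt  # stands in for Ctrl+C during email extraction


results = [stage_two()]
try:
    stage_three()
except KeyboardInterrupt:
    # Only here so the sketch can report the outcome instead of crashing
    results.append("escaped stage 3")

print(results)
```

The outer try/except exists only so the sketch can report what happened. In my real script nothing catches the stage 3 interrupt, so the traceback is printed and end_scene never runs.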

Here is the logic:

import requests
from bs4 import BeautifulSoup

# process urls one by one from the unprocessed_urls queue until the queue is empty
while len(unprocessed_urls):

    # ...URL processing...

    try:
        # ...here the request is made...
        response = requests.get(url, timeout=3)
        done = True
    except requests.exceptions.ConnectionError as e:
        print("\n[ERROR]Connection Error:")
        print(e)
        continue
    except requests.Timeout as e:
        print("\n[ERROR]Connection Timeout:")
        print(e)
        continue
    except requests.HTTPError as e:
        print("\n[ERROR]HTTP Error:")
        print(e)
        continue
    except requests.RequestException as e:
        print("\n[ERROR]General Error:")
        print(e)
        continue
    # ...this works...
    # Check for CTRL+C interruption
    except KeyboardInterrupt:
        end_scene()

    # extract all email addresses and add them into the resulting set
    # ...email extraction logic...

    if len(new_emails) == 0:
        pass  # ...print no emails...
    else:
        pass  # ...print emails found...

    # create a beautiful soup for the html document
    soup = BeautifulSoup(response.text, 'lxml')

    # once this document is parsed and processed, find and process all the anchors, i.e. linked urls, in it
    for anchor in soup.find_all("a"):
        # extract link url from the anchor
        link = anchor.attrs.get("href", "")
        # resolve relative links (starting with /)
        if link.startswith('/'):
            link = base_url + link
        elif not link.startswith('http'):
            link = path + link

        # add the new url to the queue if it is in neither the unprocessed nor the processed list yet
        if link not in unprocessed_urls and link not in processed_urls:
            unprocessed_urls.append(link)

So the question is: how do I structure the code so that I can be sure my own code runs whenever a keyboard interrupt is triggered?

Answer 1

I have a feeling this may not be the right approach, but you could try using a contextmanager:

import time
from contextlib import contextmanager

# build your keyboard interrupt listener
@contextmanager
def kb_listener(func):
    print('Hey want to listen to KeyboardInterrupt?')
    try:
        yield func
    except KeyboardInterrupt:
        print("Who's there?")
        interrupt()      # <--- what you actually want to do on KeyboardInterrupt

    # This might not be necessary for your code
    finally:             
        print('Keyboa^C')

# sample KeyboardInterrupt event
def interrupt():         
    print("KeyboardInterrupt.")

# sample layered function
def do_thing():          
    while True:
        print('Knock Knock')
        time.sleep(1)

with kb_listener(do_thing) as f:
    f()

Test output:

Hey want to listen to KeyboardInterrupt?
Knock Knock
Knock Knock
Knock Knock
Who's there?
KeyboardInterrupt.
Keyboa^C

At least this way, you don't need to wrap the whole function in a try...except block.
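As a rough sketch of how the same idea might be applied to a crawl loop like the one in the question: the names on_interrupt, crawl, saved, and processed below are hypothetical, end_scene is only a stand-in for the asker's function, and the raise KeyboardInterrupt merely simulates Ctrl+C.

```python
from contextlib import contextmanager


@contextmanager
def on_interrupt(cleanup):
    """Call cleanup() if Ctrl+C arrives anywhere inside the with block."""
    try:
        yield
    except KeyboardInterrupt:
        cleanup()  # the exception is swallowed after cleanup runs


processed = []  # URLs handled so far (hypothetical)
saved = []      # what the stand-in end_scene "saves"


def end_scene():
    # Hypothetical stand-in for the question's end_scene()
    saved.extend(processed)


def crawl(urls):
    for i, url in enumerate(urls):
        if i == 2:
            raise KeyboardInterrupt  # simulates Ctrl+C mid-crawl
        processed.append(url)


with on_interrupt(end_scene):
    crawl(["a", "b", "c"])

print(saved)  # ['a', 'b']
```

Because the with block covers the whole crawl, the interrupt is handled no matter which stage it lands in, and execution continues after the with block once end_scene has run.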

Answer 2

#!/usr/bin/env python
import signal
import sys

def signal_handler(sig, frame):
    print('You pressed Ctrl+C!')
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()

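Snippet taken from How do I capture SIGINT in Python?

I suggest registering a proper signal handler, at least if your main goal is simply to catch any user/system interrupt.

This is a good way to clean up on any exit or interrupt.
It also works if you run your application as a service and need to handle shutdown events and the like.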
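A possible variation, sketched under assumptions: instead of exiting inside the handler, the handler only sets a flag and the loop checks it at a safe point, so cleanup runs after the current step rather than in the middle of it. The names request_stop, stop_requested, and the URL list are hypothetical, and signal.raise_signal (Python 3.8+) is used here only to simulate Ctrl+C.

```python
import signal

stop_requested = False


def request_stop(sig, frame):
    """SIGINT handler: remember the request instead of exiting right away."""
    global stop_requested
    stop_requested = True


# Install the handler (main thread only) and keep the old one to restore later
previous_handler = signal.signal(signal.SIGINT, request_stop)

processed = []
for n, url in enumerate(["a", "b", "c", "d"]):  # hypothetical URL queue
    if n == 2:
        signal.raise_signal(signal.SIGINT)  # simulates the user pressing Ctrl+C
    if stop_requested:
        break  # leave the loop at a safe point; cleanup would go after it
    processed.append(url)

# Restore the original Ctrl+C behaviour
signal.signal(signal.SIGINT, previous_handler)

print(processed)  # ['a', 'b']
```

With this shape, a call such as the question's end_scene() could be placed right after the loop, so it runs whether the queue was exhausted or the user interrupted.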
