Question

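I wrote a scraping script in Python. I have a list of URLs that need to be scraped, but after a while the script gets stuck while reading web pages in a loop. So I need to set a fixed time limit, after which the script should break out of the loop and start reading the next web page.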

Here is the sample code.

def main():
    if <some condition>:
        list_of_links = ['http://link1.com', 'http://link2.com', 'http://link3.com']
        for link in list_of_links:
            process(link)

def process(link):
    <some code to read web page>
    return page_read

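The script gets stuck inside the process() method, which is called again and again in the for loop. If process() takes too long to read a web page, I want the loop to skip to the next link.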

Answer 1

You can probably use a Timer (threading.Timer). It depends on the code in your process function. If main and process are methods of a class, then:

from threading import Timer

class MyClass:

    def __init__(self):
        self.stop_thread = False

    def main(self):
        if <some condition>:
            list_of_links = ['http://link1.com', 'http://link2.com', 'http://link3.com']
            for link in list_of_links:
                self.process(link)

    def set_stop(self):
        self.stop_thread = True

    def process(self, link):
        self.stop_thread = False  # reset the flag for every link
        t = Timer(60.0, self.set_stop)  # calls set_stop() after 60 seconds
        t.start()
        # I don't know your code here
        # If you use some kind of loop it could be :
        while True:
            # Do something..
            if self.stop_thread:
                break
        # Or :
        if self.stop_thread:
            return
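
To make the idea above concrete, here is a minimal runnable sketch of the Timer-and-flag pattern. It is only an illustration under stated assumptions: the class name Scraper, the time_limit and work_units parameters, and the time.sleep() call that stands in for reading part of a page are all hypothetical, since the real page-reading code is not shown in the question.

```python
import time
from threading import Timer


class Scraper:
    """Hypothetical class illustrating the Timer-and-flag pattern."""

    def __init__(self, time_limit):
        self.time_limit = time_limit  # seconds allowed per link (assumed parameter)
        self.stop_flag = False

    def set_stop(self):
        # Runs in the Timer's thread once the time limit has elapsed.
        self.stop_flag = True

    def process(self, link, work_units):
        """Return True if the simulated read finished, False if it was cut short."""
        self.stop_flag = False  # reset for every link
        timer = Timer(self.time_limit, self.set_stop)
        timer.start()
        try:
            for _ in range(work_units):
                if self.stop_flag:
                    return False  # time limit hit: give up on this link
                time.sleep(0.01)  # stand-in for reading one chunk of the page
            return True
        finally:
            timer.cancel()  # a stale timer must not flip the flag for the next link

    def main(self, jobs):
        # jobs is a list of (link, work_units) pairs; work_units only simulates
        # how long each page takes to read.
        return {link: self.process(link, units) for link, units in jobs}
```

Calling Scraper(time_limit=0.5).main([...]) with one job that needs far more than half a second of simulated work marks that link as unfinished and carries on with the rest. Note that the flag is only checked between loop iterations, so this pattern cannot interrupt a single call that blocks indefinitely; it helps only when the work is split into steps.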

Answer 2

The script probably gets stuck because the remote server does not respond at all, or responds too slowly.

You can set a timeout on sockets to avoid this behaviour in the process function. At the very beginning of the main function:

import socket

def main():
    # default timeout (in seconds) for every socket created from now on
    socket.setdefaulttimeout(3.0)
    # process urls
    if ......
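The code snippet above means that if there is no response after waiting 3 seconds, the blocked socket operation is aborted and a timeout exception is raised. So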

try:
    process(link)
except Exception:  # e.g. socket.timeout; skip this link and move on
    pass

will work.
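
Putting the two snippets together, the whole loop might look like the sketch below. This is an assumption-laden illustration, not the asker's actual code: process() is a hypothetical stand-in that reads the page with urllib.request, and the timeout and fetch parameters of main() are added here only so the loop can be tried with a different page reader.

```python
import socket
import urllib.request


def process(link):
    # Hypothetical stand-in for the asker's page-reading code.
    with urllib.request.urlopen(link) as response:
        return response.read()


def main(list_of_links, timeout=3.0, fetch=process):
    # Applies to every socket created afterwards without an explicit timeout.
    socket.setdefaulttimeout(timeout)
    pages = {}
    for link in list_of_links:
        try:
            pages[link] = fetch(link)
        except OSError as exc:
            # socket.timeout and urllib.error.URLError are both OSError
            # subclasses, so a slow or unreachable server lands here and
            # the loop simply continues with the next link.
            print(f"skipping {link}: {exc!r}")
    return pages
```

Catching OSError instead of using a bare except keeps the skip-and-continue behaviour for network failures while still letting programming errors and KeyboardInterrupt surface.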

How to terminate a thread in Python, break out of a loop, and then continue the loop again?
