Python threading.Timer: setting a time limit on how long a function may run

Date: 2016-11-22 17:55:47

Tags: python multithreading timer timeout

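I have some questions about setting a maximum running time for a function in Python. Specifically, I want to use pdfminer to convert .pdf files to .txt.

The problem is that, quite often, some files cannot be decoded and take a very long time. So I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. Also, I am running on Windows, so I cannot use the signal module.

I have successfully run the conversion code pdfminer.convert_pdf_to_txt() (it is "c" in my code), but I am not sure whether the threading.Timer() code below works. (I think it does not limit the time spent on each file.)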

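In summary, I want to:

  1. Convert pdf to txt

  2. Limit each conversion to 5 seconds; if it runs out of time, throw an exception and save an empty file

  3. Save all txt files in the same folder

  4. If there is any exception/error, still save the file, but with empty content.

Here is the current code: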

    import converter as c
    import os
    import timeit
    import time
    import threading
    import thread
    
    yourpath = 'D:/hh/'
    
    def iftimesout():
        print("no")
    
        with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
            newfile.write("")
    
    
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            try:
                timer = threading.Timer(5.0, iftimesout)
                timer.start()
                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g = str(a.split("\\")[1])

                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                    print("yes")

                timer.cancel()

            except KeyboardInterrupt:
                raise

            except:
                for name in files:
                    t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                    a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                    g = str(a.split("\\")[1])
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")
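
For reference, here is a minimal sketch of what threading.Timer does in a loop like the one above. The function name run_with_timer and the time.sleep() call standing in for the slow conversion are assumptions made for illustration only. The timer callback runs in its own thread once the delay has passed, while the main thread simply carries on with the slow call:

```python
import threading
import time


def run_with_timer(work_seconds, limit_seconds):
    """Run a blocking task while a threading.Timer is armed; return what happened, in order."""
    events = []

    def on_timeout():
        # Runs in a separate thread; it cannot interrupt the main thread.
        events.append("timer fired")

    timer = threading.Timer(limit_seconds, on_timeout)
    timer.start()
    time.sleep(work_seconds)      # stand-in for the slow conversion
    events.append("work finished")
    timer.cancel()                # has no effect if the timer has already fired
    timer.join()                  # wait for the timer thread to end
    return events


if __name__ == "__main__":
    print(run_with_timer(0.3, 0.1))
```

In this sketch the callback fires after 0.1 s, yet the 0.3 s of work still runs to completion, which suggests that the timer on its own does not cut a conversion short.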
    

2 Answers:

Answer 0 (score: 5)

I finally figured it out!

First, define a function that calls another function with a limited timeout:

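import multiprocessing

def call_timeout(timeout, func, args=(), kwargs=None):
    # Run func(*args, **kwargs) in a separate process.
    # Returns True if it finished within `timeout` seconds, False otherwise.
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
        return False

    elif not callable(func):
        print("{} is not callable!".format(type(func)))
        return False

    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)  # wait at most `timeout` seconds

        if p.is_alive():
            p.terminate()  # still running: stop it
            return False
        else:
            return True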

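What does this function do?

  • Checks whether the timeout and the function are valid
  • Starts the given function in a new process, which has some advantages over threads (notably, a process can be terminated from outside, while a thread cannot)
  • Blocks the program for up to x seconds (p.join()) and lets the function execute during that time
  • After the timeout expires, checks whether the function is still running

    • Yes: terminates it and returns False
    • No: fine, no timeout! Returns True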
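
The steps above only separate "finished" from "timed out". A function that crashes in the child process also counts as finished, because its exception is not raised in the parent. As a rough sketch (call_with_status is a hypothetical name, not part of the answer), the child's exitcode can be used to tell the three cases apart:

```python
import multiprocessing
import time


def call_with_status(timeout, func, args=(), kwargs=None):
    """Hypothetical variant of call_timeout: returns 'ok', 'failed' or 'timeout'."""
    p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
    p.start()
    p.join(timeout)        # wait at most `timeout` seconds
    if p.is_alive():
        p.terminate()      # still running: stop it
        p.join()           # wait until the terminated process is gone
        return "timeout"
    # exitcode is 0 when the target returned normally and non-zero
    # when it ended with an uncaught exception
    return "ok" if p.exitcode == 0 else "failed"


if __name__ == "__main__":  # guard needed on Windows, where the child re-imports this script
    print(call_with_status(5, time.sleep, args=(0.1,)))
    print(call_with_status(5, int, args=("not a number",)))  # int() raises ValueError in the child
    print(call_with_status(0.5, time.sleep, args=(10,)))
```

The second call prints the child's traceback to stderr before reporting the failure, since the exception is handled inside the child process.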

We can test it with time.sleep():

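import time

if __name__ == "__main__":  # guard needed on Windows, where the child process re-imports this script
    finished = call_timeout(2, time.sleep, args=(1, ))
    if finished:
        print("No timeout")
    else:
        print("Timeout")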

We run a function that takes one second to finish, with the timeout set to two seconds:

No timeout

If we run time.sleep(10) with the timeout set to two seconds:

finished = call_timeout(2, time.sleep, args=(10, ))

Result:

Timeout

Note that the program stops after two seconds without finishing the called function.

Your final code would look like this:

import converter as c
import os
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs=None):
    # Returns True if func finished within `timeout` seconds, False otherwise
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
        return False

    elif not callable(func):
        print("{} is not callable!".format(type(func)))
        return False

    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)

        if p.is_alive():
            p.terminate()
            return False
        else:
            return True

def convert(root, name, out_path):
    # Runs in the child process. open() creates the file first, so a failed
    # conversion still leaves an empty file behind.
    with open(out_path, mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

if __name__ == "__main__":  # guard needed on Windows, where child processes re-import this script
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            out_path = None
            try:
                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g = str(a.split("\\")[1])
                out_path = "D:/f/"+g+"&"+t+"&"+name+".txt"
                finished = call_timeout(5, convert, args=(root, name, out_path))

                if finished:
                    print("yes")
                else:
                    print("no")
                    # Timed out: overwrite whatever was written with an empty file
                    with open(out_path, mode="w") as newfile:
                        newfile.write("")

            except KeyboardInterrupt:
                raise

            except Exception:
                # Any other error: save an empty file for the current file only
                if out_path is not None:
                    with open(out_path, mode="w") as newfile:
                        newfile.write("")

The code should be easy to understand; if not, feel free to ask.
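
The least obvious part may be how the output file name is put together from t, a and g. As a rough sketch, the helper below (output_name is a hypothetical name, and the folder layout is made up) repeats those expressions with ntpath, so that Windows path rules apply on any platform. It assumes that os.walk on Windows yields roots with backslashes below the top folder, such as the ones shown:

```python
import ntpath  # Windows path rules, importable on any platform


def output_name(root, name, out_dir="D:/f/"):
    """Rebuild the output file name with the same expressions as the loop above."""
    folder = ntpath.dirname(ntpath.join(root, name))
    t = ntpath.split(folder)[1]   # innermost folder name
    a = ntpath.split(folder)[0]   # everything above that folder
    g = a.split("\\")[1]          # second backslash-separated piece of `a`
    return out_dir + g + "&" + t + "&" + name + ".txt"


if __name__ == "__main__":
    # Hypothetical layout: D:/hh/<company>/<year>/<quarter>/report.pdf
    print(output_name("D:/hh/acme\\2016\\q3", "report.pdf"))
    try:
        # Shallower path: `a` contains no backslash, so index 1 does not exist
        output_name("D:/hh/acme", "report.pdf")
    except IndexError as exc:
        print("IndexError:", exc)
```

For the first path, g is "2016" and t is "q3". For a shallower path such as the second one, a.split("\\")[1] raises IndexError inside the try block, so such files never reach the conversion step.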

I really hope this helps (because it took a while to get it right ;))!

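Answer 1 (score: 0)

Check the following code and let me know if there are any problems. Also let me know if you still want to be able to force termination (KeyboardInterrupt).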

import os
import time
from multiprocessing import Process

# Assumed import: the pdfminer wrapper module that the question imports as "c"
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5  # seconds
TIME_TO_CHECK = 1  # seconds


# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(os.path.join(path_to, my_pdf)))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)


# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    return path_to_text + (os.path.splitext(pdf_file)[0] + ".txt")


if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that the text file is created
            # (stop waiting if the child process has already exited)
            while p.is_alive() and not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if TIMEOUT has not run out and the conversion is still running: wait for TIME_TO_CHECK,
                # else: stop the child process and start a new iteration.
                # Checking p.is_alive() instead of the file size avoids terminating
                # the child in the middle of a large write.
                if count < TIMEOUT and p.is_alive():
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
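
One detail worth a quick look is how the .pdf extension is stripped when the text file name is built. The sketch below compares two common ways of doing that; the function names and file names are hypothetical and only serve as an illustration:

```python
import os


def stem_by_split(pdf_file):
    # Keeps only the text before the FIRST dot
    return pdf_file.split('.')[0]


def stem_by_splitext(pdf_file):
    # Removes only the LAST extension
    return os.path.splitext(pdf_file)[0]


if __name__ == "__main__":
    for file_name in ("report.pdf", "report.v2.pdf", "report.final.pdf"):
        print(file_name, stem_by_split(file_name), stem_by_splitext(file_name))
```

With names that contain more than one dot, splitting on the first dot maps "report.v2.pdf" and "report.final.pdf" to the same stem, so their text files would overwrite each other, while os.path.splitext keeps them apart.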