Python multiprocessing gets stuck in an infinite loop

Time: 2019-05-17 14:31:10

Tags: python concurrency multiprocessing

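I have been using the multiprocessing module to convert a list of text files into BERT embeddings.

A BERT embedding is created for every file, but for one particular file the process never finishes.

I previously used Process.join() to wait for the processes to finish, but that ended in a deadlock.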

from datetime import datetime
from multiprocessing import Process
import multiprocessing
import time
import sys

from bert_serving.client import BertClient

# form_path (the directory prefix of the input files) is assumed to be defined elsewhere.

def process(file, appended_data):
    start = datetime.now()
    with open(form_path + file, 'r') as file1_obj:
        file1 = file1_obj.readlines()
    # Drop empty and whitespace-only lines, strip trailing whitespace from the rest
    file11 = [i.rstrip() for i in file1 if i.strip()]
    file111 = [' |||'.join(file11)]
    try:
        bc = BertClient()
        embedding1 = bc.encode(file111)
        del bc
    except ValueError:  # some files have '' as the first string in the list
        embedding1 = None
    appended_data.put({file: embedding1})
    print("finished %s" % file)
    print(datetime.now() - start)
    return appended_data

def embedding_dic(file_list):


    procs = []
    appended_data = multiprocessing.Queue()
    print(file_list[0])
    print(file_list)
    for file in file_list:
        procs.append(Process(target=process, args=(file,appended_data,)))

    for proc in procs:
        proc.start()

    results = []
    liveprocs = list(procs)
    while liveprocs:
        try:
            while True:
                # Non-blocking get; raises queue.Empty once the queue is drained
                r = appended_data.get(False)
                results.append(r)
        except Exception:
            pass

        time.sleep(0.05)    # Give tasks a chance to put more data in
        if not appended_data.empty():
            continue
        liveprocs = [p for p in liveprocs if p.is_alive()]
        print(liveprocs)
        print(len(results))

    return results

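For some files, the deadlock still occurs.

The details are as follows:

Running the embedding_dic function on a list of files produces the following output: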


No of files available : 7

Files _names:
['0001368007_10-K_2007-03-22.txt', '0001368007_10-K_2008-03-25.txt', '0001368007_10-K_2009-02-27.txt', '0001368007_10-K_2010-03-01.txt', '0001368007_10-K_2011-02-28.txt', '0001368007_10-K_2012-02-29.txt', '0001368007_10-K_2012-02-29.txt']

Processes_started:

[<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]

0
[<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
0
[<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
0
[<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
0
[<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
0
finished 0001368007_10-K_2009-02-27.txt
0:00:03.055049
finished 0001368007_10-K_2012-02-29.txt
0:00:03.023879
finished 0001368007_10-K_2012-02-29.txt
0:00:03.055496
finished 0001368007_10-K_2010-03-01.txt
0:00:03.096127
finished 0001368007_10-K_2011-02-28.txt
0:00:03.099099
[<Process(Process-1899, started)>, <Process(Process-1900, started)>]
5
finished 0001368007_10-K_2008-03-25.txt
0:00:04.473414
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
[<Process(Process-1899, started)>]
6
Process Process-1899:
  File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/site-packages/bert_serving/client/__init__.py", line 206, in arg_wrapper
    return func(self, *args, **kwargs)
[<Process(Process-1899, started)>]
6
Traceback (most recent call last):
  File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-315-ffe782d1c2f5>", line 12, in process
    embedding1=bc.encode(file111)
  File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/site-packages/bert_serving/client/__init__.py", line 291, in encode
    r = self._recv_ndarray(req_id)


So when the list of files is given as input, the process deadlocks on the file 0001368007_10-K_2007-03-22.txt.
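For reference, here is a minimal sketch of how the collecting loop could be given a bounded wait, so that one worker stuck inside `bc.encode` does not keep the parent polling forever. It reads results from the queue before joining, since joining a process that still has unflushed queue data can itself block. Everything in it is illustrative: `fake_embed` is a hypothetical stand-in for the BertClient call, `collect` and `per_result_timeout` are made-up names, and the "fork" start method assumes a POSIX system.

```python
import multiprocessing as mp
import queue
import time

def fake_embed(name, out_queue):
    # Hypothetical stand-in for the BertClient round trip.
    if name == "hang.txt":
        time.sleep(60)  # simulate a worker that never gets a reply
    out_queue.put((name, len(name)))

def collect(names, worker=fake_embed, per_result_timeout=30.0):
    ctx = mp.get_context("fork")  # assumption: POSIX, as in the original setup
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(n, out_queue)) for n in names]
    for p in procs:
        p.start()

    # Expect one result per process; read them BEFORE joining.
    results = []
    for _ in procs:
        try:
            results.append(out_queue.get(timeout=per_result_timeout))
        except queue.Empty:
            break  # nothing arrived in time: some worker is stuck or died

    # Join finished workers, terminate the ones that are still alive.
    stuck = []
    for p, n in zip(procs, names):
        p.join(timeout=0.5)
        if p.is_alive():
            p.terminate()
            p.join()
            stuck.append(n)
    return results, stuck
```

Calling `collect(file_list)` returns both the results that arrived and the names whose workers had to be terminated, which at least identifies the problem file without hanging. Terminating a process that is using a queue is a blunt tool, so this is meant as a diagnostic aid rather than a fix.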

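As a sanity check, I ran the code with only that one file as input. It completed.

It also completes when the list contains fewer than 5 files.

The behavior is different again for lists with more than 7 files (for example 10 or 12).

I have not been able to debug it.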
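Since the hang seems to depend on how many files are processed at once, one thing that could be tried is capping how many workers talk to the server at the same time. The sketch below does that with a semaphore. Whether concurrency is the actual cause is unconfirmed; `fake_embed`, `embed_throttled` and `max_concurrent` are hypothetical names, the `len(name)` payload stands in for the real `bc.encode(...)` result, and the "fork" start method assumes a POSIX system.

```python
import multiprocessing as mp

def fake_embed(name, out_queue, slots):
    # Only `max_concurrent` workers can be inside this block at once.
    with slots:
        embedding = len(name)  # hypothetical stand-in for bc.encode(...)
    out_queue.put((name, embedding))

def embed_throttled(names, max_concurrent=4, timeout=30.0):
    ctx = mp.get_context("fork")  # assumption: POSIX
    slots = ctx.BoundedSemaphore(max_concurrent)
    out_queue = ctx.Queue()
    procs = [
        ctx.Process(target=fake_embed, args=(n, out_queue, slots))
        for n in names
    ]
    for p in procs:
        p.start()
    # One result per process; raises queue.Empty if one takes too long.
    results = dict(out_queue.get(timeout=timeout) for _ in procs)
    for p in procs:
        p.join()
    return results
```

If the run completes with `max_concurrent=4` but hangs with 7, that would point at the number of simultaneous clients rather than at the content of any one file.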

Another symptom I observed:

  • If I re-run the code after waiting a while, it completes.

Any help is appreciated.

0 answers:

No answers yet