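Although I found a few hypothetical and theoretical posts about this problem, the closest one I found is here, and the answer posted there covers the opposite of the situation I think I need help with. (I mention the link in case it helps someone else.)
I got the following code from the wiki on GitHub here. Its implementation looks very simple, but I cannot get it to work in its native form.
Here is the "processing" code I am using: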
import dask.dataframe as dd
from concurrent.futures import ProcessPoolExecutor
import pandas as pd
import gdelt

gd = gdelt.gdelt(version=2)
e = ProcessPoolExecutor()

def getter(x):
    try:
        date = x.strftime('%Y%m%d')
        d = gd.Search(date, coverage=True)
        d.to_csv("{}_gdeltdata.csv".format(date),encoding='utf-8',index=False)
    except:
        pass

results = list(e.map(getter,pd.date_range('2015 Apr 21','2018 Apr 21')))
Here is the full error:
BrokenProcessPool Traceback (most recent call last)
<ipython-input-1-874f937ce512> in <module>()
21
22 # now pull the data; this will take a long time
---> 23 results = list(e.map(getter,pd.date_range('2015 Apr 21','2018 Apr 21')))
24
25
C:\Anaconda3\lib\concurrent\futures\process.py in _chain_from_iterable_of_lists(iterable)
364 careful not to keep references to yielded objects.
365 """
--> 366 for element in iterable:
367 element.reverse()
368 while element:
C:\Anaconda3\lib\concurrent\futures\_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())
C:\Anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()
C:\Anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
*BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.*
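Any ideas on how to resolve this error? I know that if I change ProcessPoolExecutor to ThreadPoolExecutor the problem seems to go away (although I have not run the whole dataset through it, so I cannot be completely sure), but I believe I would get results faster with ProcessPoolExecutor.
Eventually I will use dask to process the data in Pandas. Thanks in advance.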
Answer 0 (score: 0)
The examples in the documentation always show execution inside an if __name__ == '__main__': clause. Hopefully this MCVE accurately mimics your use case:

    from concurrent.futures import ProcessPoolExecutor

    def gd(s):
        return s*3

    def getter(w):
        return gd(w)

    data = list('abcdefg')

    def main():
        with ProcessPoolExecutor(max_workers=4) as executor:
            for thing in executor.map(getter, data):
                print(thing)

Executing it this way works:

    #main()
    if __name__ == '__main__':
        main()

But executing it this way does not; it throws a BrokenProcessPool error:

    main()
    #if __name__ == '__main__':
    #    main()

Try ensuring that the results = list(e.map(getter,pd.date_range(...))) line runs in the __main__ process.
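
Applied to the script in the question, the same idea might look like the sketch below. This is only a sketch under assumptions: fetch_day and date_range are hypothetical helpers introduced for illustration, and fetch_day merely stands in for the gd.Search(...) and to_csv(...) calls so the example runs without the gdelt package or network access. It also returns exceptions instead of silently passing on them, so a failing day stays visible.

```python
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta


def fetch_day(day):
    """Hypothetical stand-in for gd.Search(...) followed by to_csv(...).

    Returns the CSV file name that the real job would write for this day.
    """
    stamp = day.strftime('%Y%m%d')
    return "{}_gdeltdata.csv".format(stamp)


def getter(day):
    # Hand the exception back to the parent instead of swallowing it,
    # so a failing day is visible in the results list.
    try:
        return fetch_day(day)
    except Exception as exc:
        return exc


def date_range(start, end):
    """Hypothetical helper: yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def main():
    days = list(date_range(date(2015, 4, 21), date(2015, 4, 24)))
    # The pool is created inside main(), never at import time, so worker
    # processes that re-import this module do not start pools of their own.
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(getter, days))


if __name__ == '__main__':
    for result in main():
        print(result)
```

Note that the Python documentation states that ProcessPoolExecutor will not work in the interactive interpreter, because the __main__ module must be importable by the worker processes. Since the traceback in the question comes from an IPython session, saving the code to a .py file and running it as a script may be needed in addition to the guard.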