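How can I run rpy2 (R code from Python) in a thread without blocking other threads?

Time: 2018-02-23 16:35:46

Tags: python multithreading python-multithreading rpy2

Problem summary

I am trying to run R code from Python with rpy2 in a second thread (not multiprocessing, a plain thread would be fine), but it seems to block my main thread the whole time. Can this be avoided?

My original code: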

import time
import threading
import rpy2.robjects as robjects

def long_r_function():
    robjects.r('Sys.sleep(10)')    # pretend we have a long-running function in R

r_thread = threading.Thread(target=long_r_function, daemon=True)
start_time = time.time()
r_thread.start()

while r_thread.is_alive():
    print("R running...")    # expect to be able to see this print from main thread, 
    time.sleep(2)            # while R does work in second thread

print("Finished after ", time.time() - start_time, " seconds")

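Expected output:

I expect to see 5 "R running..." prints, because the main thread should not be blocked while R sleeps for 10 seconds.

Actual output:

A single print instead of the 5 I expected: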

R running...
Finished after  12.006645679473877  seconds

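Moving the rpy2.robjects import into the thread's target function:

Because the import rpy2.robjects as robjects statement already causes the R environment to be initialized, I thought that moving it into the thread's target function long_r_function might help, to see whether that would somehow decouple it from the main thread. This did not work.

Replacing the rpy2 code with Python code

As a sanity check, if I get rid of the rpy2 code entirely and instead define the thread's target function as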

def long_r_function():
    time.sleep(10)

I correctly get the expected output (although the message is now misleading, since no R code is running):

R running...
R running...
R running...
R running...
R running...
Finished after  10.002039194107056  seconds

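Update: rpy2 in the main thread also blocks the second thread

When I swap the threads with the code below, so that rpy2 runs the R code in the main thread and I check whether Python can still do something in a second thread, the same problem appears.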

def print_and_sleep():
    while True:
        print("Second thread running...")
        time.sleep(2)

py_thread = threading.Thread(target=print_and_sleep, daemon=True)
py_thread.start()
robjects.r('Sys.sleep(10)')

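Output: only one or two prints of "Second thread running..." instead of the expected 5. Again, a sanity check similar to the one above, with the rpy2 call replaced by pure Python code, works fine.

So this is not specific to the main thread. It looks as if rpy2 simply blocks all Python threads, no matter which thread it runs in?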
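The check used in both experiments above (start a worker thread, then count how often the main thread gets to run) can be wrapped in a small helper. This is only a sketch: the name count_heartbeats and its parameters are hypothetical, and time.sleep stands in for the R call so that the snippet runs without rpy2.

```python
import threading
import time


def count_heartbeats(target, interval=0.5):
    """Run target in a daemon thread and count main-thread wake-ups while it is alive."""
    worker = threading.Thread(target=target, daemon=True)
    beats = 0
    worker.start()
    while worker.is_alive():
        beats += 1              # the main thread got a chance to run
        time.sleep(interval)
    return beats


if __name__ == "__main__":
    # time.sleep releases the interpreter lock, so the main thread keeps ticking.
    print("heartbeats:", count_heartbeats(lambda: time.sleep(1), interval=0.25))
```

Passing long_r_function as the target would presumably give a count close to 1, matching the single print reported above. That would be consistent with the R call not releasing Python's global interpreter lock while it runs, although that cause is an assumption here and not something the experiments themselves prove.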

Version info

rpy2.__version__ = 2.8.5
rpy2.rinterface.R_VERSION_BUILD = ('3', '3.2', '', 71607)
sys.version = 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]

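1 answer:

Answer 0 (score: 1)

Igautier's answer explains why this is technically difficult. The best way to achieve the intended goal seems to be to use multiprocessing instead of threading, even though threads would otherwise have been sufficient. The code then looks like this: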

import time
from multiprocessing import Process

def long_r_function():
    import rpy2.robjects as robjects
    robjects.r('Sys.sleep(10)')    # pretend we have a long-running function in R

if __name__ == '__main__':
    r_process = Process(target=long_r_function)
    start_time = time.time()
    r_process.start()

    while r_process.is_alive():
        print("R running...")    # expect to be able to see this print from main process,
        time.sleep(2)            # while R does work in second process

    print("Finished after ", time.time() - start_time, " seconds")

This gives the expected output (usually 6 or 7 prints rather than 5, because of the overhead of creating a new process):

R running...
R running...
R running...
R running...
R running...
R running...
Finished after  12.413572311401367  seconds
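The same process isolation can also be sketched with the standard subprocess module instead of multiprocessing. This is a different technique from the one in the answer, shown only as a possible variant: the helper name run_in_child and the CHILD_CODE string are hypothetical, and the child here just sleeps so that the snippet runs without rpy2 or R installed.

```python
import subprocess
import sys
import time

# Stand-in for the real work. With rpy2 installed, the child could instead run:
#   import rpy2.robjects as robjects; robjects.r('Sys.sleep(10)')
CHILD_CODE = "import time; time.sleep(1)"


def run_in_child(code, poll_interval=0.25):
    """Run code in a separate Python interpreter; return (exit code, number of polls)."""
    child = subprocess.Popen([sys.executable, "-c", code])
    polls = 0
    while child.poll() is None:     # None means the child is still running
        polls += 1                  # the parent stays responsive in the meantime
        time.sleep(poll_interval)
    return child.returncode, polls


if __name__ == "__main__":
    returncode, polls = run_in_child(CHILD_CODE)
    print("exit code:", returncode, "polls:", polls)
```

Because the child is a separate interpreter with its own interpreter lock, whatever it does cannot stall threads in the parent. The trade-off is that results have to be passed back explicitly, for example through the child's standard output or a file.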