Installing/importing R packages into Python with rpy2, and importing/ignoring problematic packages

Date: 2019-01-09 15:03:17

Tags: python r rpy2 kolmogorov-smirnov

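This is what I am trying to do:

  1. I want to use the discrete Kolmogorov-Smirnov goodness-of-fit test, which is currently only available in R. R has the normal KS test as well, but that is not the test I want to use.
  2. I am a Python user, so I need to port the discrete KS test to Python, and for this I am trying to use rpy2.

The problem I am facing (described with more statistical detail here) is that rpy2 seems to have replaced the imported discrete test with the standard version. I know this because it does not produce the correct answer when tested.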

What I have tried so far

import rpy2.robjects.packages as r
utils = r.importr("utils")
package_name = "dgof"
utils.install_packages(package_name)

Result

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The downloaded source packages are in
    ‘/tmp/RtmpTBas6a/downloaded_packages’
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Updating HTML index of packages in '.Library'

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Making 'packages.html' ...
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:  done

  warnings.warn(x, RRuntimeWarning)
rpy2.rinterface.NULL
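
The install step above runs every time the script is executed. A minimal sketch of an install-if-missing helper follows, assuming the `isinstalled` and `importr` functions of `rpy2.robjects.packages`; the helper name `ensure_r_package` is hypothetical, and the rpy2 module is passed in as an argument so that the function itself does not require R at definition time.

```python
def ensure_r_package(name, rpackages):
    """Install the R package `name` if it is missing, then import and return it.

    `rpackages` is expected to be the rpy2.robjects.packages module
    (passed in explicitly; this is a design choice of the sketch).
    """
    if not rpackages.isinstalled(name):
        utils = rpackages.importr("utils")
        # Select the first CRAN mirror so R does not prompt interactively.
        utils.chooseCRANmirror(ind=1)
        utils.install_packages(name)
    return rpackages.importr(name)
```

With rpy2 and R available, this would be used as `import rpy2.robjects.packages as rpackages` followed by `dgof = ensure_r_package("dgof", rpackages)`.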

OK, by now it should be installed. So let's import it:

# Import the discrete goodness-of-fit package, which includes the KS and CvM tests.
# (rpy2.robjects.packages was imported above under the alias `r`.)
dgof = r.importr('dgof')

Was it really imported? Let's see:

env = r.wherefrom('dgof')

Returns

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error: object 'dgof' not found

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In addition: 
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Warning message:

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In (function (x, y, ..., alternative = c("two.sided", "less", "greater"),  :
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:  cannot compute correct p-values with ties

  warnings.warn(x, RRuntimeWarning)

  warnings.warn(x, RRuntimeWarning)

Well, that is odd, but maybe it worked anyway. Let's see (this is exactly the same example as used on the R side, and it should return D = 0.66667, p-value = 0.07407):

import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np
a = np.array([1,1,1])
b = np.arange(1,3)
dgof.ks_test(a,b)

Returns

D = 0.5, p-value = 0.925086

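If that means nothing to you, all you need to know is that it is wrong. It appears to be wrong because the standard ks_test was somehow loaded instead of the discrete one (the one discussed in item 2 of the list above). Let's verify this by loading the standard library and its KS test: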

from rpy2.robjects.packages import importr
base     = importr('base')
stats    = importr('stats')
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np

a = np.array([1,1,1])
b = np.arange(1,3)
stats.ks_test(a,b)

Returns

D = 0.5, p-value = 0.925086
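
The value D = 0.5 is what the ordinary two-sample statistic gives for these inputs, and this can be checked without R. The sketch below uses plain Python with illustrative helper names and computes the largest gap between the two empirical CDFs. Note that `np.arange(1, 3)` yields `[1, 2]`, so that is the second sample used here.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations that are <= x.
        return bisect_right(ordered, x) / n

    return cdf


def two_sample_ks_statistic(x, y):
    """Largest absolute difference between the ECDFs of x and y."""
    fx, fy = ecdf(x), ecdf(y)
    # Both ECDFs are step functions, so checking the observed points suffices.
    points = sorted(set(x) | set(y))
    return max(abs(fx(p) - fy(p)) for p in points)


d = two_sample_ks_statistic([1, 1, 1], [1, 2])
print(d)
```

At x = 1 the first ECDF is already 1 while the second is 1/2, which gives the gap of 0.5 reported by both calls.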

So both calls give the same result. Does anyone know why this happens?

Note: this question is related to my other question, but has more detail on the Python side.
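
For comparison, the expected values quoted above (D = 0.66667, p-value = 0.07407) can be reproduced by hand under one assumption that is not stated in the post: that the reference distribution is uniform on {1, 2, 3}. The sketch below is plain Python with hypothetical helper names. It computes the one-sample statistic against that reference and obtains an exact p-value by enumerating all 27 equally likely samples of size 3.

```python
from fractions import Fraction
from itertools import product


def discrete_ks_statistic(sample, support):
    """One-sample KS statistic against the uniform distribution on `support`.

    `support` must contain distinct values.
    """
    n, m = len(sample), len(support)
    d = Fraction(0)
    for k, point in enumerate(sorted(support), start=1):
        empirical = Fraction(sum(1 for v in sample if v <= point), n)
        reference = Fraction(k, m)  # uniform CDF at the k-th support point
        d = max(d, abs(empirical - reference))
    return d


def exact_p_value(sample, support):
    """P(D >= observed D), enumerating every equally likely sample."""
    observed = discrete_ks_statistic(sample, support)
    outcomes = list(product(support, repeat=len(sample)))
    extreme = sum(
        1 for s in outcomes if discrete_ks_statistic(s, support) >= observed
    )
    return Fraction(extreme, len(outcomes))


support = [1, 2, 3]  # ASSUMPTION: reference distribution, not given in the post
sample = [1, 1, 1]
d = discrete_ks_statistic(sample, support)
p = exact_p_value(sample, support)
print(float(d), float(p))
```

Under that assumption the statistic is 2/3 and the exact p-value is 2/27, which round to the expected 0.66667 and 0.07407.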

1 answer:

Answer 0: (score: 0)

  

Was it really imported? Let's see:

env = r.wherefrom('dgof')
     

Returns

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error: object 'dgof' not found

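The RRuntimeWarning comes from R itself, and it is what one would expect. There is no object dgof, because R package namespaces are not objects.

What you probably want is wherefrom('ks.test') (see https://rpy2.github.io/doc/v2.9.x/html/robjects_rpackages.html#finding-where-an-r-symbol-is-coming-from).

A lot can happen in between, depending on what the package dgof does (if you are coming from Python: R lets package developers do some really odd things).

Have you tried relying on R's dispatch and function overloading mechanism? After loading the R package dgof, call ks.test without specifying a namespace.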

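import rpy2.robjects
import rpy2.robjects.packages as rpackages

# Load the R package dgof
dgof = rpackages.importr('dgof')
# "generic" function ks.test, looked up without a namespace
ks_test = rpy2.robjects.r('ks.test')
# Use it (a and b are the arrays defined in the question)
ks_test(a, b)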
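
One further thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: as far as the dgof documentation describes it, ks.test runs the discrete one-sample test when its second argument is an ecdf or step function, and performs the usual two-sample test when given two numeric vectors. The sketch below builds a namespace-qualified R call that passes `stats::ecdf(...)` as the reference and evaluates it through `rpy2.robjects.r`. The helper names and the reference support 1, 2, 3 are hypothetical, and `run_discrete_ks` needs a working R installation with dgof.

```python
def r_vector_literal(values):
    """Render a Python sequence of numbers as an R c(...) literal."""
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


def discrete_ks_call(sample, support):
    """Build R source that calls dgof's ks.test with an ecdf as reference."""
    return "dgof::ks.test({}, stats::ecdf({}))".format(
        r_vector_literal(sample), r_vector_literal(support)
    )


def run_discrete_ks(sample, support):
    """Evaluate the call in R and return (statistic, p-value).

    rpy2 is imported lazily, so the helpers above work without R.
    """
    import rpy2.robjects as robjects

    result = robjects.r(discrete_ks_call(sample, support))
    return result.rx2("statistic")[0], result.rx2("p.value")[0]


# ASSUMPTION: the support [1, 2, 3] is a guess at the intended reference.
print(discrete_ks_call([1, 1, 1], [1, 2, 3]))
```

Qualifying the call as `dgof::ks.test` sidesteps the question of which ks.test is found first on R's search path.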