Python rpy2 - RRuntimeError in nls regression

Date: 2018-01-26 12:57:10

Tags: python r pandas rpy2

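I am trying to run an nls regression with R from Python. I am stuck on an RRuntimeError that is beyond my level of expertise, and I have been struggling for several days to get this working, so I would appreciate some help.

Here is my csv data: http://www.sharecsv.com/s/4cdd4f832b606d6616260f9dc0eedf38/ratedata.csv

Here is my code: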

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()

dfData = pd.read_csv('C:\\Users\\nick\\Desktop\\ratedata.csv')
rdf = pandas2ri.py2ri(dfData)

a = 0.5
b = 1.1
count = rdf.rx(True, 'Trials')
rates = rdf.rx(True, 'Successes')

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)

Everything runs except the following line:

fit = stats.nls(my_formula, weights=count, start=d)

I get the following traceback:

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-12-3f7fcd7d7851> in <module>()
      6 d = ro.ListVector({'a': a, 'b': b})
      7 
----> 8 fit = stats.nls(my_formula, weights=count, start=d)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(),  : 
  parameters without starting value in 'data': rates, count

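If anyone can see where I am going wrong, or can offer advice, I would be very grateful. All I want is the two fitted numbers in Python so that I can use them to construct some confidence intervals.

Thanks

1 Answer:

Answer 0: (score: 1)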

Consider putting all formula variables into one data frame and passing it through the data argument. The formula created by as_formula is evaluated in the R environment, but rates and count exist only in Python scope, so keep every item in the same object. Then run nls with either the pandas data frame or the R data frame:

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

a = 0.05
b = 1.1
d = ro.ListVector({'a': a, 'b': b})

dfData = pd.read_csv('Input.csv')
dfData['count'] = dfData['Trials'].astype('float')
dfData['rates'] = dfData['Successes'] / dfData['Trials']
dfData['a'] = a
dfData['b'] = b

pandas2ri.activate()
rdf = pandas2ri.py2ri(dfData)

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

# WITH PANDAS DATAFRAME
fit = stats.nls(formula=my_formula, data=dfData, weights=dfData['count'], start=d)
print(fit)

# WITH R DATAFRAME
fit = stats.nls(formula=my_formula, data=rdf, weights=rdf.rx(True, 'count'), start=d)
print(fit)

Alternatively, you can assign the variables through robjects.globalenv and omit the data argument:

ro.globalenv['rates'] = dfData['rates']
ro.globalenv['count'] = dfData['count']
ro.globalenv['a'] = dfData['a']
ro.globalenv['b'] = dfData['b']

fit = stats.nls(formula=my_formula, weights=dfData['count'], start=d)
print(fit)

Both produce output equivalent to R:

# Nonlinear regression model    
#   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))    
#    data: parent.frame()

#       a       b     
# 0.01043 1.24943     
#  weighted residual sum-of-squares: 14.37       

# Number of iterations to convergence: 6     
# Achieved convergence tolerance: 9.793e-07

# To return parameters    
num = fit.rx('m')[0].names.index('getPars')
obj = fit.rx('m')[0][num]()

print(obj[0])
# 0.010425686223717435

print(obj[1])
# 1.2494303314553932
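
As a quick sanity check on the two extracted numbers, the model from the formula can be evaluated in plain Python. This is only a sketch: the function name predicted_rate and the trial counts in the loop are hypothetical and not part of the original thread, and the parameter values are copied from the printed output above.

```python
def predicted_rate(count, a, b):
    """Evaluate the model rates = 1 - 1 / 10**(a * count**(b - 1))."""
    return 1.0 - 1.0 / 10.0 ** (a * count ** (b - 1.0))


# Parameter estimates copied from the printed nls result
a_hat = 0.010425686223717435
b_hat = 1.2494303314553932

# Hypothetical trial counts chosen for illustration, not taken from the CSV
for n in (1, 10, 100):
    print(n, predicted_rate(n, a_hat, b_hat))
```

With a and b both positive and b greater than 1, the predicted rate rises with count and stays between 0 and 1, which is a cheap way to confirm the values were read back in the right order.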