Non-conformable arrays error when using rpart with rpy2

Time: 2017-10-17 05:50:36

Tags: arrays numpy python-3.5 decision-tree rpy2

I am using rpart through rpy2 (version 2.8.6) on Python 3.5 and want to train a decision tree for classification. My code snippet is as follows:

import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
from rpy2.robjects import DataFrame, Formula
rpart = importr('rpart')
numpy2ri.activate()
pandas2ri.activate()

dataf = DataFrame({'responsev': owner_train_label,
               'predictorv': owner_train_data})
formula = Formula('responsev ~.')
clf = rpart.rpart(formula = formula, data = dataf, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))
  

where owner_train_label is a numpy float64 array of shape (12610,) and owner_train_data is a numpy float64 array of shape (12610, 88).

This is the error I get when I run the last line of code to fit the data:

RRuntimeError: Error in ((xmiss %*% rep(1, ncol(xmiss))) < ncol(xmiss)) & !ymiss : 
non-conformable arrays

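I understand it is telling me that the arrays are non-conformable, but I don't know why, since I can train successfully on the same training data with sklearn's decision tree. Thanks for your help.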

1 Answer:

Answer 0: (score: 0)

I got around this by creating the data frame with pandas and passing the pandas DataFrame to rpart, letting rpy2's pandas2ri convert it to an R data frame.

import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import Formula

rpart = importr('rpart')
pandas2ri.activate()  # convert pandas DataFrames to R data frames automatically

# One column per predictor, plus the label column 'l'
df = pd.DataFrame(data=owner_train_data)
df['l'] = owner_train_label
formula = Formula('l ~ .')
clf = rpart.rpart(formula=formula, data=df, method="class",
                  control=rpart.rpart_control(minsplit=10, xval=10))
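
The DataFrame-building step can be tried on its own, without R. The following is only a minimal sketch under assumptions: the small random arrays stand in for owner_train_data and owner_train_label, and the helper name `build_training_frame` and the `x0`, `x1`, ... column names are hypothetical, chosen so that the columns carry string names rather than integer labels. It covers the pandas side only and does not call rpart.

```python
import numpy as np
import pandas as pd


def build_training_frame(data, labels, label_col="l"):
    """Combine a 2-D predictor array and a 1-D label array into one DataFrame.

    Hypothetical helper: the name and the 'x<i>' column naming are
    illustrative assumptions, not part of rpy2 or rpart.
    """
    data = np.asarray(data)
    labels = np.asarray(labels).ravel()  # make sure the labels are 1-D
    if data.ndim != 2:
        raise ValueError("data must be 2-D, got shape %s" % (data.shape,))
    if data.shape[0] != labels.shape[0]:
        raise ValueError("data and labels have different numbers of rows")
    # String column names instead of pandas' default integer labels (0, 1, ...)
    columns = ["x%d" % i for i in range(data.shape[1])]
    df = pd.DataFrame(data, columns=columns)
    df[label_col] = labels
    return df


# Small stand-ins for owner_train_data (12610, 88) and owner_train_label (12610,)
rng = np.random.RandomState(0)
toy_data = rng.rand(6, 3)
toy_labels = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])

toy_df = build_training_frame(toy_data, toy_labels)
print(toy_df.shape)
print(list(toy_df.columns))
```

A frame built this way could then be passed as the `data` argument to `rpart.rpart` together with `Formula('l ~ .')`, in the same way as `df` in the code above.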