I am using rpart with rpy2 (version 2.8.6) on Python 3.5 and want to train a decision tree for classification. My code snippet looks like this:
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
from rpy2.robjects import DataFrame, Formula
rpart = importr('rpart')
numpy2ri.activate()
pandas2ri.activate()
dataf = DataFrame({'responsev': owner_train_label,
'predictorv': owner_train_data})
formula = Formula('responsev ~.')
clf = rpart.rpart(formula = formula, data = dataf, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))
Here owner_train_label is a numpy float64 array of shape (12610,) and owner_train_data is a numpy float64 array of shape (12610, 88).
This is the error I get when I run the last line of code to fit the data:
RRuntimeError: Error in ((xmiss %*% rep(1, ncol(xmiss))) < ncol(xmiss)) & !ymiss :
non-conformable arrays
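I understand it is telling me that the arrays are non-conformable, but I don't know why, since I can successfully train sklearn's decision tree on the same training data. Thanks for your help.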
Answer 0 (score: 0)
I solved this by building the data frame with pandas and passing that pandas DataFrame to rpart, letting rpy2's pandas2ri convert it to an R data frame.
import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import Formula
rpart = importr('rpart')
pandas2ri.activate()
df = pd.DataFrame(data = owner_train_data)
df['l'] = owner_train_label
formula = Formula('l ~.')
clf = rpart.rpart(formula = formula, data = df, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))
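The data-preparation step in this answer can be sketched on its own, without calling R. This is only a rough illustration under assumptions: the helper name `build_training_frame`, the `x0, x1, ...` column names, and the synthetic arrays are hypothetical and are not part of rpy2, rpart, or the original post. It flattens the label array to 1-D, checks that the row counts agree, and gives every column a string name before the frame is handed to anything else.

```python
import numpy as np
import pandas as pd


def build_training_frame(features, labels, label_col="label"):
    """Combine a 2-D feature array and a 1-D label array into one DataFrame.

    Hypothetical helper for illustration; not part of rpy2 or rpart.
    """
    features = np.asarray(features)
    labels = np.asarray(labels).ravel()  # force a flat 1-D vector
    if features.ndim != 2:
        raise ValueError("features must be a 2-D array")
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels must have the same number of rows")
    # Assumed naming scheme: string column names x0, x1, ...
    columns = ["x{}".format(i) for i in range(features.shape[1])]
    df = pd.DataFrame(features, columns=columns)
    df[label_col] = labels
    return df


# Small synthetic stand-in for owner_train_data / owner_train_label.
rng = np.random.RandomState(0)
demo_features = rng.rand(20, 3)
demo_labels = (demo_features[:, 0] > 0.5).astype(np.float64)
demo_df = build_training_frame(demo_features, demo_labels)
```

With `pandas2ri.activate()` in effect, a frame built this way could presumably be passed as `data = demo_df` together with `Formula('label ~ .')` in the rpart call above; that last step has not been verified here against rpy2 2.8.6.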