Question

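After reading a .csv file with pandas and converting it to an R data frame with the rpy2 package, I built a model using some R functions (also through rpy2). Now I want to take the model's summary and convert it to a pandas DataFrame (so that I can save it as a .csv file or use it for other purposes).

I followed the instructions on the pandas website (source: https://pandas.pydata.org/pandas-docs/stable/r_interface.html) to work this out: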

import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects import r, pandas2ri
from rpy2.robjects.vectors import StrVector

pandas2ri.activate()
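# base and stats provide summary() and coef(), which are called further down
base = rpackages.importr('base')
stats = rpackages.importr('stats')
caret = rpackages.importr('caret')
broom = rpackages.importr('broom')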

my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)

preprocessing= ["center", "scale"]
center_scale= StrVector(preprocessing)

#these are the columns in my data frame that will consist of my predictors in the model
predictors= ['predictor1','predictor2','predictor3']
predictors_vector= StrVector(predictors)

#this column from the dataframe consists of the outcome of the model
outcome= ['fluorescence']
outcome_vector= StrVector(outcome)

#this line extracts the columns of the predictors from the dataframe
columns_predictors= r_dataframe.rx(True, predictors_vector)

#this line extracts the column of the outcome from the dataframe
columns_response= r_dataframe.rx(True, outcome_vector)

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)

model_R= caret.train(columns_predictors, columns_response, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)

summary_model= base.summary(model_R)

coefficients= stats.coef(summary_model)

pd_dataframe = pandas2ri.ri2py(coefficients)

pd_dataframe.to_csv("coefficients.csv")

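Although this workflow appears correct, the output .csv file does not meet my needs because the column and row names have been removed. When I run type(pd_dataframe), I find that it is <type 'numpy.ndarray'>. The table's values are still there, but the conversion has dropped the column and row names.

So I ran type(coefficients) and found that it is <class 'rpy2.robjects.vectors.Matrix'>. Since this Matrix object still retains my column and row names, I tried to convert it to an R DataFrame object, but my efforts were in vain. Moreover, I don't know why the line pd_dataframe = pandas2ri.ri2py(coefficients) does not produce a pandas DataFrame object, nor why it does not keep my column and row names.

Can anyone recommend an approach that gives me a pandas DataFrame that retains my column and row names?

Update

The documentation for a slightly older version of the package (source: https://rpy2.readthedocs.io/en/version_2.7.x/changes.html) mentions a new method, pandas2ri.ri2py_dataframe, and with it I now get a proper data frame instead of a numpy array. However, I still cannot get the row and column names to transfer correctly. Any suggestions?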

Answer 1

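This should probably happen automatically during the conversion, but in the meantime the row and column names can easily be obtained from the R object and added to the pandas DataFrame. For example, the column names of an R matrix are available at: https://rpy2.github.io/doc/v2.9.x/html/vector.html#rpy2.robjects.vectors.Matrix.colnames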
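
A minimal sketch of that idea, assuming the numeric values are already available as a 2-D array (for instance the numpy array that pandas2ri.ri2py returns in the question). The helper name matrix_to_dataframe and the sample numbers are hypothetical and exist only for illustration; they are not model output. The sketch runs without R, and the commented lines at the end show roughly how the question's objects would plug in.

```python
import numpy as np
import pandas as pd


def matrix_to_dataframe(values, rownames, colnames):
    """Build a labelled DataFrame from a 2-D array plus R-style row/column names.

    Hypothetical helper for illustration; it is not part of rpy2 or pandas.
    """
    data = np.asarray(values)
    return pd.DataFrame(data, index=list(rownames), columns=list(colnames))


# Stand-in values shaped like a coefficient table (made up, not real results).
values = np.array([[1.5, 0.2], [0.7, 0.1]])
rownames = ["(Intercept)", "predictor1"]
colnames = ["Estimate", "Std. Error"]

coef_df = matrix_to_dataframe(values, rownames, colnames)
print(coef_df)

# With rpy2, using the question's `coefficients` Matrix, the call would be roughly
# (assumes the matrix actually carries row and column names):
# coef_df = matrix_to_dataframe(pandas2ri.ri2py(coefficients),
#                               coefficients.rownames,
#                               coefficients.colnames)
# coef_df.to_csv("coefficients.csv")  # the index and header are written by default
```

Because the names are attached as the DataFrame's index and columns, to_csv writes them out along with the values, which is what the original workflow was missing.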

How do I convert an rpy2 Matrix object to a pandas DataFrame?
