How to import a matrix saved as RData in R into a pandas DataFrame without losing the R matrix's column names?

Date: 2017-07-29 17:52:35

Tags: python r pandas dataframe rpy2

How can I import a matrix that was saved from R as RData into a pandas DataFrame without losing the column names of the R matrix?

For example, suppose I save this matrix in R:
A = matrix( 
     c(2, 4, 3, 1, 5, 7), # the data elements 
     nrow=2,              # number of rows 
     ncol=3,              # number of columns 
     byrow = TRUE)        # fill matrix by rows 

dimnames(A) = list( 
     c("row1", "row2"),         # row names 
     c("col1", "col2", "col3")) # column names 

A
save (A, file = 'matrix.RData')

Output:

> A
     col1 col2 col3
row1    2    4    3
row2    1    5    7

Then I load it in Python with rpy2 as follows:

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']
    frame = pandas2ri.ri2py(matrix)
    print(frame)
    print('type(frame): {0}'.format(type(frame)))

if __name__ == "__main__":
    main()

This prints:

variables: ('A',)
[[ 2.  4.  3.]
 [ 1.  5.  7.]]
type(frame): <type 'numpy.ndarray'>

The matrix has lost its column names. I would like to keep them when loading the R matrix into a pandas DataFrame.
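
The names disappear because ri2py returned a plain numpy.ndarray (as the printed type(frame) shows), and a NumPy array has nowhere to store R's dimnames; when such an array is wrapped in a DataFrame, pandas falls back to integer labels. A minimal pure-pandas illustration, with the values and names copied by hand from the R output above (in real code they would be read from the R object):

```python
import numpy as np
import pandas as pd

# Values and dimnames copied by hand from the R matrix A above (illustrative only).
values = np.array([[2.0, 4.0, 3.0],
                   [1.0, 5.0, 7.0]])
row_names = ["row1", "row2"]
col_names = ["col1", "col2", "col3"]

# A bare ndarray carries no labels, so pandas uses integer positions instead.
unlabeled = pd.DataFrame(values)
print(unlabeled.columns.tolist())  # [0, 1, 2]

# Passing the names explicitly restores R's row and column names.
labeled = pd.DataFrame(values, index=row_names, columns=col_names)
print(labeled)
```

So any fix comes down to fetching the names separately from the R object and handing them to the DataFrame constructor.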

2 answers:

Answer 0 (score: 1)

There is a package called feather that saves data frames in a format that can be read both as an R data frame and as a pandas DataFrame.

In R:

library(feather)
write_feather(as.data.frame(A), 'path/df.feather')

In Python:

import pandas as pd
df = pd.read_feather('path/df.feather')
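
The snippet above carries the column data and column names across; it does not by itself give you R's row names (row1, row2) as the pandas index. One option, assuming you first copy the row names into an ordinary column on the R side before writing (the column name rowname below is a hypothetical choice, not something feather does for you), is to move that column back into the index after reading. A small sketch:

```python
import pandas as pd


def restore_row_names(df, column="rowname"):
    """Move a column holding R row names into the DataFrame index.

    Assumes the R side wrote the row names as an ordinary column named
    `column` (a hypothetical convention chosen for this example).
    """
    if column not in df.columns:
        # Nothing to restore: keep the default integer index.
        return df
    out = df.set_index(column)
    # Drop the index name so the frame looks like the original R matrix.
    out.index.name = None
    return out


# Stand-in for what pd.read_feather('path/df.feather') might return.
df = pd.DataFrame({"rowname": ["row1", "row2"],
                   "col1": [2.0, 1.0],
                   "col2": [4.0, 5.0],
                   "col3": [3.0, 7.0]})
restored = restore_row_names(df)
```

With a real file the call would be restore_row_names(pd.read_feather('path/df.feather')).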

You can find more details here:

Answer 1 (score: 0)

You can use colnames (tested with Python 2.7):

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects
import pandas as pd

def load_r_matrix_into_pandas_dataframe(r_matrix):
    '''
    Import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix
    https://stackoverflow.com/q/45392308/395857
     - Input: R matrix object
     - Output: Pandas DataFrame
    '''
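    # Convert the R matrix to a numpy array; this step drops the dimnames
    numpy_matrix = pandas2ri.ri2py(r_matrix)
    # Read the column names directly from the R matrix object
    frame_column_names = r_matrix.colnames
    # Rebuild the DataFrame with the R column names attached
    frame = pd.DataFrame(data=numpy_matrix, columns=list(frame_column_names))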
    return frame

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']

    frame = load_r_matrix_into_pandas_dataframe(matrix)
    print('frame: {0}'.format(frame))

if __name__ == "__main__":
    main()
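
The function above keeps only the column names, while the R matrix in the question also has row names. A possible variant, sketched here as a hypothetical helper (matrix_to_frame is not part of rpy2 or pandas), takes the converted array and the two name lists separately, checks that their lengths match the matrix shape, and falls back to pandas' default labels when a name list is None or empty. It is written without rpy2 so it runs without an R installation:

```python
import numpy as np
import pandas as pd


def matrix_to_frame(values, row_names=None, col_names=None):
    """Build a DataFrame from a 2-D array plus optional R-style dimnames.

    row_names / col_names are sequences of labels, or None when the R
    matrix has no dimnames (pandas then uses integer labels).
    """
    values = np.asarray(values)
    if values.ndim != 2:
        raise ValueError("expected a 2-D matrix, got %d dimension(s)" % values.ndim)
    n_rows, n_cols = values.shape
    index = list(row_names) if row_names is not None else None
    columns = list(col_names) if col_names is not None else None
    # An empty list of names is treated the same as "no names".
    index = index or None
    columns = columns or None
    if index is not None and len(index) != n_rows:
        raise ValueError("expected %d row names, got %d" % (n_rows, len(index)))
    if columns is not None and len(columns) != n_cols:
        raise ValueError("expected %d column names, got %d" % (n_cols, len(columns)))
    return pd.DataFrame(values, index=index, columns=columns)


frame = matrix_to_frame([[2, 4, 3], [1, 5, 7]],
                        row_names=["row1", "row2"],
                        col_names=["col1", "col2", "col3"])
```

With the rpy2 2.x calls used in the answer, the wiring would presumably be matrix_to_frame(pandas2ri.ri2py(r_matrix), list(r_matrix.rownames), list(r_matrix.colnames)); that combination is untested here.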