How can I import a matrix saved from R as RData into a pandas DataFrame without losing the column names of the R matrix?
For example, suppose I save this matrix in R:
A = matrix(
    c(2, 4, 3, 1, 5, 7),  # the data elements
    nrow = 2,             # number of rows
    ncol = 3,             # number of columns
    byrow = TRUE)         # fill matrix by rows
dimnames(A) = list(
    c("row1", "row2"),          # row names
    c("col1", "col2", "col3"))  # column names
A
save(A, file = 'matrix.RData')
Output:
> A
     col1 col2 col3
row1    2    4    3
row2    1    5    7
I then load it in Python with rpy2 as follows:
from __future__ import print_function
from rpy2.robjects import pandas2ri, r
import rpy2.robjects as robjects

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']
    frame = pandas2ri.ri2py(matrix)
    print(frame)
    print('type(frame): {0}'.format(type(frame)))

if __name__ == "__main__":
    main()
This prints:
variables: ('A',)
[[ 2. 4. 3.]
[ 1. 5. 7.]]
type(frame): <type 'numpy.ndarray'>
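The matrix loses its column names: ri2py returns a plain numpy.ndarray. I would like to keep the names by loading the R matrix into a pandas DataFrame instead.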
Answer 0 (score: 1)
There is a package called feather that saves data frames in a format that both R and pandas can read as data frames.
In R:
library(feather)
write_feather(as.data.frame(A), 'path/df.feather')
In Python:
import pandas as pd
df = pd.read_feather('path/df.feather')
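R row names (row1, row2 here) may not survive the feather round trip, since as.data.frame(A) keeps them only as R row names rather than as a data column. A minimal sketch of one workaround, assuming the R side first copies them into an ordinary column, e.g. write_feather(data.frame(rowname = rownames(A), A), 'path/df.feather'). The column name rowname and the helper restore_row_names are hypothetical choices for illustration. To keep the sketch runnable without a feather file or pyarrow, the frame that pd.read_feather would return is built by hand:

```python
import pandas as pd


def restore_row_names(df, rowname_col="rowname"):
    """Move a column holding R row names back into the pandas index.

    Hypothetical helper: assumes the R side stored the row names in a
    column called `rowname_col` before calling write_feather.
    """
    out = df.set_index(rowname_col)
    out.index.name = None  # R row names have no name of their own
    return out


# Hand-built stand-in for pd.read_feather('path/df.feather'), so the
# sketch runs without an actual feather file.
df = pd.DataFrame({
    "rowname": ["row1", "row2"],
    "col1": [2.0, 1.0],
    "col2": [4.0, 5.0],
    "col3": [3.0, 7.0],
})

frame = restore_row_names(df)
print(frame)
```

With a real file, the call would simply be restore_row_names(pd.read_feather('path/df.feather')).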
You can find more details here:
Answer 1 (score: 0)
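You can read the column names from the colnames attribute of the R matrix object and pass them to the DataFrame constructor (tested with Python 2.7):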
from __future__ import print_function
from rpy2.robjects import pandas2ri, r
import rpy2.robjects as robjects
import pandas as pd

def load_r_matrix_into_pandas_dataframe(r_matrix):
    '''
    Import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix
    https://stackoverflow.com/q/45392308/395857
    - Input: R matrix object
    - Output: Pandas DataFrame
    '''
    # Convert the values to a numpy array (this step drops the dimnames)
    numpy_matrix = pandas2ri.ri2py(r_matrix)
    # Read the column names separately from the R matrix object
    frame_column_names = r_matrix.colnames
    frame = pd.DataFrame(data=numpy_matrix, columns=list(frame_column_names))
    return frame

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']
    frame = load_r_matrix_into_pandas_dataframe(matrix)
    print('frame: {0}'.format(frame))

if __name__ == "__main__":
    main()
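The function above keeps only the column names. The same DataFrame constructor also accepts index=, so row names can be reattached the same way. A minimal sketch, with the values and names hardcoded as stand-ins for what pandas2ri.ri2py(r_matrix), r_matrix.rownames and r_matrix.colnames would give for the example matrix, so it runs without R or rpy2. The helper name labeled_frame is hypothetical, and reading rownames assumes the rpy2 matrix object exposes it the same way it exposes colnames:

```python
import numpy as np
import pandas as pd


def labeled_frame(values, row_names=None, col_names=None):
    """Wrap a 2-D array in a DataFrame, reattaching R-style dimnames.

    Hypothetical helper: `row_names` / `col_names` play the role of
    list(r_matrix.rownames) / list(r_matrix.colnames) in rpy2.
    Passing None keeps pandas' default 0..n-1 labels, which matches an
    R matrix that has no dimnames.
    """
    return pd.DataFrame(np.asarray(values), index=row_names, columns=col_names)


# Stand-ins for the example matrix A and its dimnames.
values = np.array([[2.0, 4.0, 3.0],
                   [1.0, 5.0, 7.0]])
frame = labeled_frame(values, ["row1", "row2"], ["col1", "col2", "col3"])
print(frame)
```

Inside load_r_matrix_into_pandas_dataframe, the equivalent change would be to pass index=list(r_matrix.rownames) alongside columns=.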