Some R datasets can be loaded into a Pandas DataFrame or Panel quite easily:
import pandas.rpy.common as com
infert = com.load_data('infert')
print(infert.head())
This seems to work as long as the R dataset has dimension <= 3. Higher-dimensional datasets print an error message:
In [67]: com.load_data('Titanic')
Cannot handle dim=4
This error message originates in the _convert_array function in rpy/common.py.
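Of course, Pandas cannot fit a 4-dimensional matrix directly into a DataFrame or Panel, but is there a workaround for loading datasets such as Titanic into a DataFrame (possibly with a hierarchical index)?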
Answer 0 (score: 6)
Following @joran's very helpful suggestion, and after installing the reshape package with
% sudo R
R> install.packages('reshape')
I managed to load the Titanic dataset into a Pandas DataFrame:
import pandas as pd
import pandas.rpy.common as com
import rpy2.robjects as ro
r = ro.r
r('library(reshape)')
df = com.convert_robj(r('melt(Titanic)'))
print(df.head())
which prints
  Class     Sex    Age Survived  value
1   1st    Male  Child       No      0
2   2nd    Male  Child       No      0
3   3rd    Male  Child       No     35
4  Crew    Male  Child       No      0
5   1st  Female  Child       No      0
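For anyone wondering what melt does to the 4-dimensional table, here is a rough pure-Python sketch of the idea: every cell of the array becomes one row holding its dimension labels plus the value. This is not how reshape implements it; melt_array is a hypothetical helper, and the small 2 x 2 x 2 table is made up for illustration.

```python
from itertools import product


def melt_array(values, dimnames):
    """Flatten a nested list into long-format rows, one row per cell.

    `dimnames` is an ordered list of (dimension name, labels) pairs.
    Hypothetical helper for illustration only.
    """
    index_ranges = [range(len(labels)) for _, labels in dimnames]
    rows = []
    # Iterate over the reversed dimensions so that the FIRST dimension
    # varies fastest, like Class does in the melted output above.
    for reversed_idx in product(*reversed(index_ranges)):
        idx = reversed_idx[::-1]
        cell = values
        for i in idx:
            cell = cell[i]  # walk down one nesting level per dimension
        row = {name: labels[i] for (name, labels), i in zip(dimnames, idx)}
        row["value"] = cell
        rows.append(row)
    return rows


# Made-up 2 x 2 x 2 table (not real data), indexed as values[a][b][c].
dimnames = [("A", ["a1", "a2"]), ("B", ["b1", "b2"]), ("C", ["c1", "c2"])]
values = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

rows = melt_array(values, dimnames)
for row in rows[:3]:
    print(row)
```

Because the result has one column per dimension regardless of how many dimensions there are, it fits in an ordinary DataFrame.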
Answer 1 (score: 1)
With Pandas version 0.13.0 or newer, pandas.rpy.common.load_data can load higher-dimensional datasets such as Titanic:
import pandas.rpy.common as com
df = com.load_data('Titanic')
print(df.head())
yields
  Survived    Age     Sex Class  value
0       No  Child    Male   1st    0.0
1       No  Child    Male   2nd    0.0
2       No  Child    Male   3rd   35.0
3       No  Child    Male  Crew    0.0
4       No  Child  Female   1st    0.0
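Since the question mentions a hierarchical index: once the data is in this long format, moving the four dimension columns into a MultiIndex is one possible next step. The sketch below assumes only plain pandas, with no rpy involved, and rebuilds the five rows shown above by hand. The names indexed and wide are just illustrative.

```python
import pandas as pd

# The five rows from the output above, typed in by hand.
df = pd.DataFrame(
    [
        ("No", "Child", "Male", "1st", 0.0),
        ("No", "Child", "Male", "2nd", 0.0),
        ("No", "Child", "Male", "3rd", 35.0),
        ("No", "Child", "Male", "Crew", 0.0),
        ("No", "Child", "Female", "1st", 0.0),
    ],
    columns=["Survived", "Age", "Sex", "Class", "value"],
)

# Move the four dimension columns into a hierarchical (Multi)index,
# leaving a Series of counts.
indexed = df.set_index(["Class", "Sex", "Age", "Survived"])["value"]

# Look up a single cell of the original 4-D table by its labels.
print(indexed.loc[("3rd", "Male", "Child", "No")])

# Pivot one index level into columns for a wider view; label
# combinations missing from these five rows show up as NaN.
wide = indexed.unstack("Class")
print(wide)
```

With the full dataset loaded, the same set_index call should apply unchanged, since it only relies on the column names.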