How to export a list of dataframes from R to Python?

Time: 2019-04-17 00:06:50

Tags: python r dataframe

I'm currently working with functional MRI data in R but I need to import it to Python for some faster analysis. How can I do that in an efficient way?

I currently have a list of 198135 dataframes in R. Each of them has 5 variables and 84 observations of connectivity between brain regions. I need the same 198135 dataframes in Python to run some specific analyses there, with the same structure as in R: one object that contains all dataframes separately.

Initially I tried exporting an .RDS file from R and then importing it into Python using "pyreadr", but I get empty objects on every attempt with the "pyreadr.read_r()" function.

My other method was to save every dataframe of the R list as a separate .csv file and then import them into Python. That way I could get what I wanted (I tested the code with only 100 dataframes). The problem with this method is that it is highly inefficient and slow.

I found several answers to similar problems, but most of them merge all dataframes and load the result into Python as a single .csv, which is not the solution I need.

Is there some more efficient way to do this process, without altering the data structure that I mentioned?

Thanks for your help!

# This is the code in R for an example

a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9)))
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))

somelist_of_df <- list(a,b,c,d,e)


saveRDS(somelist_of_df, "somefile.rds") 
## This is the function I used from pyreadr in Python


import pyreadr

results = pyreadr.read_r('/somepath/somefile.rds')


5 answers:

Answer 0: (score: 0)

This package may be of some interest to you

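Answer 1: (score: 0)

Well, thanks for the help in the other answer, but it is not quite what I was looking for (I wanted to export a single file containing the list of dataframes and then load that one file into Python with the same structure). To use Feather, you have to break the list apart into individual dataframes, much like saving separate .csv files, and then load each file into Python (or R). Still, it must be said that it is much faster than the .csv approach.

I'm leaving the code I used successfully as a separate answer, in case it is useful to others, since I used a simple loop to load the dataframes into Python as a list: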


(Using only a MacBook Air, the code above took less than 5 seconds to run for the list of 198135 DFs.)

## Exporting a list of dataframes from R to .feather files

library(feather) #required package

a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9))) #Example DFs
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))

somelist_of_df <- list(a,b,c,d,e) 

## With sapply you loop over the list for creating the .feather files

sapply(seq_along(1:length(somelist_of_df)), 
       function(i) write_feather(somelist_of_df[[i]], 
                                 paste0("/your/directory/","DF",i,".feather")))

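(This code did the job for me, except that it is a bit slow: it took 12 minutes to finish for the 198135 DFs.)

I hope this is useful to someone.

Answer 2: (score: 0)

Pandas also implements a method for reading .feather files directly:

pd.read_feather()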
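
To rebuild the list from the DF<i>.feather files written by the R loop in the previous answer, pd.read_feather() can be called once per file. The following is only a minimal sketch: the function names `feather_paths` and `load_feather_list` are hypothetical, the file naming (DF1.feather, DF2.feather, ...) is assumed to match that R code, and pandas is assumed to be installed together with pyarrow. The files are sorted by their number, because a plain alphabetical sort would put DF10 before DF2 and change the order of the list.

```python
import re
from pathlib import Path


def feather_paths(directory):
    """Return the DF<i>.feather files in `directory`, ordered by the number i.

    The DF<i>.feather naming is an assumption taken from the R export loop.
    """
    pattern = re.compile(r"DF(\d+)\.feather")
    numbered = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so DF2 comes before DF10.
    return [path for _, path in sorted(numbered)]


def load_feather_list(directory):
    """Read every DF<i>.feather file into one list of DataFrames."""
    # Imported here so the path helper above can be used without pandas.
    import pandas as pd

    return [pd.read_feather(path) for path in feather_paths(directory)]
```

Calling `load_feather_list("/your/directory/")` would then give a Python list whose element 0 corresponds to the first dataframe of the R list.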

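Answer 3: (score: 0)

Pyreadr currently cannot read R lists, so you need to save the dataframes individually, and you need to save them to an RDA file so that a single file can hold multiple dataframes: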

# first construct a list with the names of dataframes you want to save
# instead of the dataframes themselves
somelist_of_df <- list("a", "b", "c", "d", "e")
do.call("save",  c(somelist_of_df, file="somefile.rda"))

or any other variant as described here.

Then you can read the file in Python:

import pyreadr

results = pyreadr.read_r('/somepath/somefile.rda')

The advantage is that there is only one file for all the dataframes.
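
pyreadr.read_r() returns a dictionary-like object whose keys are the names of the R objects and whose values are pandas DataFrames, so getting back a list takes one extra step. A minimal sketch, where `frames_as_list` and `read_rda_as_list` are hypothetical helper names and the object names are the ones from the R example above:

```python
def frames_as_list(results, names=None):
    """Turn the mapping returned by pyreadr.read_r into a list of DataFrames.

    If `names` is given, the list follows that order; otherwise it follows
    the order of the keys in `results`.
    """
    if names is None:
        names = list(results.keys())
    return [results[name] for name in names]


def read_rda_as_list(path, names=None):
    """Read an .rda file with pyreadr and return its dataframes as a list."""
    # Imported here so frames_as_list can be used without pyreadr installed.
    import pyreadr

    return frames_as_list(pyreadr.read_r(path), names)
```

For example, `read_rda_as_list('/somepath/somefile.rda', ["a", "b", "c", "d", "e"])` would return the five dataframes in that order. Passing the names explicitly is the safer choice when the order of the list matters.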

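Answer 4: (score: 0)

Because of my reputation, I can't comment on @crlagos0's answer. I'd like to add a couple of things:

seq_along(list_of_things) is enough; there is no need to write seq_along(1:length(list_of_things)) in R. Also, I'd like to point out that the official package for reading and writing Feather files in R is called arrow; you can find its documentation here. In Python it is pyarrow.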