Question

I'm currently working with functional MRI data in R, but I need to import it into Python for some faster analysis. How can I do that efficiently?

I currently have a list of 198135 dataframes in R. Each of them has 5 variables and 84 observations of connectivity between brain regions. I need the same 198135 dataframes in Python to run some specific analyses there (with the same structure as in R: one object that contains all the dataframes separately).

Initially I tried exporting an .RDS file from R and then importing it into Python using "pyreadr", but I get empty objects on every attempt with the "pyreadr.read_r()" function.

My other method was to save every dataframe of the R list as a separate .csv file and then import them into Python. That way I could get what I wanted (I tried it with only 100 dataframes, to test the code). The problem with this method is that it is highly inefficient and slow.

I found several answers to similar problems, but most of them merge all the dataframes and load the result into Python as a single .csv, which is not the solution I need.

Is there some more efficient way to do this process, without altering the data structure that I mentioned?

Thanks for your help!

# This is the code in R for an example

a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9)))
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))

somelist_of_df <- list(a,b,c,d,e)


saveRDS(somelist_of_df, "somefile.rds")

## This is the function I used from pyreadr in Python


import pyreadr

results = pyreadr.read_r('/somepath/somefile.rds')

Answer 1

This package may be of some interest to you

Answer 2

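Well, thanks for the help in the other answers, but it's not exactly what I was looking for (I wanted to export just one file containing the list of dataframes, and then load that single file into Python, keeping the same structure). To use Feather you have to decompose the list into all of its dataframes, much like saving separate .csv files, and then load each of them into Python (or R). Anyway, it must be said that it is much faster than the .csv method.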

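I'm leaving the code I used successfully in a separate answer; maybe it will be useful for other people, since I used a simple loop to load the dataframes into Python as a list.

(On just a MacBook Air, that loading code took less than 5 seconds to run for the list of 198135 DFs.)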
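
A loading loop of that kind might look like the sketch below. It is not the original code: it assumes the files were written as DF1.feather, DF2.feather, ... into one directory (as in the R export in this answer) and that pandas.read_feather is used for reading, which needs the pyarrow package installed. The helper names numeric_suffix and load_feather_dir are made up for illustration.

```python
import re
from pathlib import Path


def numeric_suffix(path):
    """Return the trailing number in a file stem, e.g. 10 for DF10.feather."""
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else -1


def load_feather_dir(directory, reader=None):
    """Read every .feather file in `directory` into a list of DataFrames.

    Files are sorted by numeric suffix so the list order matches the R list
    (a plain alphabetical sort would put DF10 before DF2).
    """
    if reader is None:
        # Imported here so pandas/pyarrow are only needed for real reads.
        import pandas as pd
        reader = pd.read_feather
    files = sorted(Path(directory).glob("*.feather"), key=numeric_suffix)
    return [reader(f) for f in files]


# Hypothetical usage; '/your/directory/' is the placeholder path from the R code:
# list_of_dfs = load_feather_dir("/your/directory/")
# list_of_dfs[0]  # corresponds to somelist_of_df[[1]] in R
```

Sorting on the numeric suffix is a design choice of this sketch: it keeps element i of the Python list aligned with element i+1 of the R list, which a plain directory listing does not guarantee.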

## Exporting a list of dataframes from R to .feather files

library(feather) #required package

a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9))) #Example DFs
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))

somelist_of_df <- list(a,b,c,d,e) 

## With sapply you loop over the list for creating the .feather files

sapply(seq_along(1:length(somelist_of_df)), 
       function(i) write_feather(somelist_of_df[[i]], 
                                 paste0("/your/directory/","DF",i,".feather")))

(This code did the job for me, although it was a bit slow: it took 12 minutes to complete the task for 198135 DFs.)

I hope this is useful to someone.

Answer 3

Pandas also implements a method for reading .feather files directly:

pd.read_feather()

Answer 4

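pyreadr currently cannot read R lists, so you need to save the dataframes individually. You also need to save to an RDA file, so that a single file can hold multiple dataframes: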

# first construct a list with the names of dataframes you want to save
# instead of the dataframes themselves
somelist_of_df <- list("a", "b", "c", "d", "e")
do.call("save",  c(somelist_of_df, file="somefile.rda"))

Or any other variation as described here.

Then you can read the file in Python:

import pyreadr

results = pyreadr.read_r('/somepath/somefile.rda')

The advantage is that there is only one file with all the dataframes.
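
Since pyreadr.read_r returns a dictionary-like object keyed by the R object names, getting back to a list in Python takes one more step. A minimal sketch, where frames_as_list is a made-up helper name and a plain dict stands in for the pyreadr result so that the snippet runs without the .rda file:

```python
def frames_as_list(results, names=None):
    """Turn the mapping returned by pyreadr.read_r into a list of objects.

    `names` optionally fixes the order; otherwise the mapping's own order is used.
    """
    if names is None:
        return list(results.values())
    return [results[name] for name in names]


# Stand-in for: results = pyreadr.read_r('/somepath/somefile.rda')
results = {"a": "frame a", "b": "frame b", "c": "frame c"}
print(frames_as_list(results, names=["c", "a"]))  # ['frame c', 'frame a']
```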

Answer 5

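I can't comment on @crlagos0's answer because of reputation. I'd like to add a couple of things:

seq_along(list_of_things) is enough; there is no need to write seq_along(1:length(list_of_things)) in R. Also, I'd like to point out that the official package for reading and writing feather files in R is called arrow, and you can find its documentation here. In Python it is pyarrow.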

How to export a list of dataframes from R to Python?
