I'm currently working with functional MRI data in R, but I need to import it into Python for some faster analysis. How can I do that efficiently?
In R I currently have a list of 198135 dataframes. Each of them has 5 variables and 84 observations of connectivity between brain regions. I need the same 198135 dataframes in Python to run some specific analyses there, with the same structure as in R: one object that contains all the dataframes separately.
Initially I tried exporting an .RDS file from R and then importing it into Python using "pyreadr", but I get empty objects on every attempt with the "pyreadr.read_r()" function.
My other method was to save every dataframe in the R list as a separate .csv file and then import them into Python. That way I got what I wanted (I tried it with only 100 dataframes, just to test the code). The problem with this method is that it is highly inefficient and slow.
I found several answers to similar problems, but most of them merge all the dataframes and load them into Python as a single .csv, which is not the solution I need.
Is there some more efficient way to do this process, without altering the data structure that I mentioned?
Thanks for your help!
# This is the code in R for an example
a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9)))
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))
somelist_of_df <- list(a,b,c,d,e)
saveRDS(somelist_of_df, "somefile.rds")
## This is the function I used from pyreadr in Python
import pyreadr
results = pyreadr.read_r('/somepath/somefile.rds')
Answer 0 (score: 0)
This package may be of some interest to you
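Answer 1 (score: 0)
Well, thanks for the help in the other answer, but it is not exactly what I was looking for (I just wanted to export a single file containing the list of dataframes and then load that one file into Python with the same structure). To use Feather, you have to split the list into its individual dataframes, much like saving separate .csv files, and then load each file into Python (or R). Still, it must be said that it is much faster than the .csv approach.
I'm leaving the code that worked for me here as a separate answer, in case it is useful to someone else, since I used a simple loop to load the dataframes into Python as a list: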
(On just a MacBook Air, the Python loading loop ran in under 5 seconds for the list of 198135 DFs.)
## Exporting a list of dataframes from R to .feather files
library(feather) #required package
a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9))) #Example DFs
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))
somelist_of_df <- list(a,b,c,d,e)
## With sapply you loop over the list for creating the .feather files
sapply(seq_along(somelist_of_df),
       function(i) write_feather(somelist_of_df[[i]],
                                 paste0("/your/directory/", "DF", i, ".feather")))
(This code did the job for me, although it is somewhat slow: it took 12 minutes to write out all 198135 DFs.)
I hope this is useful to someone.
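The Python loading loop that this answer describes could look roughly like the sketch below. It assumes the files were written as DF1.feather, DF2.feather, and so on by the R code above, and that pandas and pyarrow are installed. The helper names `feather_paths` and `load_feather_list` are illustrative and not from the original answer.

```python
from pathlib import Path


def feather_paths(directory, count, prefix="DF"):
    """Build DF1.feather ... DF<count>.feather in the same order as the R list."""
    return [Path(directory) / f"{prefix}{i}.feather" for i in range(1, count + 1)]


def load_feather_list(directory, count, prefix="DF"):
    """Read each file into a pandas DataFrame and return them as one list."""
    import pandas as pd  # imported here; reading Feather also requires pyarrow

    return [pd.read_feather(path) for path in feather_paths(directory, count, prefix)]


# Hypothetical usage, matching the directory used in the R export:
# dataframes = load_feather_list("/your/directory", 198135)
# dataframes[0] then corresponds to somelist_of_df[[1]] in R
```

The paths are built from the index rather than from a sorted directory listing, because a lexicographic sort would place DF10 before DF2 and the Python list would no longer match the order of the R list.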
Answer 2 (score: 0)
Pandas also implements a method for reading .feather files directly:
pd.read_feather()
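Answer 3 (score: 0)
Pyreadr currently cannot read R lists, so you need to save the dataframes individually. You also need to save them to an RDA file, so that a single file can hold multiple dataframes: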
# first construct a list with the names of dataframes you want to save
# instead of the dataframes themselves
somelist_of_df <- list("a", "b", "c", "d", "e")
do.call("save", c(somelist_of_df, file="somefile.rda"))
Or use any of the other variants described here.
You can then read the file in Python:
import pyreadr
results = pyreadr.read_r('/somepath/somefile.rda')
The advantage is that there is only one file for all the dataframes.
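`pyreadr.read_r` returns a dictionary-like object keyed by the R object names, so getting back to a list takes one more step. A minimal sketch, assuming the objects were saved under the names a to e as above; the helper `as_ordered_list` is hypothetical:

```python
def as_ordered_list(results, names):
    """Pick the dataframes out of a name -> DataFrame mapping in the given order."""
    return [results[name] for name in names]


# Hypothetical usage with the file written above (requires pyreadr):
# import pyreadr
# results = pyreadr.read_r('/somepath/somefile.rda')
# dataframes = as_ordered_list(results, ["a", "b", "c", "d", "e"])
```

Passing the names explicitly keeps the list order under your control instead of relying on the order in which the objects were stored in the file.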
Answer 4 (score: 0)