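I need to share a data set that I imported into R as an ffdf object. My goal is to export the ffdf data set to CSV easily, without NA values that do nothing but inflate the size of the output file.
If I were working with a plain data frame, I would use the following syntax: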
write.csv(df, "C:/path/data.csv", row.names=FALSE, na="")
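However, the write.csv.ffdf function does not seem to accept "na" as an argument. Can anyone tell me the correct syntax, so that I don't have to post-process the output file to strip out the NA values?
Answer 0 (score: 1)
I think your description of the behavior of write.csv.ffdf is inaccurate.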
require(ff)
# What follows is a minor modification of the first example in the `write.*` help page.
> x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=c(NA, 2:26),
dbl=c(1:25,NA) + 0.1, fac=factor(c(letters[2:26], NA)),
ord=c(NA, ordered(LETTERS[2:26])), dct=Sys.time()+1:26,
dat=seq(as.Date("1910/1/1"), length.out=26, by=1))
> ffx <- as.ffdf(x)
> write.csv(ffx, na="")
"","log","int","dbl","fac","ord","dct","dat"
"1",FALSE,,1.1,"b",,2012-12-18 12:18:23,1910-01-01
"2",TRUE,2,2.1,"c",1,2012-12-18 12:18:24,1910-01-02
"3",FALSE,3,3.1,"d",2,2012-12-18 12:18:25,1910-01-03
"4",TRUE,4,4.1,"e",3,2012-12-18 12:18:26,1910-01-04
"5",FALSE,5,5.1,"f",4,2012-12-18 12:18:27,1910-01-05
"6",TRUE,6,6.1,"g",5,2012-12-18 12:18:28,1910-01-06
"7",FALSE,7,7.1,"h",6,2012-12-18 12:18:29,1910-01-07
"8",TRUE,8,8.1,"i",7,2012-12-18 12:18:30,1910-01-08
"9",FALSE,9,9.1,"j",8,2012-12-18 12:18:31,1910-01-09
"10",TRUE,10,10.1,"k",9,2012-12-18 12:18:32,1910-01-10
"11",FALSE,11,11.1,"l",10,2012-12-18 12:18:33,1910-01-11
"12",TRUE,12,12.1,"m",11,2012-12-18 12:18:34,1910-01-12
"13",FALSE,13,13.1,"n",12,2012-12-18 12:18:35,1910-01-13
"14",TRUE,14,14.1,"o",13,2012-12-18 12:18:36,1910-01-14
"15",FALSE,15,15.1,"p",14,2012-12-18 12:18:37,1910-01-15
"16",TRUE,16,16.1,"q",15,2012-12-18 12:18:38,1910-01-16
"17",FALSE,17,17.1,"r",16,2012-12-18 12:18:39,1910-01-17
"18",TRUE,18,18.1,"s",17,2012-12-18 12:18:40,1910-01-18
"19",FALSE,19,19.1,"t",18,2012-12-18 12:18:41,1910-01-19
"20",TRUE,20,20.1,"u",19,2012-12-18 12:18:42,1910-01-20
"21",FALSE,21,21.1,"v",20,2012-12-18 12:18:43,1910-01-21
"22",TRUE,22,22.1,"w",21,2012-12-18 12:18:44,1910-01-22
"23",FALSE,23,23.1,"x",22,2012-12-18 12:18:45,1910-01-23
"24",TRUE,24,24.1,"y",23,2012-12-18 12:18:46,1910-01-24
"25",FALSE,25,25.1,"z",24,2012-12-18 12:18:47,1910-01-25
"26",TRUE,26,,,25,2012-12-18 12:18:48,1910-01-26
If your goal is to minimize the RAM footprint during the write operation, first take a look at:
getOption("ffbatchbytes")
Answer 1 (score: 0)
write.csv.ffdf has no na argument, but write.table.ffdf passes the na argument through to the write.table function that it wraps.
Just use sep="," and you are all set.
This holds even for large ff variables.
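The suggestion above could be sketched roughly as follows. This is a minimal example rather than a tested recipe: it assumes the ff package is installed, the small data frame and the temporary output path are made up for illustration, and it relies on write.table.ffdf forwarding na and sep to write.table as described in this answer.

```r
library(ff)

# Small illustrative data set with missing values (hypothetical data).
df <- data.frame(id    = 1:5,
                 value = c(1.5, NA, 3.5, NA, 5.5),
                 grp   = factor(c("a", "b", NA, "a", "b")))
ffx <- as.ffdf(df)

# Hypothetical output location; replace with your own path.
out <- tempfile(fileext = ".csv")

# sep = "," produces comma-separated output;
# na = "" writes missing values as empty fields instead of the string NA.
write.table.ffdf(ffx, file = out, sep = ",", na = "")

# Inspect the result.
csv_lines <- readLines(out)
print(csv_lines)
```

Whether a row-name column appears in the file depends on the row names stored in the ffdf, so check the first few lines of the output before sharing it.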