How do I subset a large data frame (ffdf) by date in R?

Time: 2013-10-17 12:51:10

Tags: r ffbase ff

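I am trying to subset an ffdf by date. Below, I successfully create such a subset from a normal data frame, but I need some help applying the same approach to an ffdf. My attempt, along with the error message, is shown in the code comments. Many thanks in advance!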

#Create a normal data frame (in production this is read directly into an ffdf 
#through a csv file)

start  <- c("01/01/2010", "01/01/2011", "01/01/2012", "01/01/2012", "01/01/2012")
end  <- c("31/12/2010", "31/12/2011", "31/12/2012", "31/12/2012", "31/12/2012")
amount <- c(10,20,30,40,50)
df <- data.frame(start,end,amount)

#Ensure subsetting works on a normal data frame

  #convert type to proper date (this has to be done in production after csv file
  #has been read in)
  df$start <- as.Date(df$start, format="%d/%m/%Y")
  df$end <- as.Date(df$end, format="%d/%m/%Y")

  #Subset
  df <- subset(df, start == as.Date("2012-01-01",format="%Y-%m-%d"))

  #Works :) Now let's try with ffdf

ffdf <- as.ffdf(df)

  #Type conversion for dates (again, applied in production after mammoth csv has
  #been read in)
  ffdf$start <- as.Date(ffdf$start, format="%m/%d/%Y")
  ffdf$end <- as.Date(ffdf$end, format="%m/%d/%Y")

  #Subset
  ffdf <- subset.ff(ffdf, start==as.Date("2012-01-01",format="%Y-%m-%d"))
  #ERROR: Error in ffdf(x = x) : ffdf components must be atomic ff objects

1 answer:

Answer 0 (score: 2)

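Use subset.ffdf from the ffbase package. subset is a generic function in R, and ffbase implements a method of it for ffdf objects. So you can call subset on an ffdf just as you would on a regular data frame.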
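To illustrate what "generic" means here, this is a minimal base R sketch of S3 dispatch. The class `toy_store` and its method are made up for illustration and are not part of ff or ffbase; the idea is only that `subset()` picks a method based on the class of its first argument, which is presumably how the ffdf method gets selected as well.

```r
# Hypothetical toy class, purely illustrative (not from ff or ffbase).
toy <- structure(list(values = c(5, 10, 15)), class = "toy_store")

# A subset() method for the toy class. subset() is an S3 generic, so a call
# on a "toy_store" object is dispatched to this function.
subset.toy_store <- function(x, subset, ...) {
  # Evaluate the filter expression with `values` visible by name,
  # falling back to the caller's environment for everything else.
  keep <- eval(substitute(subset), list(values = x$values), parent.frame())
  structure(list(values = x$values[keep]), class = "toy_store")
}

# Called through the generic, not through subset.toy_store directly.
res <- subset(toy, values > 7)
res$values
```

Calling the generic rather than a specific method by name lets R choose the implementation that matches the object's class.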

library(ffbase)

df <- data.frame(
  start  = c("01/01/2010", "01/01/2011", "01/01/2012", "01/01/2012", "01/01/2012"),
  end    = c("31/12/2010", "31/12/2011", "31/12/2012", "31/12/2012", "31/12/2012"),
  amount = c(10, 20, 30, 40, 50)
)
# Parse the day/month/year strings into Date objects
df$start <- as.Date(df$start, "%d/%m/%Y")
df$end   <- as.Date(df$end, "%d/%m/%Y")

myffdf <- as.ffdf(df)
# subset() dispatches to ffbase's subset.ffdf method for ffdf objects
test <- subset(myffdf, start == as.Date("2012-01-01", format = "%Y-%m-%d"))
test
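If you would rather see the row selection as an explicit step, a possible alternative is ffbase's `ffwhich`, which builds an index of the rows matching a filter expression. This is only a sketch on the same toy data (with the `end` column left out for brevity). It assumes ff and ffbase are installed and that the date comparison behaves inside `ffwhich` the way it does in `subset`. The name `test2` is just for illustration.

```r
library(ff)
library(ffbase)

# Same toy data as in the answer, with start already parsed as Date.
df <- data.frame(
  start  = as.Date(c("01/01/2010", "01/01/2011", "01/01/2012",
                     "01/01/2012", "01/01/2012"), "%d/%m/%Y"),
  amount = c(10, 20, 30, 40, 50)
)
myffdf <- as.ffdf(df)

# Step 1: build an index of the rows whose start date matches.
idx <- ffwhich(myffdf, start == as.Date("2012-01-01"))

# Step 2: use that index to pull the matching rows.
test2 <- myffdf[idx, ]
nrow(test2)
```

On this data the filter should pick out the three rows that start on 2012-01-01. For everyday use the plain `subset()` call from the answer is shorter. The two-step form is mainly useful if you want to reuse the index.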