Reshaping an ffdf data frame in R

Time: 2014-12-17 12:06:57

Tags: r reshape2 ff ffbase

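I use the dcast function to reshape data frames in R. For a large data frame I converted the data to an ffdf data frame, and dcast no longer works on it. If there is an alternative, please help. Below is the example I use for a small data frame, followed by what I want to do with the ffdf data frame: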

    library(reshape2)
    hhpsample <- read.csv("C:/Users/PK5016573/Desktop/hdsample.csv")
    View(hhpsample)

    hd <- dcast(hhpsample,
                MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc +
                  PCP + PrimaryConditionGroup + CharlsonIndex)

This works, but:

    library(ff)
    hhp <- read.csv.ffdf(file = "C:/Users/PK5016573/Desktop/hdsample.csv")

    # Same reshape as above, but on the ffdf object
    hd <- dcast(hhp,
                MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc +
                  PCP + PrimaryConditionGroup + CharlsonIndex)

This gives me an error. Please help.

Thanks in advance, pavan kancharala

1 Answer:

Answer 0: (score: 0)

I found an answer to this problem, but it may not work well for data that consists mostly of factors.

    # Reshape function applied to each chunk of the data:
    # one row per MemberID, one column per Year and PrimaryConditionGroup
    library(reshape2)
    library(ffbase)
    reshapefunction <- function(x) {
      dcast(x, MemberID ~ Year + PrimaryConditionGroup,
            value.var = "rep.x..each...2668990.",
            fun.aggregate = sum)
    }
    # Apply the reshape function chunk by chunk, splitting on MemberID;
    # BATCHBYTES sets the approximate size of each chunk in bytes
    PrimaryConditionGroup <- ffdfdply(x = hhp, split = hhp$MemberID,
                                      FUN = function(x) reshapefunction(x),
                                      BATCHBYTES = 100000000, trace = TRUE)

    View(PrimaryConditionGroup)
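One thing to watch with this approach, as far as I understand ffdfdply: the results of all chunks are appended into a single ffdf, so each chunk should come back with the same columns. The following is only a sketch on a small in-memory data frame, not the Kaggle data. The names `toy`, `n`, `wide`, `chunk1` and `chunk2` are made up for illustration, and it assumes `Year` and `PrimaryConditionGroup` are factors with all their levels declared. With `drop = FALSE`, dcast keeps a column for every combination of factor levels, even when a chunk does not contain it:

```r
library(reshape2)

# Toy stand-in for the real data; "n" is a hypothetical column of 1s
toy <- data.frame(
  MemberID = c(1, 1, 2, 3),
  Year = factor(c("Y1", "Y2", "Y1", "Y1"), levels = c("Y1", "Y2")),
  PrimaryConditionGroup = factor(c("A", "B", "A", "A"), levels = c("A", "B")),
  n = 1
)

# Same kind of reshape as reshapefunction, but keeping unused level combinations
wide <- function(x) {
  dcast(x, MemberID ~ Year + PrimaryConditionGroup,
        value.var = "n", fun.aggregate = sum, drop = FALSE)
}

# Two hand-made "chunks" split on MemberID, mimicking what ffdfdply would pass in
chunk1 <- wide(toy[toy$MemberID == 1, ])
chunk2 <- wide(toy[toy$MemberID != 1, ])

# Both chunks expose the same columns, so they can be stacked
combined <- rbind(chunk1, chunk2)
print(combined)
```

Without `drop = FALSE`, the second chunk here would only have a column for the single combination it actually contains, and the two results could not be stacked directly.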

All the data comes from a Kaggle competition. I added one column, "rep.x..each...2668990.", which contains 1 in every row and is used for aggregation.
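To show what the column of 1s does, here is a minimal sketch on a small in-memory data frame. The data and the names `claims` and `n` are hypothetical, not taken from the Kaggle files. Summing a column of 1s within each cell gives the number of rows for that MemberID and Year/PrimaryConditionGroup combination:

```r
library(reshape2)

# Hypothetical toy data: member 1 has two rows for Y1/A and one for Y2/B
claims <- data.frame(
  MemberID = c(1, 1, 1, 2),
  Year = c("Y1", "Y1", "Y2", "Y1"),
  PrimaryConditionGroup = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Helper column of 1s, playing the role of "rep.x..each...2668990."
claims$n <- 1

# Summing the 1s per cell counts the rows in that cell
counts <- dcast(claims, MemberID ~ Year + PrimaryConditionGroup,
                value.var = "n", fun.aggregate = sum)
print(counts)
```

Passing `fun.aggregate = length` should give the same counts without needing the extra column; the helper column mainly makes the intent explicit.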