Converting rows to columns does not work for a large data frame

Asked: 2017-11-16 21:20:49

Tags: r

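How can I efficiently convert rows to columns? This question has been asked before here, but that answer does not work once the data frame gets larger. I tried to build a larger-scale mock data set, but it is not quite like my data: my real data does not repeat over and over. My real data is 6 columns by 4,000 rows. The solution described below works on the mock example but not on my actual data.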

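By repetition I mean the situation shown below, where several rows differ in only one column, in this case the "description_9" column. I want to condense the data so that, instead of several rows that differ in a single column, that column is split across multiple columns. As far as I can tell, it could need up to 5 columns.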
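To make the goal concrete, here is a minimal sketch on a made-up three-row frame. The names `toy`, `key_cols`, and `wide` are hypothetical and exist only for this illustration. It assumes the grouping is defined by every column except "description_9", numbers the rows within each group, and spreads that column with base R's `reshape()`.

```r
# Toy input: the first two rows differ only in description_9 (hypothetical data)
toy <- data.frame(
  sub_category  = c("disease A", "disease A", "disease B"),
  code_red      = c(1, 1, 2),
  description_9 = c("first", "second", "hello"),
  stringsAsFactors = FALSE
)

# Columns that identify a group of otherwise identical rows
key_cols <- setdiff(names(toy), "description_9")

# Running index within each group: 1, 2, ... per combination of key columns
toy$time <- ave(seq_len(nrow(toy)), toy[key_cols], FUN = seq_along)

# One row per group, one description_9<k> column per within-group index
wide <- reshape(toy, direction = "wide", idvar = key_cols,
                timevar = "time", v.names = "description_9", sep = "")
print(wide)
```

The result should have one row per disease, with "disease B" left as NA in the second description column because it has only one description.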

Any help would be greatly appreciated! Even help with making a more realistic example would be great!

My data looks like this

    # fake data
    condition_group <- c("nevous system", "nevous system", "nevous system",
                         "circulatory system problems",
                         "circulatory system problems")
    sub_category    <- c("disease A", "disease A", "disease A",
                         "disease B", "disease C")
    code_red        <- c(1, 1, 1, 2, 3)
    code_orange     <- c("x24", "x24", "x24", "j897", "23a")
    description_9   <- c("mucous / mucous", "non maternal fetus", "third", "hello", "NA")
    description_10  <- c("NA", "NA", "NA", "NA", "blue")

    df <- cbind.data.frame(condition_group, sub_category,
                           code_red, code_orange, description_9, description_10)

    print(df)

This solution only works on small data

    # reshape data
    df$fake_id <- seq_len(nrow(df))
    # id columns: everything except the row id and the column being spread
    idcns <- names(df)[!names(df) %in% c("fake_id", "description_9")]
    # number the rows within each group of identical id columns, then go wide
    reformatted_data <- reshape(
      transform(df, fake_id = NULL,
                time = ave(df$fake_id, df[idcns], FUN = seq_along)),
      direction = "wide", idvar = idcns, sep = "")
    # change NA to blanks
    df2 <- reformatted_data
    df2 <- sapply(df2, as.character)  # since your values are `factor`
    df2[is.na(df2)] <- ""
    print(df2)

Example of trying it with large data
    # this doesn't work with lots of data
    larger_data <- rbind(df, df, df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df,df)

    larger_data <- rbind(larger_data, larger_data, larger_data, larger_data, larger_data,
                         larger_data, larger_data, larger_data, larger_data, larger_data,
                         larger_data, larger_data, larger_data, larger_data, larger_data,
                         larger_data, larger_data, larger_data, larger_data, larger_data)

    df_larger <- rbind(larger_data, larger_data, larger_data, larger_data, larger_data,
                       larger_data, larger_data, larger_data, larger_data)
    # reshape data
    df_larger$fake_id <- seq_len(nrow(df_larger))
    idcns <- names(df_larger)[!names(df_larger) %in% c("fake_id", "description_9")]
    reformatted_data <- reshape(
      transform(df_larger, fake_id = NULL,
                time = ave(df_larger$fake_id, df_larger[idcns], FUN = seq_along)),
      direction = "wide", idvar = idcns, sep = "")
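One thing that may be worth checking, offered as a sketch and not a diagnosis: stacking copies of the same rows makes every group of identical id columns grow, and the wide result needs one "description_9" column per row in the largest group. The frame `d`, the copy count of 100, and the helper `max_group` below are hypothetical stand-ins for illustration only.

```r
# Hypothetical small frame standing in for the mock data
d <- data.frame(
  sub_category  = c("disease A", "disease A", "disease A", "disease B"),
  code_red      = c(1, 1, 1, 2),
  description_9 = c("x", "y", "z", "hello"),
  stringsAsFactors = FALSE
)

# 100 stacked copies of the same rows, similar in spirit to the rbind() calls
stacked <- d[rep(seq_len(nrow(d)), times = 100), ]

key_cols <- setdiff(names(d), "description_9")

# Size of the largest group of rows sharing the same key columns;
# this is how many description_9 columns a wide reshape would create
max_group <- function(x) {
  max(table(interaction(x[key_cols], drop = TRUE)))
}

small_width <- max_group(d)
large_width <- max_group(stacked)
print(c(small = small_width, large = large_width))
```

If the same count on the real 4,000-row data stays near 5, the stacked mock data is probably not a fair stand-in for it, since its groups are far larger than anything in data that does not repeat.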

0 answers:

No answers