How to reshape to multi-level, multi-order wide data with dynamic column names

Asked: 2014-08-01 21:24:18

Tags: r plyr reshape

Sorry if this has an obvious answer. I'm trying to do a reshape that has plenty of Stack Overflow answers for the case where only one column is used or the column names can be hard-coded, but I need an answer that works dynamically when the ordered.cols and unique.cols vectors are not set from the start.

# these two sets of columns need to be dynamic
# they might be any two sets of columns!
ordered.cols <- c( 'cyl' , 'gear' )
unique.cols <- c( 'am' , 'vs' )
# neither of the above two character vectors will be known beforehand

# so here's the example starting data set
x <- mtcars[ , c( ordered.cols , unique.cols ) ]

# the desired output should have this many records:
unique( x[ , ordered.cols ] )

# but i'm unsure of the smartest way to add the additional columns that i want--
# for *each* unique level in *each* of the variables in
# `unique.cols` there should be one additional column added
# to the final output.  then, for that `ordered.cols` combination
# the cell should be populated with the value if it exists
# and NA otherwise
desired.output <-
    structure(list(cyl = c(4L, 4L, 4L, 6L, 6L, 6L, 8L, 8L),
        gear = c(3L, 4L, 5L, 3L, 4L, 5L, 3L, 5L),
        am1 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L),
        am2 = c(NA, 1L, NA, NA, 1L, NA, NA, NA),
        vs1 = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
        vs2 = c(NA, NA, 1L, NA, 1L, NA, NA, NA)),
        .Names = c("cyl", "gear", "am1", "am2", "vs1", "vs2" ),
        class = "data.frame", row.names = c(NA, -8L))

desired.output

I really don't care whether the new columns are named am1, am2, vs1, vs2 or something more convenient. But if there are two distinct am values in the data, the final output needs two data-holding columns, one of which should be missing if that combination does not have that value.

Thanks!!!!!!

2 answers:

Answer 0 (score: 2)

library(data.table)

reshapeMyData <- function(x, unique.cols, ordered.cols, NA_class="integer") {
  DT <- as.data.table(x)

  unique.values <- lapply(DT[, unique.cols, with=FALSE], unique)

  ## If your NA is of the wrong class, it can potentially throw an error, 
  ##    depending on when it first shows up.  It is better to be explicit about the expected class
  NA.classed <- as(NA, NA_class)

  ## The DT[...] call below is a single expression; j is evaluated once for
  ##    each unique combination of the ordered.cols values
  DT[, {
    ## unlist() flattens the per-variable results into one named vector
    ##    (am1, am2, vs1, ...), as.list() turns it into one value per column,
    ##    and setDT() makes that a one-row data.table
    setDT(as.list(unlist(
      ## For each variable, keep a unique value if it occurs in this group, else a typed NA
      mapply(function(v, C) {ifelse(v %in% C, v, NA.classed)}, v=unique.values, C=.SD, SIMPLIFY=FALSE)
    )))
  }
  , keyby=ordered.cols, .SDcols=unique.cols]

} ## // end function reshapeMyData
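
The NA_class argument exists because a bare NA in R is logical, and grouped results are easiest to combine when every group returns the same type for a given column. A minimal illustration of typed missing values, independent of the function above (the variable names are made up for this sketch):

```r
library(methods)  # provides as(); attached by default in most R sessions

# a bare NA is logical
class(NA)

# as() converts it to a missing value of the requested class
na.int <- as(NA, "integer")
na.chr <- as(NA, "character")

class(na.int)
class(na.chr)
```

Passing NA_class="character" in the second call below therefore keeps the d columns as character in every group.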

Output

reshapeMyData(x, unique.cols, ordered.cols)

   cyl gear am1 am2 vs1 vs2
1:   4    3  NA   0  NA   1
2:   4    4   1   0  NA   1
3:   4    5   1  NA   0   1
4:   6    3  NA   0  NA   1
5:   6    4   1   0   0   1
6:   6    5   1  NA   0  NA
7:   8    3  NA   0   0  NA
8:   8    5   1  NA   0  NA
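
Note that this layout differs slightly from the desired output in the question: each suffix stands for one particular value (in order of first appearance in the data) rather than values being packed from left to right. A quick way to see which value each suffix refers to, assuming x is built from mtcars as in the question (value.map is a name introduced here for illustration):

```r
ordered.cols <- c("cyl", "gear")
unique.cols  <- c("am", "vs")
x <- mtcars[, c(ordered.cols, unique.cols)]

# unique() keeps values in order of first appearance,
# so suffix k corresponds to the k-th element of each vector
value.map <- lapply(x[unique.cols], unique)
value.map
```

Here am1 corresponds to am == 1 and am2 to am == 0, which is why am1 is NA for the cyl 4, gear 3 group.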

## y is a second example data set (columns a, b, d); its definition is not shown here
reshapeMyData(y, "d", c("a", "b"), NA_class="character")

   a b d1 d2 d3
1: 1 1  z  y NA
2: 1 2 NA NA  x
3: 2 2 NA NA  x

Answer 1 (score: 0)

For my purposes, this solution seems to work well:

aggregate( x[ , unique.cols ] , by = x[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )

aggregate( y[ , unique.cols ] , by = y[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )
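
The aggregate() calls above return one comma-separated string per cell rather than separate am1, am2, ... columns. If the wide layout from the question is needed, the strings can be split afterwards. The following is only a sketch under stated assumptions: split_to_columns is a hypothetical helper written for this example, the values contain no commas, and there are no missing values in the unique.cols columns.

```r
ordered.cols <- c("cyl", "gear")
unique.cols  <- c("am", "vs")
x <- mtcars[, c(ordered.cols, unique.cols)]

# same idea as the aggregate() call above
collapsed <- aggregate(
  x[, unique.cols, drop = FALSE],
  by  = x[, ordered.cols, drop = FALSE],
  FUN = function(w) paste(sort(unique(w)), collapse = ",")
)

# hypothetical helper: split "0,1"-style strings into NA-padded columns
split_to_columns <- function(v, prefix) {
  pieces <- strsplit(as.character(v), ",", fixed = TRUE)
  width  <- max(lengths(pieces))
  # indexing past the end of a vector pads it with NA
  m   <- do.call(rbind, lapply(pieces, function(p) p[seq_len(width)]))
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(width))
  # turn "0"/"1" back into numbers where possible
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

parts <- lapply(unique.cols, function(col) split_to_columns(collapsed[[col]], col))
wide  <- do.call(cbind, c(list(collapsed[ordered.cols]), parts))

# aggregate() varies the first grouping column fastest; reorder by cyl, then gear
wide <- wide[do.call(order, unname(as.list(wide[ordered.cols]))), ]
rownames(wide) <- NULL
wide
```

Because the values are sorted before pasting, am1 holds the smallest value present in each group, which matches the packing used in desired.output.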

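Sometimes (I'm not sure why, but I think it's a factor-coercion issue) nrow( unique( x[ , ordered.cols ] ) ) does not equal the nrow of the output of the commands above. In those cases, this workaround seems to do the trick: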

halfway <- aggregate( x[ , unique.cols ] , by = list( apply( x[ , ordered.cols ] , 1 , paste , collapse = "" ) ) , function( w ) paste( sort( unique( w ) ) , collapse = "," )  )

cbind( unique( x[ , ordered.cols ] ) , halfway[ , -1 ] )
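
One thing worth checking with this workaround: aggregate() returns its rows sorted by the pasted key, while unique() keeps rows in order of first appearance, so the cbind() can pair a combination with the wrong aggregated row. Pasting with collapse = "" can also make distinct combinations collide (for example "1","12" and "11","2"). A sketch that carries the key along and merges on it instead; the column name .key and the "|" separator are assumptions made for this example:

```r
ordered.cols <- c("cyl", "gear")
unique.cols  <- c("am", "vs")
x <- mtcars[, c(ordered.cols, unique.cols)]

# build one key per row, with a separator so distinct combinations stay distinct
key <- do.call(paste, c(unname(as.list(x[ordered.cols])), sep = "|"))

halfway <- aggregate(
  x[, unique.cols, drop = FALSE],
  by  = list(.key = key),
  FUN = function(w) paste(sort(unique(w)), collapse = ",")
)

# one row per combination, still carrying its key
lookup <- unique(cbind(x[, ordered.cols, drop = FALSE], .key = key))

# match on the key instead of relying on row order
result <- merge(lookup, halfway, by = ".key")
result <- result[do.call(order, unname(as.list(result[ordered.cols]))),
                 c(ordered.cols, unique.cols)]
rownames(result) <- NULL
result
```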