Question

Sorry if this has an obvious answer. There are plenty of Stack Overflow answers on reshaping when only one column is involved or when the column names can be hard-coded, but I need an answer that works dynamically, because the ordered.cols and unique.cols vectors are not set in advance.

# these two sets of columns need to be dynamic
# they might be any two sets of columns!
ordered.cols <- c( 'cyl' , 'gear' )
unique.cols <- c( 'am' , 'vs' )

# neither of the above two character vectors will be known beforehand

# so here's the example starting data set
x <- mtcars[ , c( ordered.cols , unique.cols ) ]

# the desired output should have this many records:
unique( x[ , ordered.cols ] )

# but I'm unsure of the smartest way to add the additional columns that I want--
# for *each* unique level in *each* of the variables in
# `unique.cols` there should be one additional column added
# to the final output. then, for that `ordered.cols` combination
# the cell should be populated with the value if it exists
# and NA otherwise
desired.output <-
  structure(list(cyl = c(4L, 4L, 4L, 6L, 6L, 6L, 8L, 8L),
                 gear = c(3L, 4L, 5L, 3L, 4L, 5L, 3L, 5L),
                 am1 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L),
                 am2 = c(NA, 1L, NA, NA, 1L, NA, NA, NA),
                 vs1 = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
                 vs2 = c(NA, NA, 1L, NA, 1L, NA, NA, NA)),
            .Names = c("cyl", "gear", "am1", "am2", "vs1", "vs2"),
            class = "data.frame", row.names = c(NA, -8L))

I really don't care whether the new columns are named am1, am2, vs1, vs2 or something more convenient. But if there are two different am values in the data, the final output needs two data-holding columns for am, and one of them should be missing (NA) when that combination doesn't have that value.

Thanks!
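A minimal base-R sketch of the transformation described above, under stated assumptions: widen_sorted_unique() is a hypothetical helper (not from either answer), the unique.cols columns are plain atomic vectors rather than factors, NA values in them are dropped, and output rows are sorted by the ordered.cols columns. New columns are numbered by sorted position, which is how desired.output is laid out.

```r
# Hypothetical helper, not from the question or answers.
# Assumes unique.cols are atomic (non-factor) columns; sort() drops their NAs.
widen_sorted_unique <- function(x, ordered.cols, unique.cols) {
  # distinct ordered.cols combinations, sorted by each column in turn
  keys <- unique(x[, ordered.cols, drop = FALSE])
  keys <- keys[do.call(order, unname(as.list(keys))), , drop = FALSE]
  rownames(keys) <- NULL

  # a text key per row so each row of x can be matched to its combination
  make_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  groups <- split(x, factor(make_key(x[, ordered.cols, drop = FALSE]),
                            levels = make_key(keys)))

  out <- keys
  for (col in unique.cols) {
    # sorted distinct values of this column within each combination
    vals <- lapply(groups, function(g) sort(unique(g[[col]])))
    # as many new columns as the combination with the most values needs
    for (i in seq_len(max(lengths(vals)))) {
      # indexing past the end of a vector returns NA of the same type
      out[[paste0(col, i)]] <- unname(sapply(vals, `[`, i))
    }
  }
  out
}

ordered.cols <- c('cyl', 'gear')
unique.cols <- c('am', 'vs')
x <- mtcars[, c(ordered.cols, unique.cols)]
widened <- widen_sorted_unique(x, ordered.cols, unique.cols)
print(widened)
```

Because cyl, gear, am and vs are doubles in mtcars, widened holds doubles where desired.output uses integers; the values agree.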

Answer 1

library(data.table)

reshapeMyData <- function(x, unique.cols, ordered.cols, NA_class="integer") {
  DT <- as.data.table(x)

  ## every distinct value of each column in unique.cols, in order of first appearance
  unique.values <- lapply(DT[, unique.cols, with=FALSE], unique)

  ## If your NA is of the wrong class, it can potentially throw an error,
  ##    depending on which group it first shows up in.  It is better to be explicit about the expected class
  NA.classed <- as(NA, NA_class)

  ## One j expression, evaluated once per unique combination of ordered.cols values
  DT[, {
    ## unlist() + as.list() + setDT() turn the per-column results into a one-row
    ## data.table with columns such as am1, am2, vs1, vs2
    setDT(as.list(unlist(
      ## For each column, keep each distinct value if it occurs in this group, NA otherwise
      mapply(function(v, C) {ifelse(v %in% C, v, NA.classed)}, v=unique.values, C=.SD, SIMPLIFY=FALSE)
    )))
  }
  , keyby=ordered.cols, .SDcols=unique.cols]

} ## // end function reshapeMyData
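The NA_class comment comes down to how ifelse() chooses its result type: it starts from the logical test vector, so when a group contains none of the values, only the NA branch is used and the result stays logical, while a group with at least one match gives an integer result. A small check with made-up values:

```r
# ifelse() result type depends on which branch actually supplied values
v <- c(5L, 6L)
no.match   <- ifelse(v %in% integer(0), v, NA)                      # only the NA branch used
some.match <- ifelse(v %in% 5L, v, NA)                              # one value kept
typed      <- ifelse(v %in% integer(0), v, methods::as(NA, "integer")) # explicitly classed NA
types <- c(typeof(no.match), typeof(some.match), typeof(typed))
types
```

With the classed NA, every group returns the same column type regardless of whether any value matched.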

Output

reshapeMyData(x, unique.cols, ordered.cols)

   cyl gear am1 am2 vs1 vs2
1:   4    3  NA   0  NA   1
2:   4    4   1   0  NA   1
3:   4    5   1  NA   0   1
4:   6    3  NA   0  NA   1
5:   6    4   1   0   0   1
6:   6    5   1  NA   0  NA
7:   8    3  NA   0   0  NA
8:   8    5   1  NA   0  NA
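In this output, am1/am2 are tied to particular values rather than to sorted positions: unique() returns the am values of mtcars in order of first appearance (1, then 0), so am1 is always 1 or NA and am2 is always 0 or NA; vs1/vs2 likewise hold 0 and 1. That is why the layout differs from desired.output. The base-R snippet below reproduces only the per-group step for cyl = 4, gear = 3, as an illustration rather than part of the answer.

```r
# Base-R illustration of the j expression for one group (cyl == 4, gear == 3)
cols <- c("am", "vs")
# distinct values in order of first appearance: am = c(1, 0), vs = c(0, 1)
unique.values <- lapply(mtcars[, cols], unique)
group <- mtcars[mtcars$cyl == 4 & mtcars$gear == 3, cols]
# keep each distinct value if it occurs in the group, NA otherwise
row <- unlist(mapply(function(v, C) ifelse(v %in% C, v, NA_real_),
                     v = unique.values, C = group, SIMPLIFY = FALSE))
row
```

This matches row 1 of the table above: am1 NA, am2 0, vs1 NA, vs2 1.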

reshapeMyData(y, "d", c("a", "b"), NA_class="character")

   a b d1 d2 d3
1: 1 1  z  y NA
2: 1 2 NA NA  x
3: 2 2 NA NA  x

Answer 2

For my purposes, this solution seems to work well:

aggregate( x[ , unique.cols ] , by = x[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )

aggregate( y[ , unique.cols ] , by = y[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )
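For reference, the first aggregate() call returns one character column per entry of unique.cols, with the sorted distinct values of each ordered.cols combination collapsed into a string such as "0,1", rather than separate am1/am2 columns. A quick check on the question's mtcars example:

```r
ordered.cols <- c('cyl', 'gear')
unique.cols <- c('am', 'vs')
x <- mtcars[, c(ordered.cols, unique.cols)]

# Answer 2's first command: one row per cyl/gear combination
collapsed <- aggregate(x[, unique.cols], by = x[, ordered.cols],
                       function(w) paste(sort(unique(w)), collapse = ","))

# the values are strings, e.g. am = "0,1" and vs = "1" for cyl 4, gear 4
collapsed[collapsed$cyl == 4 & collapsed$gear == 4, ]
```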

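Sometimes (I'm not sure why, but I think it's a factor-coercion issue) nrow( unique( x[ , ordered.cols ] ) ) does not equal the nrow of the output of the command above. In those cases, this workaround seems to do the trick: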

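# the separator keeps e.g. (1, 11) and (11, 1) from producing the same key
keys <- apply( x[ , ordered.cols ] , 1 , paste , collapse = "\r" )
halfway <- aggregate( x[ , unique.cols ] , by = list( keys ) , function( w ) paste( sort( unique( w ) ) , collapse = "," )  )

# aggregate() returns its groups sorted by key, so match them back to the row order of unique()
cbind( unique( x[ , ordered.cols ] ) , halfway[ match( unique( keys ) , halfway[ , 1 ] ) , -1 ] )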
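One possible cause of the row-count mismatch, offered as an assumption rather than a diagnosis of the data in question: aggregate() omits rows whose by variables contain NA, whereas unique() keeps those combinations. Pasting the grouping columns into a single key turns NA into the string "NA", so those rows survive. The toy data frame d is made up for illustration.

```r
# made-up data with a missing value in a grouping column
d <- data.frame(a = c(1, 1, NA), b = c(1, 2, 2), v = c("p", "q", "r"),
                stringsAsFactors = FALSE)
collapse_fun <- function(w) paste(sort(unique(w)), collapse = ",")

n.unique <- nrow(unique(d[, c("a", "b")]))                       # keeps (NA, 2)
n.agg <- nrow(aggregate(d[, "v", drop = FALSE],
                        by = d[, c("a", "b")], collapse_fun))    # drops (NA, 2)

# pasted keys: NA becomes the string "NA", so no row is dropped
keys <- apply(d[, c("a", "b")], 1, paste, collapse = "\r")
n.key <- nrow(aggregate(d[, "v", drop = FALSE], by = list(keys), collapse_fun))
c(n.unique, n.agg, n.key)
```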

How to reshape to multi-level, multi-ordering wide data with dynamic column names
