I need an answer that works dynamically when the `ordered.cols` and `unique.cols` vectors are not set ahead of time.
I really don't care whether the new columns are named am1, am2, vs1, vs2 or something more convenient. But if the data contain two different `am` values for one combination, the final output needs two data-holding columns, and if a combination has only one of those values, the other column should be NA.
# these two sets of columns need to be dynamic
# they might be any two sets of columns!
ordered.cols <- c( 'cyl' , 'gear' )
unique.cols <- c( 'am' , 'vs' )
# neither of the above two character vectors will be known beforehand
# so here's the example starting data set
x <- mtcars[ , c( ordered.cols , unique.cols ) ]
# the desired output should have this many records:
unique( x[ , ordered.cols ] )
# but i'm unsure of the smartest way to add the additional columns that i want--
# for *each* unique level in *each* of the variables in
# `unique.cols` there should be one additional column added
# to the final output. then, for that `ordered.cols` combination
# the cell should be populated with the value if it exists
# and NA otherwise
desired.output <-
structure(list(cyl = c(4L, 4L, 4L, 6L, 6L, 6L, 8L, 8L), gear = c(3L,
4L, 5L, 3L, 4L, 5L, 3L, 5L), am1 = c(0L, 0L, 1L, 0L, 0L, 1L,
0L, 1L), am2 = c(NA, 1L, NA, NA, 1L, NA, NA, NA), vs1 = c(1L,
1L, 0L, 1L, 0L, 0L, 0L, 0L), vs2 = c(NA, NA, 1L, NA, 1L, NA,
NA, NA)), .Names = c("cyl", "gear", "am1", "am2", "vs1", "vs2"
), class = "data.frame", row.names = c(NA, -8L))
desired.output
Thanks!
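For reference, the layout in desired.output (for each unique.cols variable, the group's distinct values in ascending order, padded with NA) can be produced in base R without knowing the column names in advance. This is a minimal sketch, not code from the question or the answers: spread_unique is a hypothetical helper name, the grouping key assumes no value contains "|", and NA values inside the unique.cols columns are ignored because sort() drops them.

```r
# Hypothetical helper (not from the thread): one row per ordered.cols
# combination; each unique.cols variable is spread into <var>1, <var>2, ...
# holding that group's sorted distinct non-NA values, padded with NA.
spread_unique <- function(x, ordered.cols, unique.cols) {
  # one string key per row identifies its ordered.cols combination
  key <- do.call(paste, c(unname(as.list(x[ordered.cols])), sep = "|"))
  # row indices of x grouped by key, in order of first appearance
  groups <- split(seq_len(nrow(x)), factor(key, levels = unique(key)))
  # take the ordered.cols values from the first row of each group
  first <- vapply(groups, function(i) i[1], integer(1))
  out <- x[first, ordered.cols, drop = FALSE]
  for (col in unique.cols) {
    # sorted distinct values of this column within each group (sort() drops NA)
    vals <- lapply(groups, function(i) sort(unique(x[[col]][i])))
    for (k in seq_len(max(lengths(vals)))) {
      # k-th smallest value per group; indexing past the end yields NA
      out[[paste0(col, k)]] <- unname(unlist(lapply(vals, function(v) v[k])))
    }
  }
  # sort rows by the ordered.cols values and reset the row names
  out <- out[do.call(order, unname(as.list(out[ordered.cols]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

ordered.cols <- c("cyl", "gear")
unique.cols <- c("am", "vs")
x <- mtcars[, c(ordered.cols, unique.cols)]
spread_unique(x, ordered.cols, unique.cols)
```

On mtcars this gives the same values as desired.output; the columns come back as double because mtcars stores them as double, while desired.output was written with integers.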
Answer 0 (score: 2)
library(data.table)
reshapeMyData <- function(x, unique.cols, ordered.cols, NA_class="integer") {
  DT <- as.data.table(x)
  unique.values <- lapply(DT[, unique.cols, with=FALSE], unique)
  ## If your NA is of the wrong class, it can potentially throw an error,
  ## depending on when it first shows up. It is better to be explicit about the expected class.
  NA.classed <- as(NA, NA_class)
  ## The j expression below is evaluated once for each unique combination of ordered.cols values
  DT[, {
      ## These three functions shape the data as needed
      setDT(as.list(unlist(
        ## This mapply call checks if each value is in the given group
        mapply(function(v, C) {ifelse(v %in% C, v, NA.classed)}, v=unique.values, C=.SD, SIMPLIFY=FALSE)
      )))
    }
    , keyby=ordered.cols, .SDcols=unique.cols]
} ## // end function reshapeMyData
reshapeMyData(x, unique.cols, ordered.cols)
cyl gear am1 am2 vs1 vs2
1: 4 3 NA 0 NA 1
2: 4 4 1 0 NA 1
3: 4 5 1 NA 0 1
4: 6 3 NA 0 NA 1
5: 6 4 1 0 0 1
6: 6 5 1 NA 0 NA
7: 8 3 NA 0 0 NA
8: 8 5 1 NA 0 NA
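In this output the number after each column name is a position in the list of distinct values found in the whole data set (in order of first appearance), not a rank within the group. In mtcars, am first appears as 1 and vs as 0, so am1 holds am == 1 and vs1 holds vs == 0; this is why row 1 (cyl 4, gear 3, where only am == 0 occurs) shows am1 = NA and am2 = 0. The mapping can be checked directly; the am_1 style labels are a hypothetical naming choice:

```r
x <- mtcars[, c("cyl", "gear", "am", "vs")]
unique.cols <- c("am", "vs")
# the values reshapeMyData() pairs with the suffixes 1, 2, ... (first-appearance order)
u <- lapply(x[unique.cols], unique)
# hypothetical value-based labels: "am_1" names the column that holds am == 1
labels <- unlist(lapply(names(u), function(n) paste(n, u[[n]], sep = "_")))
labels
```

Because these labels follow the same order as the value columns returned by reshapeMyData(), they could be passed as new names to data.table::setnames().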
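## `y` is a second example data frame with columns a, b and a character column d (its definition is not shown)
reshapeMyData(y, "d", c("a", "b"), NA_class="character")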
a b d1 d2 d3
1: 1 1 z y NA
2: 1 2 NA NA x
3: 2 2 NA NA x
Answer 1 (score: 0)
For my purposes, this solution seems to work well:
aggregate( x[ , unique.cols ] , by = x[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )
aggregate( y[ , unique.cols ] , by = y[ , ordered.cols ] , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )
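This aggregate() call returns one comma-separated string per variable (for example "0,1") rather than separate am1/am2 columns. If the column layout from the question is needed, the strings can be split back apart. A minimal sketch, assuming the unique.cols columns are numeric (the pieces are converted back with as.numeric()); agg is a hypothetical name:

```r
x <- mtcars[, c("cyl", "gear", "am", "vs")]
ordered.cols <- c("cyl", "gear")
unique.cols <- c("am", "vs")
agg <- aggregate(x[, unique.cols], by = x[, ordered.cols],
                 function(w) paste(sort(unique(w)), collapse = ","))
# replace each comma-separated column with numbered columns am1, am2, ...
for (col in unique.cols) {
  parts <- strsplit(agg[[col]], ",", fixed = TRUE)
  for (k in seq_len(max(lengths(parts)))) {
    # k-th value in each group, NA when the group has fewer values
    agg[[paste0(col, k)]] <- as.numeric(vapply(parts, `[`, character(1), k))
  }
  agg[[col]] <- NULL
}
agg
```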
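Sometimes (I'm not sure why, but I think it's a factor-coercion issue) nrow( unique( x[ , ordered.cols ] ) ) does not equal the number of rows returned by the aggregate() call above. In those cases, this workaround seems to do the trick: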
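key <- do.call( paste , c( x[ ordered.cols ] , sep = "|" ) )
halfway <- aggregate( x[ , unique.cols ] , by = list( key = key ) , function( w ) paste( sort( unique( w ) ) , collapse = "," ) )
# aggregate() returns its groups sorted by key, so look up each group's ordered.cols values by key rather than relying on row order
cbind( x[ match( halfway$key , key ) , ordered.cols , drop = FALSE ] , halfway[ , -1 , drop = FALSE ] )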
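Besides the factor-coercion guess above, one documented behavior could explain the mismatch: aggregate() omits rows that have a missing value in any by variable, while unique() keeps such combinations, so the two row counts differ whenever an ordered.cols column contains NA. Pasting the columns into a single key, as the workaround does, turns NA into the string "NA" and keeps that group. A small sketch on a made-up data frame z (hypothetical data):

```r
# hypothetical data with an NA in one of the grouping columns
z <- data.frame(cyl = c(4, 4, NA), gear = c(3, 3, 4), am = c(0, 1, 1))
f <- function(w) paste(sort(unique(w)), collapse = ",")
# aggregate() drops the row whose cyl is NA, leaving a single group
a1 <- aggregate(z["am"], by = z[c("cyl", "gear")], FUN = f)
# the pasted key is "NA|4" for that row, so its group is kept
key <- do.call(paste, c(z[c("cyl", "gear")], sep = "|"))
a2 <- aggregate(z["am"], by = list(key = key), FUN = f)
c(unique = nrow(unique(z[c("cyl", "gear")])), by_columns = nrow(a1), by_key = nrow(a2))
```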