My data frame looks like this:
col1 col2 col3 col4 col5
[1,] 1 NA NA 13 NA
[2,] NA NA 10 NA 18
[3,] NA 7 NA 15 NA
[4,] 4 NA NA 16 NA
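Now I want to "collapse" this data frame into one with fewer columns and with the NAs removed. In effect, I am looking for the "Excel way of doing it": delete a cell, and the rest of the row shifts one cell to the left.
The result in this example would be: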
col1 col2
[1,] 1 13
[2,] 10 18
[3,] 7 15
[4,] 4 16
Does anyone know how to do this in R? Many thanks in advance!
Answer 0 (score: 4)
You can use apply. If df is your data frame:
df2 <- apply(df,1,function(x) x[!is.na(x)])
df3 <- data.frame(t(df2))
colnames(df3) <- colnames(df)[1:ncol(df3)]
Output:
# col1 col2
# 1 13
# 10 18
# 7 15
# 4 16
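For reference, the following is a self-contained sketch of the same approach. The construction of df is an assumption reconstructed from the table printed in the question. It also illustrates a caveat: apply only returns a matrix when every row has the same number of non-NA values; otherwise it returns a list, and the t() step no longer applies.

```r
# Rebuild the example data from the question (assumed layout)
df <- data.frame(
  col1 = c(1, NA, NA, 4),
  col2 = c(NA, NA, 7, NA),
  col3 = c(NA, 10, NA, NA),
  col4 = c(13, NA, 15, 16),
  col5 = c(NA, 18, NA, NA)
)

# Every row has exactly two non-NA values, so apply() returns a 2 x 4 matrix
df2 <- apply(df, 1, function(x) x[!is.na(x)])
df3 <- data.frame(t(df2))
colnames(df3) <- colnames(df)[seq_len(ncol(df3))]
df3

# With unequal counts per row, apply() returns a list instead of a matrix
uneven <- df
uneven[1, "col2"] <- 2
res <- apply(uneven, 1, function(x) x[!is.na(x)])
is.list(res)
```

If the rows can hold different numbers of values, a padding approach that keeps all rows the same length is the safer choice.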
Answer 1 (score: 3)
You can use apply and na.exclude:
DF
## V1 V2 V3 V4 V5
## 1 1 NA NA 13 NA
## 2 NA NA 10 NA 18
## 3 NA 7 NA 15 NA
## 4 4 NA NA 16 NA
t(apply(DF, 1, na.exclude))
## [,1] [,2]
## [1,] 1 13
## [2,] 10 18
## [3,] 7 15
## [4,] 4 16
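If you want to keep the same dimensions as the data.frame, you can use sort with na.last=TRUE instead. This also handles the case where different rows have unequal numbers of values. Note that sort also reorders the non-NA values within each row into ascending order, which happens to match their original order in this example.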
t(apply(DF, 1, sort, na.last = TRUE))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 13 NA NA NA
## [2,] 10 18 NA NA NA
## [3,] 7 15 NA NA NA
## [4,] 4 16 NA NA NA
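Because sort changes the order of the values, it only reproduces the "shift left" behaviour when each row is already ascending. A minimal sketch of an order-preserving variant follows; the small DF below is made-up illustration data, not the data from the question, and the variant relies on order() leaving ties in their original order.

```r
# Made-up data in which the non-NA values are not in ascending order
DF <- data.frame(
  V1 = c(3, NA),
  V2 = c(NA, 9),
  V3 = c(1, NA),
  V4 = c(NA, 2)
)

# sort() moves NAs to the end but also reorders the values
sorted <- t(apply(DF, 1, sort, na.last = TRUE))

# order(is.na(x)) puts FALSE (non-NA) before TRUE (NA); ties keep their
# original order, so values stay in left-to-right order
shifted <- t(apply(DF, 1, function(x) x[order(is.na(x))]))

sorted
shifted
```

In this sketch, the first row of sorted starts with 1, 3, while the first row of shifted keeps the original order 3, 1.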
Answer 2 (score: 1)
This function is a bit wordy, but (1) it is faster in the long run and (2) it offers a lot of flexibility:
myFun <- function(inmat, outList = TRUE, fill = NA, origDim = FALSE) {
  ## Split up the data by row and isolate the non-NA values
  myList <- lapply(sequence(nrow(inmat)), function(x) {
    y <- inmat[x, ]
    y[!is.na(y)]
  })
  ## If a `list` is all that you want, the function stops here
  if (isTRUE(outList)) {
    myList
  } else {
    ## If you want a matrix instead, it goes on like this
    Len <- vapply(myList, length, 1L)
    ## The new matrix can be either just the number of columns required
    ## or it can have the same number of columns as the input matrix
    if (isTRUE(origDim)) Ncol <- ncol(inmat) else Ncol <- max(Len)
    Nrow <- nrow(inmat)
    M <- matrix(fill, ncol = Ncol, nrow = Nrow)
    M[cbind(rep(sequence(Nrow), Len), sequence(Len))] <-
      unlist(myList, use.names=FALSE)
    M
  }
}
To test it, let's create a function to make up some dummy data:
makeData <- function(nrow = 10, ncol = 5, pctNA = .8, maxval = 25) {
  a <- nrow * ncol
  m <- matrix(sample(maxval, a, TRUE), ncol = ncol)
  m[sample(a, a * pctNA)] <- NA
  m
}
set.seed(1)
m <- makeData(nrow = 5, ncol = 4, pctNA=.6)
m
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] 10 24 NA 18
# [3,] NA 17 NA 25
# [4,] NA 16 10 NA
# [5,] NA 2 NA NA
... and apply it ...
myFun(m)
# [[1]]
# integer(0)
#
# [[2]]
# [1] 10 24 18
#
# [[3]]
# [1] 17 25
#
# [[4]]
# [1] 16 10
#
# [[5]]
# [1] 2
myFun(m, outList = FALSE)
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] 10 24 18
# [3,] 17 25 NA
# [4,] 16 10 NA
# [5,] 2 NA NA
## Try also
## myFun(m, outList = FALSE, origDim = TRUE)
And, to compare with the other answers, let's do some timings on larger data:
set.seed(1)
m <- makeData(nrow = 1e5, ncol = 5, pctNA = .75)
## Will return a matrix
funCP <- function(inmat) t(apply(inmat, 1, sort, na.last = TRUE))
system.time(funCP(m))
# user system elapsed
# 9.776 0.000 9.757
## Will return a list in this case
funJT <- function(inmat) apply(inmat, 1, function(x) x[!is.na(x)])
system.time(JT <- funJT(m))
# user system elapsed
# 0.577 0.000 0.575
## Output a list
system.time(AM <- myFun(m))
# user system elapsed
# 0.469 0.000 0.466
identical(JT, AM)
# [1] TRUE
## Output a matrix
system.time(myFun(m, outList=FALSE, origDim=TRUE))
# user system elapsed
# 0.610 0.000 0.612
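So, the list output appears to be slightly faster than @JT85's solution, and the matrix output appears to be slightly slower. Compared with using sort row by row, however, both are a clear improvement.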
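The per-row function calls can also be avoided altogether by using matrix indexing. The following is a minimal sketch, not one of the original answers and not benchmarked here; shift_left is a hypothetical name, and the sketch assumes the input is a matrix (convert a data frame with as.matrix first).

```r
# Hypothetical helper: shift non-NA values to the left, padding with `fill`
shift_left <- function(m, fill = NA) {
  keep <- !is.na(m)
  # Transposing makes R's column-major extraction walk the data row by row
  vals <- t(m)[t(keep)]
  # Number of non-NA values in each row
  len <- rowSums(keep)
  out <- matrix(fill, nrow = nrow(m), ncol = max(len, 0))
  # Target positions: row i receives its values in columns 1..len[i]
  out[cbind(rep(seq_len(nrow(m)), len), sequence(len))] <- vals
  out
}

# The data from the question, entered column by column
m <- matrix(c(1, NA, NA, 4,
              NA, NA, 7, NA,
              NA, 10, NA, NA,
              13, NA, 15, 16,
              NA, 18, NA, NA), nrow = 4)
shift_left(m)
```

The result has as many columns as the fullest row needs, and rows with fewer values are padded on the right with fill.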