I have a data frame whose columns have the following classes:
"date" "numeric" "numeric" "list" "list" "numeric"
The data in each row looks like this:
1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72
I would like to convert it into a matrix or data frame with one value per column, so the result should look like this:
1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72
I have tried using:
j<-unlist(input)
output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))
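But this scrambles the order of the rows in the output.
Any ideas?
Additional information:
The example above is slightly simplified. dput(head(input))
returns the following sample: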
structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928,
2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6,
-5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825,
986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0,
0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75,
0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375,
0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts",
"3.counts", "4.counts", "5.counts", "6.counts", "7.counts")),
CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0,
0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125,
0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125,
0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0,
0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts",
"6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375,
6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0",
"SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0"
), row.names = c(NA, 6L), class = "data.frame")
Answer 0 (score: 14)
This is a bit messy and probably inefficient, but it should help you get started:
Here is some sample data:
mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
V1 = c(10, 10),
V2 = c(11, 11))
mydf$V3 <- list(c(1:10),
c(11:20))
mydf$V4 <- list(c(21:25),
c(26:30))
mydf
# Date V1 V2 V3 V4
# 1 1978-01-01 10 11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25
# 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30
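And here is a small function that checks which columns are lists, uses rbind
to bind the elements of each of those columns together, and finally uses cbind
to combine the result with the columns that are not lists.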
myFun <- function(data) {
temp1 <- sapply(data, is.list)
temp2 <- do.call(
cbind, lapply(data[temp1], function(x)
data.frame(do.call(rbind, x), check.names=FALSE)))
cbind(data[!temp1], temp2)
}
myFun(mydf)
# Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26
# V4.2 V4.3 V4.4 V4.5
# 1 22 23 24 25
# 2 27 28 29 30
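This only works if each list "column" contains vectors of the same length (otherwise base R's rbind
will not work as expected: it recycles the shorter vectors and issues a warning).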
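If the vectors in a list column do differ in length, one possible workaround is to pad them with NA before binding. The following is only a minimal sketch under that assumption; the helper name pad_list_col and the sample list ragged are hypothetical and not part of the original answer.

```r
# Hypothetical helper: pad every vector in a list column with NA up to the
# length of the longest one, then rbind the padded vectors into a matrix.
pad_list_col <- function(x) {
  n <- max(lengths(x))
  do.call(rbind, lapply(x, function(v) {
    length(v) <- n  # extending a vector's length fills the new slots with NA
    v
  }))
}

ragged <- list(1:3, 4:5)  # made-up list column with unequal lengths
padded <- pad_list_col(ragged)
padded
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5   NA
```

The padded matrix could then be wrapped in data.frame() and combined with the non-list columns using cbind, in the same way as in myFun.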
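Revisiting this question half a day later, I see that another user (@user1981275) posted a more direct solution but then deleted their answer. Perhaps they deleted it because their approach converts the dates to integers: as DWin pointed out, all items in a matrix must be of the same mode.
Here was their solution: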
t(apply(mydf, 1, unlist))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# [1,] 2922 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# [2,] 2923 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
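The numbers 2922 and 2923 in the Date column are the dates stored as days since 1970-01-01. As a small illustration of why the class gets lost and how a date could be recovered from such a number, here is a sketch that uses only base R; the variable names d, flat and back are made up for this example.

```r
d <- as.Date("1978-01-01")

# Flattening a Date together with ordinary numbers drops the Date class,
# leaving only the underlying day count.
flat <- unlist(list(d, 10, 11))
flat
# [1] 2922   10   11

# The day count can be turned back into a Date by supplying the origin.
back <- as.Date(flat[1], origin = "1970-01-01")
back
# [1] "1978-01-01"
```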
Here is how it can easily be modified to get the desired output. It is certainly faster than the previous approach:
cbind(mydf[!sapply(mydf, is.list)],
(t(apply(mydf[sapply(mydf, is.list)], 1, unlist))))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
Or, as a user-defined function:
myFun <- function(data) {
ListCols <- sapply(data, is.list)
cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
myFun(mydf)
I have also written a more efficient function called col_flatten,
which is part of my "SOfun" package.
Install the package using:
source("http://news.mrdwab.com/install_github.R")
install_github("mrdwab/SOfun")
Then you can do:
library(SOfun)
col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE)
## Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5
## 1: 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
## 2: 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
It is based on the transpose
function from "data.table", so make sure you have "data.table" installed as well.
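To give a rough idea of what transpose does with a list column, here is a minimal sketch. It is not the actual implementation of col_flatten; the variable names v3, cols and out, and the "V3_" naming scheme, are assumptions chosen to mirror the output shown above.

```r
library(data.table)

# A list "column" with one vector per row, as in mydf$V3.
v3 <- list(1:10, 11:20)

# transpose() turns a list of row vectors into a list of column vectors;
# fill = NA pads shorter vectors if the lengths differ.
cols <- transpose(v3, fill = NA)
names(cols) <- paste0("V3_", seq_along(cols))

out <- as.data.table(cols)
out
```

The result has one row per original row and one column per position within the vectors, and could then be bound to the non-list columns with cbind.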