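I have a data frame with an ID column, a date column, and 5 other variable columns. I want to convert the data frame to a 3D array of size (#ids, #dates, 5). I know that if every id had the same number of rows in the data frame, I could have used the dim function and so on. However, that is not the case. How can I convert an unbalanced (not sure that is the right term) data frame into a 3D array in which each 2D matrix corresponds to one id and has dimensions (#dates, 5)? Importantly, the number of rows of each 2D matrix varies by id.
I'm really not good at working with matrices. Apologies for that.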
id date x1 x2 x3 x4 x5
1: 1 2009-01-01 5 4 2 5.5 7
2: 1 2009-01-02 5.4 4.1 2.2 5.3 7.1
3: 1 2009-01-03 4.4 2.1 4.2 6.3 10.1
4: 2 2009-01-01 12.4 2.7 4.9 3.3 2.1
5: 3 2010-01-01 3.4 1.7 4.6 4.3 6.1
6: 4 2009-01-01 2.4 3.7 5.6 2.3 9.1
7: 4 2009-01-02 3.4 5.7 7.6 3.3 5.1
For each id, I want to create a 2D matrix, and overall a 3D array. I need this format to pass the data to the keras R library. Thanks.
Regards
Answer 0 (score: 1)
Here's a tidyverse option:
library(tidyverse)
df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01", "2010-01-01", "2009-01-01", "2009-01-02")),
x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))
a <- df %>%
  complete(id, date, fill = map(df[3:7], ~0)) %>%   # insert missing id/date rows; fill with 0s
  nest(data = -id) %>%   # collapse other columns to a list column of data frames
  mutate(data = map(data, ~as.matrix(.x[-1]))) %>%   # drop dates from nested data frames and coerce each to matrix
  pull(data) %>%   # extract the list of matrices
  { do.call(abind::abind, c(., along = 3)) } %>%   # bind along the 3rd dimension
  # complete() sorts by id and date, so sort the labels the same way
  `dimnames<-`(list(as.character(sort(unique(df$date))), names(df[3:7]), sort(unique(df$id))))
a
#> , , 1
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 5.0 4.0 2.0 5.5 7.0
#> 2009-01-02 5.4 4.1 2.2 5.3 7.1
#> 2009-01-03 4.4 2.1 4.2 6.3 10.1
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
#>
#> , , 2
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 12.4 2.7 4.9 3.3 2.1
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
#>
#> , , 3
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 0.0 0.0 0.0 0.0 0.0
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 3.4 1.7 4.6 4.3 6.1
#>
#> , , 4
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 2.4 3.7 5.6 2.3 9.1
#> 2009-01-02 3.4 5.7 7.6 3.3 5.1
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
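The question asks for the layout (#ids, #dates, 5), whereas the array above is laid out as (dates, variables, ids). As a minimal base R sketch of the same zero-padding idea, assuming the sample data above and that zero is an acceptable fill value for missing id/date combinations, the array could be built directly in the requested order using matrix indexing. The names `ids`, `dates`, `vars` and `arr` are illustrative and not part of the answer:

```r
# Same sample data as in the answer above
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

ids   <- sort(unique(df$id))
dates <- sort(unique(df$date))
vars  <- setdiff(names(df), c("id", "date"))

# Zero-filled array laid out as (ids, dates, variables)
arr <- array(0,
             dim = c(length(ids), length(dates), length(vars)),
             dimnames = list(as.character(ids), as.character(dates), vars))

# Position of each observed row along the id and date axes
i <- match(df$id, ids)
j <- match(df$date, dates)

# Fill one variable slice at a time; cbind(i, j, k) addresses one cell per row of df
for (k in seq_along(vars)) {
  arr[cbind(i, j, k)] <- df[[vars[k]]]
}

dim(arr)
```

Alternatively, `aperm(a, c(3, 1, 2))` should reorder the array `a` from the answer above into the same (ids, dates, variables) layout.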
Answer 1 (score: 1)
I'm not sure I understand your expected output, but I'd suggest either splitting the data.frame into a list of data.frames, or nesting the data for each id.
Option 1: nesting
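The code for the nesting option is not preserved in this copy of the answer. A minimal sketch of what nesting by id might look like, assuming the dplyr and tidyr packages are installed and using the same sample data (the variable name `nested` is illustrative):

```r
library(dplyr)
library(tidyr)

# Same sample data as in the first answer
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

# One row per id; the remaining columns are collapsed into a list column "data"
nested <- df %>%
  group_by(id) %>%
  nest()

nested
```

Each element of `nested$data` is a data frame holding the rows for one id, so the number of rows still differs between ids.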
Option 2: splitting
split(df, df$id)
#$`1`
# id date x1 x2 x3 x4 x5
#1 1 2009-01-01 5.0 4.0 2.0 5.5 7.0
#2 1 2009-01-02 5.4 4.1 2.2 5.3 7.1
#3 1 2009-01-03 4.4 2.1 4.2 6.3 10.1
#
#$`2`
# id date x1 x2 x3 x4 x5
#4 2 2009-01-01 12.4 2.7 4.9 3.3 2.1
#
#$`3`
# id date x1 x2 x3 x4 x5
#5 3 2010-01-01 3.4 1.7 4.6 4.3 6.1
#
#$`4`
# id date x1 x2 x3 x4 x5
#6 4 2009-01-01 2.4 3.7 5.6 2.3 9.1
#7 4 2009-01-02 3.4 5.7 7.6 3.3 5.1
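If 2D matrices per id are wanted, as the question describes, the split result could be taken one step further. The following is a base R sketch under the assumption that only the five variable columns belong in each matrix; `vars` and `mats` are illustrative names:

```r
# Same sample data as in the first answer
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

vars <- setdiff(names(df), c("id", "date"))

# Split only the variable columns by id, then coerce each piece to a numeric matrix
mats <- lapply(split(df[vars], df$id), as.matrix)

# Number of rows (dates) per id; these differ, so the list is ragged
sapply(mats, nrow)
```

Because the matrices have different numbers of rows, they cannot be stacked into a single regular 3D array without padding, which is what the zero-filling in the first answer does.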