Converting an unbalanced 2D data frame to a 3D array

Time: 2018-07-21 00:22:19

Tags: r dataframe matrix reshape

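I have a data frame with an id column, a date column, and 5 other variable columns. I want to convert it into a 3D array of size (#ids, #dates, 5). I know that if every id had the same number of rows in the data frame, I could have used the dim function and so on. That is not the case here, though. How can I convert an unbalanced (not sure that is the right term) data frame into a 3D array in which each 2D matrix corresponds to one id and has dimensions (#dates, 5)? Importantly, the number of rows of each 2D matrix varies by id.

I am really not good at working with matrices. Apologies for that.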

   id       date   x1  x2  x3  x4   x5
1:  1 2009-01-01    5   4   2 5.5    7
2:  1 2009-01-02  5.4 4.1 2.2 5.3  7.1
3:  1 2009-01-03  4.4 2.1 4.2 6.3 10.1
4:  2 2009-01-01 12.4 2.7 4.9 3.3  2.1
5:  3 2010-01-01  3.4 1.7 4.6 4.3  6.1
6:  4 2009-01-01  2.4 3.7 5.6 2.3  9.1
7:  4 2009-01-02  3.4 5.7 7.6 3.3  5.1

For each id I want to create a 2D matrix, and overall a 3D array. I need this format to pass the data to the keras R library. Thanks.

Regards

2 answers:

Answer 0 (score: 1)

Here is a tidyverse option:

library(tidyverse)

df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L), 
                 date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01", "2010-01-01", "2009-01-01", "2009-01-02")), 
                 x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4), 
                 x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7), 
                 x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6), 
                 x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3), 
                 x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))

a <- df %>% 
    complete(id, date, fill = map(df[3:7], ~0)) %>%    # insert missing rows; fill with 0s
    nest(data = -id) %>%    # collapse other columns to list column of data frames
    mutate(data = map(data, ~as.matrix(.x[-1]))) %>%    # drop dates from nested data frames and coerce each to matrix
    pull(data) %>%    # extract matrix list
    abind::abind(along = 3) %>%    # abind the list of matrices in 3rd dimension
    `dimnames<-`(list(as.character(sort(unique(df$date))), names(df[3:7]), sort(unique(df$id))))    # set dimnames; sorted to match the order complete() produces

a
#> , , 1
#> 
#>             x1  x2  x3  x4   x5
#> 2009-01-01 5.0 4.0 2.0 5.5  7.0
#> 2009-01-02 5.4 4.1 2.2 5.3  7.1
#> 2009-01-03 4.4 2.1 4.2 6.3 10.1
#> 2010-01-01 0.0 0.0 0.0 0.0  0.0
#> 
#> , , 2
#> 
#>              x1  x2  x3  x4  x5
#> 2009-01-01 12.4 2.7 4.9 3.3 2.1
#> 2009-01-02  0.0 0.0 0.0 0.0 0.0
#> 2009-01-03  0.0 0.0 0.0 0.0 0.0
#> 2010-01-01  0.0 0.0 0.0 0.0 0.0
#> 
#> , , 3
#> 
#>             x1  x2  x3  x4  x5
#> 2009-01-01 0.0 0.0 0.0 0.0 0.0
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 3.4 1.7 4.6 4.3 6.1
#> 
#> , , 4
#> 
#>             x1  x2  x3  x4  x5
#> 2009-01-01 2.4 3.7 5.6 2.3 9.1
#> 2009-01-02 3.4 5.7 7.6 3.3 5.1
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
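
The array above has dimensions (dates, variables, ids), whereas the question asks for (ids, dates, variables), which is also the (samples, timesteps, features) layout that keras recurrent layers generally expect. A minimal sketch of the reordering with base R's `aperm()`, using a small stand-in array (the name `a_demo` and its values are made up for illustration and are not the data above):

```r
# Stand-in for the array `a` built above: 4 dates x 5 variables x 4 ids
a_demo <- array(
  seq_len(4 * 5 * 4),
  dim = c(4, 5, 4),
  dimnames = list(
    c("2009-01-01", "2009-01-02", "2009-01-03", "2010-01-01"),
    paste0("x", 1:5),
    as.character(1:4)
  )
)

# Move the id dimension to the front: (dates, vars, ids) -> (ids, dates, vars)
b <- aperm(a_demo, c(3, 1, 2))

dim(b)
#> [1] 4 4 5

# The slice for one id is still its dates x variables matrix
identical(b["2", , ], a_demo[, , "2"])
```

Applied to the real result, this would be `aperm(a, c(3, 1, 2))`; the dimnames travel with their dimensions.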

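Answer 1 (score: 1)

I am not sure I understand your expected output, but I would suggest either splitting the data.frame into a list of data.frames, or nesting the data.frame for each id.

Option 1: nesting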

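Nesting by id could be sketched as follows. This assumes `tidyr::nest()` with the tidyr 1.0 or later syntax, and uses a shortened stand-in for the question's data (only `x1` and `x2`); it is an illustration, not necessarily the code the answer originally used.

```r
library(tidyr)

# Shortened stand-in for the question's data (x1 and x2 only)
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03",
                   "2009-01-01", "2010-01-01", "2009-01-01", "2009-01-02")),
  x1   = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2   = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7)
)

# One row per id; the remaining columns are packed into a list column `data`
nested <- nest(df, data = -id)

nested$id                    # the four ids
sapply(nested$data, nrow)    # number of rows per id, which differs between ids
```

Each element of `nested$data` is a data frame holding the dates and variables of one id, so the differing row counts are preserved.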

Option 2: splitting

split(df, df$id)
#$`1`
#  id       date  x1  x2  x3  x4   x5
#1  1 2009-01-01 5.0 4.0 2.0 5.5  7.0
#2  1 2009-01-02 5.4 4.1 2.2 5.3  7.1
#3  1 2009-01-03 4.4 2.1 4.2 6.3 10.1
#
#$`2`
#  id       date   x1  x2  x3  x4  x5
#4  2 2009-01-01 12.4 2.7 4.9 3.3 2.1
#
#$`3`
#  id       date  x1  x2  x3  x4  x5
#5  3 2010-01-01 3.4 1.7 4.6 4.3 6.1
#
#$`4`
#  id       date  x1  x2  x3  x4  x5
#6  4 2009-01-01 2.4 3.7 5.6 2.3 9.1
#7  4 2009-01-02 3.4 5.7 7.6 3.3 5.1
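
Because R arrays cannot be ragged, a list like the one above is the natural way to keep a different number of rows per id. If a single 3D array is still needed, one possible approach is to zero-pad each matrix to the length of the longest id. Whether zero padding suits the downstream keras model is an assumption, not something stated in the question. A base R sketch, where `pad_to()` is a hypothetical helper and the data is a shortened stand-in (only `x1` and `x2`):

```r
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7)
)

# One numeric matrix per id; the number of rows varies by id
mats <- lapply(split(df[c("x1", "x2")], df$id), as.matrix)

# Hypothetical helper: append rows of zeros until the matrix has n rows
pad_to <- function(m, n) {
  rbind(m, matrix(0, nrow = n - nrow(m), ncol = ncol(m)))
}

max_len <- max(sapply(mats, nrow))

# Array of shape (ids, time steps, variables), filled one id at a time
arr <- array(0, dim = c(length(mats), max_len, 2),
             dimnames = list(names(mats), NULL, c("x1", "x2")))
for (i in seq_along(mats)) {
  arr[i, , ] <- pad_to(mats[[i]], max_len)
}

dim(arr)
arr["2", , ]   # first row holds id 2's single observation, the rest is padding
```

Note that this pads by position within each id rather than by calendar date, so row 1 of id 3 is its 2010-01-01 observation; aligning on dates instead requires completing the id/date grid first.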

Sample data

The `df` constructed in Answer 0 reproduces the output shown above.