Insert new rows into a data frame based on a condition

Asked: 2018-04-05 10:48:46

Tags: r dplyr data.table rows

This is my data:

df <- data.table(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
                 month = sample(c(1:12), 15, replace = TRUE))

    loc.id month
 1:     22     1
 2:     22     4
 3:     23    12
 4:     23    10
 5:     23     7
 6:     24     4
 7:     24     3
 8:     24    11
 9:     25     2
10:     25     3
11:     25     4
12:     27     1
13:     27     5
14:     27    12
15:     27     1

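For each loc.id I have several months. What I want to do is insert new rows into df. For each loc.id, I want to insert two additional rows: one whose month value equals min(month) - 1 and another whose month value equals max(month) + 1.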

My final data should look like this:

    loc.id month
 1:     22     1
 2:     22     4
 3:     22     0 # min(month) - 1
 4:     22     5 # max(month) + 1

 5:     23    12
 6:     23    10
 7:     23     7
 8:     23     6 # min(month) - 1
 9:     23     13 # max(month) + 1

10:     24     4
11:     24     3
12:     24    11
13:     24     2 # min(month) - 1
14:     24    12 # max(month) + 1
 .
 .

I have managed to add these values as columns, but I need them added as rows:

  df %>%
    group_by(loc.id) %>%
    mutate(month.min = min(month) - 1,
           month.max = max(month) + 1)

2 answers:

Answer 0: (score: 1)

Using data.table:

dfmm <- df[, .(min.month = min(month) - 1, max.month = max(month) + 1), by = loc.id
           ][, melt(.SD, id = 1)][, .(loc.id, month = value)]

rbindlist(list(df, dfmm))

Or a shorter option, as suggested by @Frank in the comments:

df[, rbind(.SD, .(range(month) + c(-1,1))), by = loc.id]

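The rbindlist approach gives the following (the shorter variant returns the same rows, but keeps each group's new rows directly after its original rows):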

    loc.id month
 1:     22     5
 2:     22     6
 3:     23     1
 4:     23     3
 5:     23     6
 6:     24     4
 7:     24     8
 8:     24     2
 9:     25    12
10:     25     7
11:     25     5
12:     27     8
13:     27    12
14:     27     9
15:     27    10
16:     22     4
17:     23     0
18:     24     1
19:     25     4
20:     27     7
21:     22     7
22:     23     7
23:     24     9
24:     25    13
25:     27    13

If you want them ordered by loc.id, you can do:

rbindlist(list(df, dfmm))[order(loc.id)]
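
As a quick check of the steps above, here is a minimal sketch that builds the boundary rows in one grouped call and then orders the combined table. It is not part of the original answer. The month values are copied from the output shown above so that the result does not depend on the random seed, and the names `extra` and `res` are only illustrative.

```r
library(data.table)

# Fixed month values taken from the printed output above (no RNG involved)
df <- data.table(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
                 month  = c(5,6,1,3,6,4,8,2,12,7,5,8,12,9,10))

# Two boundary rows per group: min(month) - 1 and max(month) + 1
extra <- df[, .(month = range(month) + c(-1, 1)), by = loc.id]

# Append the new rows, then sort by loc.id; within each loc.id the
# original rows should stay ahead of the two appended rows
res <- rbindlist(list(df, extra))[order(loc.id)]
res
```

Sorting only on loc.id leaves the months within each group unsorted, which matches the layout requested in the question.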

Or using dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
  group_by(loc.id) %>% 
  summarise(min.month = min(month) - 1,
            max.month = max(month) + 1) %>% 
  gather(key, val, -1) %>% 
  select(loc.id, month = val) %>% 
  bind_rows(df, .)

Or (inspired by @Frank's data.table approach):

df %>% 
  group_by(loc.id) %>% 
  do(data.frame(month = range(.$month) + c(-1,1))) %>% 
  bind_rows(df, .)
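
For readers on a recent dplyr, the same idea can probably be written without do(). The sketch below assumes dplyr 1.1.0 or later, where reframe() is available for grouped summaries that return more than one row per group. It is an alternative offered here, not something from the original answer, and the fixed month values and the names `extra` and `res` are illustrative.

```r
library(dplyr)

# Fixed example data (same loc.id layout as the question, months chosen by hand)
df <- tibble(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
             month  = c(5,6,1,3,6,4,8,2,12,7,5,8,12,9,10))

# reframe() may return any number of rows per group; here it returns two:
# min(month) - 1 and max(month) + 1
extra <- df %>%
  group_by(loc.id) %>%
  reframe(month = range(month) + c(-1, 1))

# Append the new rows and bring each group's rows together
res <- bind_rows(df, extra) %>%
  arrange(loc.id)
res
```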

Data used:

library(data.table)
set.seed(2018)
df <- data.table(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
                 month = sample(c(1:12), 15, replace = TRUE))

Answer 1: (score: 0)

You can use a combination of do(...) and add_row(...):

library(tidyverse)
df %>% 
  group_by(loc.id) %>% 
  do(add_row(., loc.id = rep(unique(.$loc.id), 2),
                month = c(min(.$month) - 1, max(.$month) + 1))) %>%
  ungroup()

# A tibble: 25 x 2
   # loc.id month
    # <dbl> <dbl>
 # 1    22.    7.
 # 2    22.    2.
 # 3    22.    1.
 # 4    22.    8.
 # 5    23.    5.
 # 6    23.    7.
 # 7    23.    7.
 # 8    23.    4.
 # 9    23.    8.
# 10    24.    9.
# ... with 15 more rows