Fill a specific number of data.table rows with the previous value

Time: 2018-09-09 13:36:13

Tags: r data.table

I have a problem. The column I am trying to manipulate looks like this:

> DT <- data.table(Group= c("SM", NA, NA, NA, NA, NA, "GH", NA, NA, NA, NA, NA, NA, NA))
> DT
    Group
 1:    SM
 2:  <NA>
 3:  <NA>
 4:  <NA>
 5:  <NA>
 6:  <NA>
 7:    GH
 8:  <NA>
 9:  <NA>
10:  <NA>
11:  <NA>
12:  <NA>
13:  <NA>
14:  <NA>

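I want to fill the NAs with the previous value, but only for a specific number of rows. In this case only 4 rows should be filled, so the desired result is: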

    Group
 1:    SM
 2:    SM
 3:    SM
 4:    SM
 5:    SM
 6:  <NA>
 7:    GH
 8:    GH
 9:    GH
10:    GH
11:    GH
12:  <NA>
13:  <NA>
14:  <NA>

How can I achieve this? I tried na.locf(), but it did not do what I wanted. Thanks in advance.

3 answers:

Answer 0: (score: 3)

Here is a solution using the dplyr package.

library(dplyr)
library(data.table)

# Set the threshold: how many NA rows to fill after each non-NA value
threshold <- 4

DT2 <- DT %>%
  mutate(Group_ID = cumsum(!is.na(Group))) %>%
  group_by(Group_ID) %>%
  mutate(ID = row_number() - 1) %>%
  mutate(Group = ifelse(ID <= threshold, first(Group), NA_character_)) %>%
  ungroup() %>%
  select(Group)
DT2
# # A tibble: 14 x 1
#    Group
#    <chr>
#  1 SM   
#  2 SM   
#  3 SM   
#  4 SM   
#  5 SM   
#  6 NA   
#  7 GH   
#  8 GH   
#  9 GH   
# 10 GH   
# 11 GH   
# 12 NA   
# 13 NA   
# 14 NA  
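
The grouping step may be easier to follow with the intermediate values written out. The following is a minimal base R sketch of what Group_ID and ID hold before the threshold is applied. The names run_id and offset are illustrative and do not come from the answer.

```r
Group <- c("SM", NA, NA, NA, NA, NA, "GH", NA, NA, NA, NA, NA, NA, NA)

# Running count of non-NA values: each non-NA value starts a new run.
run_id <- cumsum(!is.na(Group))

# Zero-based position of each row within its run.
offset <- ave(seq_along(Group), run_id, FUN = seq_along) - 1L

print(run_id)  # 1 1 1 1 1 1 2 2 2 2 2 2 2 2
print(offset)  # 0 1 2 3 4 5 0 1 2 3 4 5 6 7
```

Rows whose offset is at most the threshold receive the first value of their run; the remaining rows stay NA.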

Answer 1: (score: 3)

An option with data.table is

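library(data.table)
# Group rows by the running count of non-NA values; within each group keep
# the first value for the first 5 rows (the value itself plus 4 fills).
DT[, Group := Group[1][NA^(seq_len(.N) > 5)], by = cumsum(!is.na(Group))]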
DT
#    Group
# 1:    SM
# 2:    SM
# 3:    SM
# 4:    SM
# 5:    SM
# 6:  <NA>
# 7:    GH
# 8:    GH
# 9:    GH
#10:    GH
#11:    GH
#12:  <NA>
#13:  <NA>
#14:  <NA>
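
The NA^(...) expression in this answer is compact, so a small standalone sketch may help. It relies on NA^0 being 1 and NA^1 being NA in R, which turns a logical vector into an index of 1s and NAs. The variable names below are illustrative only.

```r
n <- 7  # pretend the group has 7 rows

# FALSE -> NA^0 = 1, TRUE -> NA^1 = NA
idx <- NA^(seq_len(n) > 5)
print(idx)      # 1 1 1 1 1 NA NA

# Indexing a length-one vector with 1 repeats the value; NA yields NA.
filled <- "SM"[idx]
print(filled)   # "SM" "SM" "SM" "SM" "SM" NA NA
```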

Answer 2: (score: 2)

Here is one way to do it:

> DT[, Group := ifelse(seq_len(.N) <= 1 + 4, Group[1], Group), by = cumsum(!is.na(Group))]
> DT
    Group
 1:    SM
 2:    SM
 3:    SM
 4:    SM
 5:    SM
 6:  <NA>
 7:    GH
 8:    GH
 9:    GH
10:    GH
11:    GH
12:  <NA>
13:  <NA>
14:  <NA>
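
The answers above hard-code the fill limit. As a rough sketch, the same idea can be wrapped in a base R helper that takes the limit as an argument. The function name fill_forward_limited and its parameters are hypothetical and not part of data.table or any package.

```r
# Hypothetical helper: carry each non-NA value forward over at most n NAs.
fill_forward_limited <- function(x, n) {
  run_id <- cumsum(!is.na(x))                        # 0 for leading NAs
  pos <- ave(seq_along(x), run_id, FUN = seq_along)  # 1-based position in run
  anchor <- which(!is.na(x))                         # index of each run's value
  src <- c(NA_integer_, anchor)[run_id + 1L]         # NA when there is no anchor
  out <- x[src]
  out[pos > n + 1L] <- NA                            # beyond the fill limit
  out
}

Group <- c("SM", NA, NA, NA, NA, NA, "GH", NA, NA, NA, NA, NA, NA, NA)
result <- fill_forward_limited(Group, 4)
print(result)
```

With a data.table this could be applied as DT[, Group := fill_forward_limited(Group, 4)], assuming the helper is defined as above.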