Question

我是R的新手，我一直在努力处理我想在data.table中应用的条件。

按Order_id和Date排序的我的data.table看起来像这样。

我需要创建一个带有以下条件的flagger变量的新列：

如果hours_delta列中有超过3个连续的0，则使用flag_1标记这些行和之前的行
如果小时数为3，则连续0小于1 然后使用flag_2
如果两个之间只有一个0大于0，就像在行索引[8]中那样，那么用flag_3标记这些行
使用flag_4

这就是我希望新列之后表格的样子。

任何帮助都将不胜感激。

谢谢！

Answer 1

我认为这样的事情可能适用于你想要完成的事情。

library(dplyr)

# Create test dataframe
index <- c(0:19)
Order_id <- c(rep(001,8),rep(002,3),rep(003,4),rep(004,3),rep(005,2))
hours_delta <- c(720,552,rep(0,5),432,0,72,96,121,0,0,0,33,0,0,77,0)

df <- data.frame(index,Order_id,hours_delta)


# Start dplyr modifications
df <- df %>%
      # Group data by Order_id
      group_by(Order_id) %>%
      # Get the number of repitions of 0 for in the hours_delta field for that Order_id
      mutate(rle = ifelse(hours_delta == 0,rle(hours_delta)[[1]][rle(hours_delta)[[2]] == 0],NA),
             # Set the row above a zero sequence to the number of repetitions
             rle = ifelse(is.na(rle),lead(rle),rle)) %>%
      # ungroup the data
      ungroup() %>%
      # Set the flags based on number of repetitions
      mutate(flagger = case_when(is.na(.$rle)
                                 ~ "flag_4",
                                 .$rle == 1
                                 ~ "flag_3",
                                 (.$rle <= 3 & .$rle > 1)
                                 ~ "flag_2",
                                 .$rle > 3
                                 ~ "flag_1"
                                )
             ) %>%
     # Remove the temporary rle column
     select(-rle)

Answer 2

@Matt Jewett真的很棒！感谢您的答复。这个答案是对@oikonang上面的评论的回应，他正确地注意到，如果同一Order_id中有连续的0组，则该解决方案将不起作用。

要解决此问题，可以在上面的@Matt Jewett的代码中包含一些基本代码：

rle = unlist(sapply(1:length(rle(hours_delta)[[1]]), function(r) {
  if (rle(hours_delta)[[2]][r] == 0) {
     rep(rle(hours_delta)[[1]][r],rle(hours_delta)[[1]][r]) 
  } else {
  rep(NA, rle(hours_delta)[[1]][r]) }
}))

这应该代替

ifelse(hours_delta == 0,rle(hours_delta)[[1]][rle(hours_delta)[[2]] == 0],NA)

上面的行。

按R data.table中的后续出现次数标记行

2 个答案: