Question

I have a data.table, df, in R. It looks like this:

> seq <- c(200,208, 212, 215, 218, 25,28, 232, 236, 245 , 247, 248, 249,265, 276, 284,298, 2, 12, 13, 17,
          152, 154, 159, 
          66, 69, 74, 81, 88, 91, 93, 94, 95, 96)
> cashreg <- rep(c('c1', 'c2', 'c3'), c(21, 3, 10))
> df <- data.table(seq, cashreg)
> df
    seq cashreg
 1: 200      c1
 2: 208      c1
 3: 212      c1
 4: 215      c1
 5: 218      c1
 6:  25      c1
 7:  28      c1
 8: 232      c1
 9: 236      c1
10: 245      c1
11: 247      c1
12: 248      c1
13: 249      c1
14: 265      c1
15: 276      c1
16: 284      c1
17: 298      c1
18:   2      c1
19:  12      c1
20:  13      c1
21:  17      c1
22: 152      c2
23: 154      c2
24: 159      c2
25:  66      c3
26:  69      c3
27:  74      c3
28:  81      c3
29:  88      c3
30:  91      c3
31:  93      c3
32:  94      c3
33:  95      c3
34:  96      c3

I have a user-defined maximum for the series:

actual_maximum <- 299

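I want to get a monotonically increasing sequence before and after min(maximum_in_series, actual_maximum), where maximum_in_series is the maximum of seq within each cashreg.

To find the maximum of seq by cashreg, I am using: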

> df[df[,.I[which.max(seq)], by = cashreg]$V1]
   seq cashreg
1: 298      c1
2: 159      c2
3:  96      c3
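
As a small sketch of the limit described above, min(maximum_in_series, actual_maximum) can be computed per cashreg in one grouped call. The column name limit is only an illustrative choice, and the data is rebuilt here so that the snippet runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained.
df <- data.table(
  seq = c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
          265, 276, 284, 298, 2, 12, 13, 17,
          152, 154, 159,
          66, 69, 74, 81, 88, 91, 93, 94, 95, 96),
  cashreg = rep(c("c1", "c2", "c3"), c(21, 3, 10))
)
actual_maximum <- 299

# Per-group cut-off: the group's own maximum, capped at the user-defined maximum.
# "limit" is a hypothetical column name used only for this illustration.
limits <- df[, .(limit = min(max(seq), actual_maximum)), by = cashreg]
print(limits)
```

For this data every group maximum is below actual_maximum, so the limits are simply the group maxima (298, 159 and 96).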

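I want to remove the sequence numbers that break the increasing order before and after these maxima. I am trying to use cummax(seq) within each cashreg.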

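For example, in cashreg c1 I want to apply cummax(seq) up to min(max_series, actual_maximum), which is 298, so that the out-of-order sequence numbers 25 and 28 are removed. The maximum of seq should then be computed for the remaining series (2, 12, 13, 17), which in this case is 17, and I want to apply cummax(seq) to that part as well.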
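
The steps for c1 can be sketched on a plain vector. This is only an illustration under the assumption that the limit value actually occurs in the series (true here, since 298 is below actual_maximum); the variable names are made up for the example.

```r
# The c1 values from the question.
c1 <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
        265, 276, 284, 298, 2, 12, 13, 17)
actual_maximum <- 299

# Cut-off value and its first position (assumes the limit is present in c1).
limit    <- min(max(c1), actual_maximum)
split_at <- match(limit, c1)

# Split the series right after the cut-off.
before <- c1[seq_len(split_at)]
after  <- c1[-seq_len(split_at)]

# A value is kept only if it equals the running maximum of its own part,
# which drops 25 and 28 from the first part.
result <- c(before[before == cummax(before)],
            after[after == cummax(after)])
print(result)
```

Comparing each value with cummax() keeps exactly the values that do not fall below an earlier value in the same part.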

This process should be carried out for each cashreg group.

The expected output looks like this:

    seq cashreg
 1: 200      c1
 2: 208      c1
 3: 212      c1
 4: 215      c1
 5: 218      c1
 6: 232      c1
 7: 236      c1
 8: 245      c1
 9: 247      c1
10: 248      c1
11: 249      c1
12: 265      c1
13: 276      c1
14: 284      c1
15: 298      c1
16:   2      c1
17:  12      c1
18:  13      c1
19:  17      c1
20: 152      c2
21: 154      c2
22: 159      c2
23:  66      c3
24:  69      c3
25:  74      c3
26:  81      c3
27:  88      c3
28:  91      c3
29:  93      c3
30:  94      c3
31:  95      c3
32:  96      c3

How can I do this with data.table in R?

Answer 1

library(data.table)

# path2data is not defined in this answer; it must be an existing, writable directory.
tmp_file <- file.path(path2data, "temp_cashreg_clean.txt")

for (cr in unique(df$cashreg)) {
  df_cashreg <- df[cashreg == cr]
  # Split the group right after the position of its maximum.
  i_max <- which.max(df_cashreg$seq)
  df1 <- df_cashreg[seq_len(i_max)]
  df2 <- df_cashreg[-seq_len(i_max)]  # zero rows when the maximum is the last row
  # Within each part, keep only the rows that equal the running maximum.
  df1 <- df1[seq == cummax(seq)]
  df2 <- df2[seq == cummax(seq)]
  df_combined <- rbind(df1, df2)
  # Write the header only when the file is first created; append afterwards.
  first_write <- !file.exists(tmp_file)
  write.table(df_combined, tmp_file, row.names = FALSE, col.names = first_write,
              append = !first_write, sep = "\t", quote = FALSE)
}

# Read the cleaned rows back, drop duplicates, and remove the temporary file.
df <- fread(tmp_file, colClasses = c(cashreg = "character"))
df <- unique(df)
file.remove(tmp_file)
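
The same split-and-filter idea can probably be expressed without a temporary file by computing the rows to keep inside a grouped call. The following is a sketch rather than the answer's own method: keep_monotone is a hypothetical helper name, and, like the loop above, it splits at each group's own maximum and does not apply actual_maximum.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained.
df <- data.table(
  seq = c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
          265, 276, 284, 298, 2, 12, 13, 17,
          152, 154, 159,
          66, 69, 74, 81, 88, 91, 93, 94, 95, 96),
  cashreg = rep(c("c1", "c2", "c3"), c(21, 3, 10))
)

# Hypothetical helper: returns TRUE for the values of x that should be kept.
keep_monotone <- function(x) {
  i_max <- which.max(x)                     # first position of the maximum
  part  <- as.integer(seq_along(x) > i_max) # 0 = up to the maximum, 1 = after it
  run_max <- ave(x, part, FUN = cummax)     # running maximum within each part
  x == run_max
}

# Collect the global row numbers to keep per group, then subset once.
rows   <- df[, .I[keep_monotone(seq)], by = cashreg]$V1
result <- df[rows]
print(result)
```

Because the filter uses ==, repeated values that tie with the running maximum are kept; a stricter rule would be needed if duplicates should be dropped as well.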
