Question

Sample data:

df <- structure(list(Customer.ID = structure(c(1L, 2L, 3L, 4L, 1L, 
5L, 3L, 4L), .Label = c("A123", "B561", "C985", "D456", "Z893"
), class = "factor"), Month = c(1, 1, 1, 1, 2, 2, 2, 2), Score = c(12, 
16, 8, 20, 16, 15, 6, 22), Increase = c(12, 16, 8, 20, 4, 16, 
-2, 2)), .Names = c("Customer.ID", "Month", "Score", "Increase"
), row.names = c(NA, -8L), class = "data.frame")

Customer.ID Month Score Increase
     A123     1    12       12
     B561     1    16       16
     C985     1     8        8
     D456     1    20       20
     A123     2    16        4
     Z893     2    15       16
     C985     2     6       -2
     D456     2    22        2

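What I need is to produce the values in the Increase column. Essentially: match rows by the Customer.ID in the first column, then, going through Month in chronological order, take the difference between consecutive Score values as the Increase. If a customer has no earlier row to match against, keep the Score value as the Increase. How can I do this with whatever R packages are needed?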

Answer 1

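One option is to use the dplyr package. First group the data by Customer.ID, then arrange it by Customer.ID and Month. After that, all you have to do is subtract the previous value from the current Score. Since the OP mentioned that when no match is found Increase should simply equal Score, use lag with default = 0.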

library(dplyr)

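# Group by customer, put each customer's rows in month order, then subtract
# the previous Score; default = 0 leaves a customer's first Score unchanged.
df %>% group_by(Customer.ID) %>%
  arrange(Customer.ID, Month) %>%
  mutate(NewIncrease = Score - lag(Score, default = 0))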

# # A tibble: 8 x 5
# # Groups: Customer.ID [5]
#   Customer.ID Month Score Increase NewIncrease
#   <chr>       <int> <int>    <int>       <int>
# 1 A123            1    12       12          12
# 2 A123            2    16        4           4
# 3 B561            1    16       16          16
# 4 C985            1     8        8           8
# 5 C985            2     6      - 2         - 2
# 6 D456            1    20       20          20
# 7 D456            2    22        2           2
# 8 Z893            2    15       15          15
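
To see why default = 0 matters, here is a small illustration using customer A123's two scores. The variable names (score, with_na, with_zero) are made up for this sketch and are not part of the answer's code.

```r
library(dplyr)

score <- c(12, 16)            # A123's scores, already in month order

lag(score)                    # NA 12  (no previous value for the first row)
lag(score, default = 0)       #  0 12  (first slot filled with 0 instead)

with_na   <- score - lag(score)               # NA  4
with_zero <- score - lag(score, default = 0)  # 12  4
```

With the default fill of NA the first month's difference would be NA; filling with 0 makes the first difference equal to the Score itself, which is what the question asks for when there is no earlier month.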

Data:

df <- read.table(text = 
"Customer.ID | Month| Score| Increase
A123| 1| 12| 12
B561| 1| 16| 16
C985| 1| 8| 8
D456| 1| 20| 20
A123| 2| 16| 4
Z893| 2| 15| 15
C985| 2| 6| -2
D456| 2| 22| 2",
stringsAsFactors = FALSE, header = TRUE, sep = "|")
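
If you would rather avoid a package dependency, the same grouped-difference idea can probably be sketched in base R with order() and ave(). This is an alternative to the dplyr pipeline above, not a copy of it; the data frame is rebuilt here from the question's Customer.ID, Month and Score values so the block runs on its own.

```r
# Question's data without the Increase column (that is what we compute)
df <- data.frame(
  Customer.ID = c("A123", "B561", "C985", "D456", "A123", "Z893", "C985", "D456"),
  Month = c(1, 1, 1, 1, 2, 2, 2, 2),
  Score = c(12, 16, 8, 20, 16, 15, 6, 22)
)

# Sort so each customer's rows are in chronological order
df <- df[order(df$Customer.ID, df$Month), ]

# Within each customer: subtract the previous Score, using 0 as the
# "previous" value for the first row so that row keeps its Score
df$NewIncrease <- ave(df$Score, df$Customer.ID,
                      FUN = function(s) s - c(0, head(s, -1)))

df
```

After sorting, NewIncrease comes out as 12, 4, 16, 8, -2, 20, 2, 15, matching the dplyr result.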

Match and subtract in chronological order
