How to remove duplicates based on two columns, with a condition?

Time: 2019-12-06 14:39:41

Tags: r dataframe dplyr duplicates

I want to remove some duplicates, but not all of them. I'll explain after showing the data I'm working with.

Here is a sample of my data frame:

df <- data.frame("S" = c("A", "B", "C", "D", "E", "F"),
                 "D" = c("01/01/2019", "01/02/2019", "01/03/2019", "01/04/2019", "01/05/2019", "01/06/2019"),
                 "N" = c("001", "002", "003", "004", "005", "006"),
                 "R" = c("ABC1", "ABC1", "ABC2", "ABC2", "ABC2", "ABC2"),
                 "RF" = c("ABC1F", "ABC1F", "ABC2F", "ABC2F", "ABC2F", "ABC2F"),
                 "Des" = c("A", "A", "B", "B", "B", "B"),
                 "Q" = c(1, 2, 3, 4, 5, 6),
                 "U" = c(rep("A", 6)),
                 "P" = c(2, 3, 4, 4, 7, 7),
                 stringsAsFactors = FALSE)

Next, I apply the following code to this data frame:

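library(dplyr)

# Round P and sort by R, then by P
df$P <- round(as.double(df$P), digits = 2)
df <- df[order(df$R, df$P),]
# price: difference between P and the minimum P within each R
df <- df %>%
  group_by(R) %>%
  mutate(price = P - min(P)) %>%
  ungroup()
df$Ecart <- df$price * as.double(df$Q)
# EcartTotal: running sum of Ecart within each R
df <- df %>%
  group_by(R) %>%
  mutate(EcartTotal = cumsum(Ecart)) %>%
  ungroup()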
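For reference, the same preprocessing can be sketched in base R with `ave()` instead of dplyr. This is an alternative, not the asker's code, and it assumes a cut-down data frame holding only the three columns the calculation uses (R, Q, P) with the sample values from the question.

```r
# Cut-down sample: only the columns the calculation needs
df <- data.frame(R = c("ABC1", "ABC1", "ABC2", "ABC2", "ABC2", "ABC2"),
                 Q = c(1, 2, 3, 4, 5, 6),
                 P = c(2, 3, 4, 4, 7, 7),
                 stringsAsFactors = FALSE)

# Sort by R, then by P
df <- df[order(df$R, df$P), ]

# price: distance of P from the minimum P within each R
df$price <- df$P - ave(df$P, df$R, FUN = min)
# Ecart: price weighted by quantity
df$Ecart <- df$price * df$Q
# EcartTotal: running total of Ecart within each R
df$EcartTotal <- ave(df$Ecart, df$R, FUN = cumsum)
df
```

With the sample data, rows C and D (both in ABC2) end up with `price` 0, which is why one of them counts as the unwanted duplicate.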

The result I expect:

result <- data.frame("S" = c("A", "B", "C", "E", "F"),
                     "D" = c("01/01/2019", "01/02/2019", "01/03/2019", "01/05/2019", "01/06/2019"),
                     "N" = c("001", "002", "003", "005", "006"),
                     "R" = c("ABC1", "ABC1", "ABC2", "ABC2", "ABC2"),
                     "RF" = c("ABC1F", "ABC1F", "ABC2F", "ABC2F", "ABC2F"),
                     "Des" = c("A", "A", "B", "B", "B"),
                     "Q" = c(1, 2, 3, 5, 6),
                     "U" = c(rep("A", 5)),
                     "P" = c(2, 3, 4, 7, 7),
                     "price" = c(0, 1, 0, 3, 3),
                     "Ecart" = c(0, 2, 0, 15, 18),
                     "EcartTotal" = c(NA, 2, NA, NA, 33),
                     stringsAsFactors = FALSE)

So, to get this, I remove the duplicates of column R only where column price equals 0. I also want to replace the value of EcartTotal with NA wherever it is not the maximum EcartTotal within its R group.
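The two rules can also be sketched in base R, as an alternative to the dplyr answer rather than the asker's or answerer's code. The data frame below is a hypothetical cut-down version holding only R, price, and EcartTotal, filled with the values the preprocessing code produces for the sample data. It assumes the rows are already sorted by R, so `duplicated()` flags every row after the first one in each group.

```r
# Hypothetical cut-down data: values produced by the preprocessing step
df <- data.frame(R = c("ABC1", "ABC1", "ABC2", "ABC2", "ABC2", "ABC2"),
                 price = c(0, 1, 0, 0, 3, 3),
                 EcartTotal = c(0, 2, 0, 0, 15, 33),
                 stringsAsFactors = FALSE)

# Rule 1: drop a row only if its R was already seen above AND its price is 0
keep <- !(duplicated(df$R) & df$price == 0)
out <- df[keep, ]

# Rule 2: within each R, keep EcartTotal only where it equals the group maximum
grp_max <- ave(out$EcartTotal, out$R, FUN = max)
out$EcartTotal[out$EcartTotal != grp_max] <- NA
out
```

The first row of each group survives even when its price is 0 (row C), because `duplicated()` is FALSE for the first occurrence.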

1 answer:

Answer 0 (score: 1)

We can `filter` on the condition, then, after grouping by "R", `replace` the "EcartTotal" values that are not the group maximum with NA:

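library(dplyr)
df %>% 
   # drop a row only if its R was already seen AND its price is 0
   filter(!(duplicated(R) & price == 0)) %>%
   group_by(R) %>% 
   # keep EcartTotal only where it is the maximum of its group
   mutate(EcartTotal = replace(EcartTotal, EcartTotal != max(EcartTotal), NA))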
# A tibble: 5 x 12
# Groups:   R [2]
#  S     D          N     R     RF    Des       Q U         P price Ecart EcartTotal
#  <chr> <chr>      <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>      <dbl>
#1 A     01/01/2019 001   ABC1  ABC1F A         1 A         2     0     0         NA
#2 B     01/02/2019 002   ABC1  ABC1F A         2 A         3     1     2          2
#3 C     01/03/2019 003   ABC2  ABC2F B         3 A         4     0     0         NA
#4 E     01/05/2019 005   ABC2  ABC2F B         5 A         7     3    15         NA
#5 F     01/06/2019 006   ABC2  ABC2F B         6 A         7     3    18         33

Or do the `group_by` before the `filter` step:

df %>% 
   group_by(R) %>%
   filter(!(row_number() > 1 & price == 0)) %>%
   mutate(EcartTotal = EcartTotal * NA^(EcartTotal != max(EcartTotal)))
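The `NA^(...)` expression in the second version works because in R `NA^0` is 1 while `NA^1` is NA, and a logical condition is coerced to 0 or 1. A tiny sketch of that trick, using a made-up vector and flag rather than the question's data:

```r
# Made-up values; flag is TRUE where the value is NOT the group maximum
x <- c(33, 15, 2)
flag <- c(FALSE, TRUE, FALSE)

# NA^FALSE == 1 keeps the value, NA^TRUE == NA blanks it out
res <- x * NA^flag
res
```

This is equivalent to `replace(x, flag, NA)` in the first version; the `replace()` form is arguably easier to read.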