Filter duplicates based on conditions

Time: 2021-05-11 16:03:25

Tags: r tidyverse

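I have the messy result of merging two data frames and want to decide which rows to keep based on specified criteria.

The data looks like this (only the duplicated rows are shown):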

structure(list(date = structure(c(2347, 2347, 2347, 2347, 2347, 2347, 2347, 2347, 6962, 6962, 16442, 16442, 16442, 16442), class = "Date"),
               country = c("United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
                           "United Kingdom", "Greece", "Greece", "France", "France", "France", "France"), 
               city = c("Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Athens",  "Athens", "Paris", "Paris", "Paris", "Paris"), 
               diff_categories = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), 
               diff_num1 = c(-1, -4, 0, -3, 3, 0, -1, -4, 0, 1, 0, 12, -12, 0), 
               diff_num2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 11, -10, 0), 
               df1_id = c("df1_197606050002", "df1_197606050002", "df1_197606050003", "df1_197606050003","df1_197606050004", "df1_197606050004", "df1_197606050006", 
                          "df1_197606050006","df1_198901230001", "df1_198901230001", "df1_201501070001", "df1_201501070001","df1_201501070002", "df1_201501070002"),
               df2_id = c("df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_216", "df2_219", "df2_510",  "df2_511",  "df2_510", "df2_511")), 
          row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

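I now want to keep only one row for each df1_id, and decide which row to keep based on the following criteria (in descending order of importance; the first is the most important):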

  • diff_categories must be FALSE
  • diff_num1 should be as small as possible
  • diff_num2 should be as small as possible
  • if rows are still tied, keep the first one.

Can anyone point out how best to implement this logic?

1 answer:

Answer 0: (score: 1)

Does this work:

  @UseInterceptors(ClassSerializerInterceptor)
  @Get('first')
  async first(): Promise<User> {
    return new User(await this.userModel.findOne().lean());
  }
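
For reference, the selection rules listed in the question could be sketched in R with dplyr roughly as follows. This is only a sketch under some assumptions: "as small as possible" is read here as smallest in absolute value (drop the `abs()` calls if the signed minimum is meant), the toy tibble `df` and the helper column `.row` are hypothetical and only mirror the column names from the question, and rows with `NA` in `diff_num2` are ranked after rows with a value.

```r
library(dplyr)

# Hypothetical toy data using the column names from the question
df <- tibble::tibble(
  df1_id          = c("a", "a", "b", "b", "c", "d", "d"),
  df2_id          = c("x1", "x2", "x1", "x2", "x3", "x4", "x5"),
  diff_categories = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  diff_num1       = c(-1, -4, 0, -3, 0, 2, -2),
  diff_num2       = c(NA, NA, 1, 0, 0, 1, 0)
)

result <- df %>%
  # Remember the original row order so that ties fall back to "keep the first"
  mutate(.row = row_number()) %>%
  # Criterion 1: diff_categories must be FALSE
  filter(!diff_categories) %>%
  # Criteria 2-4: smallest |diff_num1|, then smallest |diff_num2|
  # (NA sorts last), then original position
  arrange(df1_id, abs(diff_num1), abs(diff_num2), .row) %>%
  # Keep only the best-ranked row for each df1_id
  distinct(df1_id, .keep_all = TRUE) %>%
  select(-.row)

print(result)
```

Sorting first and then calling `distinct()` with `.keep_all = TRUE` keeps the top-ranked row per `df1_id`; a `group_by(df1_id)` followed by `slice(1)` after the same `arrange()` would be an equivalent way to express it.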