I have a table:
Date | Column1 | Column2
------+---------+--------
6/1/1 | A | 3
5/1/1 | B | 4
4/1/1 | C | 5
1/1/1 | A | 1
7/1/1 | B | 2
1/1/1 | C | 3
I need this table:
Date | Column1 | Column2
------+---------+--------
6/1/1 | A | 3
4/1/1 | C | 5
7/1/1 | B | 2
How can I remove the old rows based on two conditions (Column1, Column2)?
Answer 0 (score: 0)
Group by Column1 and Column2, sort by date in descending order, and then keep only the first row of each group with slice, like this:
library(dplyr)
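ans <- df %>%
  group_by(Column1, Column2) %>%
  # newest rows first; arrange() ignores groups unless .by_group = TRUE,
  # but slice(1) below still picks the newest row of each group
  arrange(desc(as.Date(Date))) %>%
  slice(1) %>%   # keep the first (newest) row of each group
  ungroup()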
Your error occurs because your date format is a bit unusual. I suggest using lubridate::parse_date_time, which is more robust than the base R date-time functions:
library(lubridate)
library(dplyr)
ans <- df %>%
  group_by(Column1, Column2) %>%
  # parse_date_time() takes the format through `orders`; "mdy" means month-day-year
  arrange(desc(parse_date_time(Date, orders = "mdy"))) %>%   # newest rows first
  slice(1) %>%   # keep the first (newest) row of each group
  ungroup()
Edit
Based on a helpful comment from @count, we can simplify the dplyr chain to:
library(lubridate)
library(dplyr)
ans <- df %>%
  group_by(Column1, Column2) %>%
  # keep the row with the latest date in each group
  slice(which.max(parse_date_time(Date, orders = "mdy"))) %>%
  ungroup()
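For anyone who wants to try this on the sample data from the question, here is a minimal sketch. Two things in it are my assumptions rather than anything the asker confirmed: I rewrote the dates as ISO strings, reading "6/1/1" as month 6, day 1, year 2001, and I added a variant that groups by Column1 only. In the sample every (Column1, Column2) pair is unique, so grouping by both columns drops nothing, while grouping by Column1 alone seems to match the table the asker wants.

```r
library(dplyr)

# Sample data from the question, with dates rewritten as ISO strings
# (assumption: "6/1/1" means month 6, day 1, year 2001).
df <- data.frame(
  Date    = as.Date(c("2001-06-01", "2001-05-01", "2001-04-01",
                      "2001-01-01", "2001-07-01", "2001-01-01")),
  Column1 = c("A", "B", "C", "A", "B", "C"),
  Column2 = c(3, 4, 5, 1, 2, 3)
)

# Every (Column1, Column2) pair is unique in the sample, so each group
# holds a single row and nothing is removed.
by_both <- df %>%
  group_by(Column1, Column2) %>%
  slice(which.max(Date)) %>%
  ungroup()

# Assumption: grouping by Column1 alone keeps the newest row per Column1 value.
by_col1 <- df %>%
  group_by(Column1) %>%
  slice(which.max(Date)) %>%
  ungroup()

print(by_col1)
```

Here by_both still has all six rows, while by_col1 has three: A with 3, B with 2, and C with 5. If your real data does contain repeated (Column1, Column2) pairs, grouping by both columns as in the answer above may be what you want.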