I've looked through many threads but couldn't find what I'm looking for. I have a dataset with multiple IDs and dates, like the one below:
id date code
1 2000-10-08 690
1 2000-10-08 75
1 2000-10-08 35
1 2001-01-01 315
1 2001-01-01 70
1 2008-09-05 690
1 2008-09-05 5
1 2008-09-05 60
2 2006-02-01 188
2 2006-02-01 198
2 2006-02-01 555
2 2006-02-01 690
3 2010-10-10 120
3 2010-10-10 75
3 2010-10-10 25
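I don't want dates repeated within an id. I want to choose which row to keep based on the lowest code value, so the end result looks like this: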
id date code
1 2000-10-08 35
1 2001-01-01 70
1 2008-09-05 5
2 2006-02-01 188
3 2010-10-10 25
I used the group_by function so that the data are processed by ID and date:
df %>%
group_by(id, date) %>%
arrange(code)
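However, I'm struggling to work out what code to use so that only the row with the minimum code is kept for each id/date combination.
Can anyone help me?
Thanks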
Answer 0 (score: 1)
Using the tidyverse, you can do the following:
library(tidyverse)
df %>%
group_by(id, date) %>%
summarise(code = min(code))
id date code
1 1 2000-10-08 35
2 1 2001-01-01 70
3 1 2008-09-05 5
4 2 2006-02-01 188
5 3 2010-10-10 25
Answer 1 (score: 1)
Just add slice(1) to the end of your code:
df %>%
group_by(id, date) %>%
arrange(code) %>%
slice(1)
## # A tibble: 5 x 3
## # Groups: id, date [5]
## id date code
## <int> <fct> <int>
## 1 1 2000-10-08 35
## 2 1 2001-01-01 70
## 3 1 2008-09-05 5
## 4 2 2006-02-01 188
## 5 3 2010-10-10 25
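For comparison, the same sort-then-take-first idea can be sketched in base R without dplyr. This is a minimal sketch, assuming the question's data with dates stored as character strings; the names `sorted` and `first_min` are illustrative only. Ordering by `id`, `date` and `code` puts the lowest code first in each group, and `!duplicated()` then keeps the first row of each id/date pair, which mirrors `arrange(code) %>% slice(1)`.

```r
# The question's data (dates kept as character strings)
df <- data.frame(
  id   = rep(c(1, 1, 1, 2, 3), times = c(3, 2, 3, 4, 3)),
  date = rep(c("2000-10-08", "2001-01-01", "2008-09-05", "2006-02-01", "2010-10-10"),
             times = c(3, 2, 3, 4, 3)),
  code = c(690, 75, 35, 315, 70, 690, 5, 60, 188, 198, 555, 690, 120, 75, 25)
)

# Sort so that, within each id/date group, the lowest code comes first
# (ISO "YYYY-MM-DD" strings sort in chronological order)
sorted <- df[order(df$id, df$date, df$code), ]

# Keep only the first row of each id/date pair (the "slice(1)" step)
first_min <- sorted[!duplicated(sorted[, c("id", "date")]), ]
rownames(first_min) <- NULL
print(first_min)
```

Because only the first row per pair survives, a tie for the minimum still yields a single row.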
Answer 2 (score: 0)
A data.table approach:
library( data.table )
dt[, .( min = min( code ) ), by = .( id, date )]
# id date min
# 1: 1 2000-10-08 35
# 2: 1 2001-01-01 70
# 3: 1 2008-09-05 5
# 4: 2 2006-02-01 188
# 5: 3 2010-10-10 25
Sample data:
dt <- fread("id date code
1 2000-10-08 690
1 2000-10-08 75
1 2000-10-08 35
1 2001-01-01 315
1 2001-01-01 70
1 2008-09-05 690
1 2008-09-05 5
1 2008-09-05 60
2 2006-02-01 188
2 2006-02-01 198
2 2006-02-01 555
2 2006-02-01 690
3 2010-10-10 120
3 2010-10-10 75
3 2010-10-10 25", header = TRUE)
Answer 3 (score: 0)
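With the tidyverse, if there are many columns and you need the whole row with the min 'code' in each group, it is better to filter on a logical condition, because summarise() returns only the grouping columns and the summary: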
library(tidyverse)
df %>%
group_by(id, date) %>%
filter(code == min(code))
# A tibble: 5 x 3
# Groups: id, date [5]
# id date code
# <int> <chr> <int>
#1 1 2000-10-08 35
#2 1 2001-01-01 70
#3 1 2008-09-05 5
#4 2 2006-02-01 188
#5 3 2010-10-10 25
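The same "keep the whole row" filter can be sketched in base R with `ave()`, which returns each row's group minimum aligned with the original rows. This is a minimal sketch, assuming the question's data; `grp`, `group_min` and `min_rows` are illustrative names. Like `filter(code == min(code))`, it keeps every row tied for the minimum.

```r
# The question's data (dates kept as character strings)
df <- data.frame(
  id   = rep(c(1, 1, 1, 2, 3), times = c(3, 2, 3, 4, 3)),
  date = rep(c("2000-10-08", "2001-01-01", "2008-09-05", "2006-02-01", "2010-10-10"),
             times = c(3, 2, 3, 4, 3)),
  code = c(690, 75, 35, 315, 70, 690, 5, 60, 188, 198, 555, 690, 120, 75, 25)
)

# One factor per id/date pair; drop = TRUE removes empty combinations,
# so min() is never called on an empty group
grp <- interaction(df$id, df$date, drop = TRUE)

# Group minimum repeated for every row of its group (same length as df)
group_min <- ave(df$code, grp, FUN = min)

# Keep every row whose code equals its group minimum; other columns come along
min_rows <- df[df$code == group_min, ]
print(min_rows)
```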
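If there are ties for the min value within a group (filter() above keeps every tied row) and you only want the first occurrence of the min 'code':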
df %>%
group_by(id, date) %>%
slice(which.min(code))
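To see the difference between the two tie-handling behaviours, here is a minimal base R sketch on small hypothetical data (`df_ties` is invented for illustration: code 35 appears twice for id 1 on 2000-10-08). It splits the data by id/date and applies each rule per group; it is an analogy to the dplyr calls, not how dplyr implements them.

```r
# Hypothetical data with a tie: code 35 appears twice for id 1 on 2000-10-08
df_ties <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = c("2000-10-08", "2000-10-08", "2000-10-08", "2006-02-01", "2006-02-01"),
  code = c(35, 75, 35, 188, 198)
)

# One data frame per id/date combination; drop = TRUE skips empty combinations
groups <- split(df_ties, list(df_ties$id, df_ties$date), drop = TRUE)

# Like filter(code == min(code)): every row tied for the minimum is kept
all_ties <- do.call(rbind, lapply(groups, function(g) g[g$code == min(g$code), ]))

# Like slice(which.min(code)): which.min() returns only the first minimum
first_only <- do.call(rbind, lapply(groups, function(g) g[which.min(g$code), ]))

print(all_ties)    # 3 rows: both 35s for id 1, plus 188 for id 2
print(first_only)  # 2 rows: one per id/date group
```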
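Another option is top_n(). Like filter(), it keeps tied rows. In dplyr 1.0.0 and later, top_n() is superseded by slice_min(code, n = 1):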
df %>%
group_by(id, date) %>%
top_n(1, -code)
Using aggregate from base R:
aggregate(code ~ id + date, df, min)
# id date code
#1 1 2000-10-08 35
#2 1 2001-01-01 70
#3 2 2006-02-01 188
#4 1 2008-09-05 5
#5 3 2010-10-10 25
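As the output above shows, `aggregate()` returns the rows sorted by date rather than by id as in the desired result. A minimal sketch, assuming the question's data, of restoring the id/date order with `order()`; `res` is an illustrative name.

```r
# The question's data (dates kept as character strings)
df <- data.frame(
  id   = rep(c(1, 1, 1, 2, 3), times = c(3, 2, 3, 4, 3)),
  date = rep(c("2000-10-08", "2001-01-01", "2008-09-05", "2006-02-01", "2010-10-10"),
             times = c(3, 2, 3, 4, 3)),
  code = c(690, 75, 35, 315, 70, 690, 5, 60, 188, 198, 555, 690, 120, 75, 25)
)

# Minimum code per id/date pair
res <- aggregate(code ~ id + date, data = df, FUN = min)

# Reorder by id, then date, to match the layout requested in the question
res <- res[order(res$id, res$date), ]
rownames(res) <- NULL
print(res)
```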
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), date = c("2000-10-08", "2000-10-08", "2000-10-08",
"2001-01-01", "2001-01-01", "2008-09-05", "2008-09-05", "2008-09-05",
"2006-02-01", "2006-02-01", "2006-02-01", "2006-02-01", "2010-10-10",
"2010-10-10", "2010-10-10"), code = c(690L, 75L, 35L, 315L, 70L,
690L, 5L, 60L, 188L, 198L, 555L, 690L, 120L, 75L, 25L)),
class = "data.frame", row.names = c(NA, -15L))