I have the following sample data set:
count date
1 11/25/16
2 11/29/16
3 11/30/16
4 12/4/16
5 12/7/16
6 12/8/16
7 12/9/16
8 12/10/16
9 12/11/16
10 12/12/16
11 12/13/16
12 12/17/16
13 12/17/16
14 12/18/16
15 12/19/16
16 12/20/16
17 12/20/16
18 12/20/16
19 12/20/16
20 12/20/16
21 12/21/16
22 12/21/16
23 12/21/16
24 12/21/16
25 12/21/16
26 12/22/16
27 12/22/16
28 12/22/16
29 12/22/16
30 12/22/16
31 12/23/16
32 12/23/16
33 12/23/16
34 12/23/16
35 12/23/16
36 12/23/16
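I want to compute the maximum count for each date. Note that the last value for a date is not always the maximum in my real data set. Ideally, my output should not contain any duplicates.
Answer 0 (score: 1)
Doing this the dplyr way: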
library(dplyr)
library(lubridate)
# Sample data set
set.seed(123)
df <- tibble(x = sample(1:10, 20, replace = TRUE),
             y = sample(ymd("2018-01-01") + days(0:5), 20, replace = TRUE)) %>%
  arrange(y)
# Keep the row(s) holding the maximum x for each date, then drop tied duplicates
df %>%
  group_by(y) %>%
  dplyr::filter(x == max(x)) %>%
  distinct(x, .keep_all = TRUE) %>%
  ungroup()
Result:
# A tibble: 6 x 2
x y
<int> <date>
1 5 2018-01-01
2 10 2018-01-02
3 9 2018-01-03
4 10 2018-01-04
5 8 2018-01-05
6 10 2018-01-06
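With the question's own column names, a shorter variant may be enough when only the maximum count per date is needed. The sketch below assumes a data frame called dt with columns count and date; the rows are a hypothetical, reordered subset of the question's data, chosen so that the largest count is not the last row for its date.

```r
library(dplyr)

# Hypothetical subset of the question's data; the maximum for 12/17/16
# and 12/20/16 deliberately appears before a smaller value.
dt <- tibble(
  count = c(1, 2, 3, 13, 12, 20, 16),
  date  = c("11/25/16", "11/29/16", "11/30/16",
            "12/17/16", "12/17/16", "12/20/16", "12/20/16")
)

# One row per date, holding the largest count seen for that date
res <- dt %>%
  group_by(date) %>%
  summarise(count = max(count))

res
```

Because summarise() returns exactly one row per group, no separate de-duplication step is needed here.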
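Answer 1 (score: 0)
Use order() to sort your data frame by count in descending order, then apply duplicated() to the date column so that only the first (largest) row for each date is kept: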
dt=dt[order(-dt$count),]
dt=dt[!duplicated(dt$date),]
dt
count date
36 36 12/23/16
30 30 12/22/16
25 25 12/21/16
20 20 12/20/16
15 15 12/19/16
14 14 12/18/16
13 13 12/17/16
11 11 12/13/16
10 10 12/12/16
9 9 12/11/16
8 8 12/10/16
7 7 12/9/16
6 6 12/8/16
5 5 12/7/16
4 4 12/4/16
3 3 11/30/16
2 2 11/29/16
1 1 11/25/16
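The same result can probably be had in base R without reordering the data frame first. This is a minimal sketch, assuming the dates are stored as month/day/year strings as in the question. The data frame is a small hypothetical subset, and the as.Date() conversion is an addition so that the output sorts chronologically.

```r
# Hypothetical subset of the question's data
dt <- data.frame(
  count = c(4, 5, 12, 13, 16, 20),
  date  = c("12/4/16", "12/7/16", "12/17/16", "12/17/16", "12/20/16", "12/20/16"),
  stringsAsFactors = FALSE
)

# Parse the strings as real dates so rows sort by time, not alphabetically
dt$date <- as.Date(dt$date, format = "%m/%d/%y")

# Maximum count per date; the result has one row per distinct date
res <- aggregate(count ~ date, data = dt, FUN = max)

res
```

Unlike the order()/duplicated() approach, this leaves dt itself untouched and returns the dates in ascending order.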
Answer 2 (score: 0)
You can also use the which.max() and group_by() functions:
library(lubridate)
library(dplyr)
df %>%
group_by(date) %>%
summarise(MaxCount = count[which.max(count)])
Result:
# A tibble: 6 x 2
date MaxCount
<date> <int>
1 2018-01-01 5
2 2018-01-02 10
3 2018-01-03 9
4 2018-01-04 10
5 2018-01-05 8
6 2018-01-06 10
Sample data set (edited by @Yifu Yan):
set.seed(123)
df <- tibble(count = sample(1:10,20,replace = T),
date = sample(ymd("2018-01-01") + days(0:5),20,replace = T)) %>%
arrange(date)
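If the data frame has other columns that should be kept alongside the maximum, which.max() can also serve as a row index rather than being used inside summarise(). This is a sketch, assuming a hypothetical id column that is not in the original data. On ties which.max() returns the position of the first maximum, so each date yields exactly one row.

```r
library(dplyr)

# Hypothetical data with an extra id column and a tie on 2018-01-01
df <- tibble(
  count = c(3, 9, 9, 2),
  date  = as.Date(c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02")),
  id    = c("a", "b", "c", "d")
)

# Keep the whole row at the position of the first maximum within each date
res <- df %>%
  group_by(date) %>%
  slice(which.max(count)) %>%
  ungroup()

res
```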