I have the following data frame:
Fruit Date Price
Banana 01-01-2019 1
Banana 10-01-2019 1
Banana 31-01-2019 3
Banana 01-02-2019 4
Banana 04-03-2019 5
Banana 05-04-2019 6
Banana 30-04-2019 6
Apple 07-08-2020 7
Apple 08-09-2020 9
Apple 09-09-2020 9
Apple 20-09-2020 10
Apple 31-12-2020 11
Berries 30-01-2018 9
Berries 02-02-2018 14
Berries 07-03-2018 11
Berries 09-03-2018 10
For each fruit, I only want to keep the row with the last date in each month, i.e.:
Fruit Date Price
Banana 31-01-2019 3
Banana 01-02-2019 4
Banana 04-03-2019 5
Banana 30-04-2019 6
Apple 07-08-2020 7
Apple 20-09-2020 10
Apple 31-12-2020 11
Berries 30-01-2018 9
Berries 02-02-2018 14
Berries 09-03-2018 10
This would give me the final price of each fruit at the end of every month.
Since I'm a beginner in R, I'm not sure which library/code to use.
Thanks!
Answer (score: 1)
We can convert 'Date' to the Date class, group by 'Fruit' and the year and month of 'Date', and keep the row with the max 'Date' in each group
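library(dplyr)
library(lubridate)
df1 %>%
  mutate(Date = dmy(Date)) %>%                                  # parse "dd-mm-yyyy" strings to Date
  group_by(Fruit, year = year(Date), month = month(Date)) %>% # one group per fruit per calendar month
  slice_max(Date) %>%                                          # keep the row with the latest date
  ungroup() %>%
  select(-year, -month)                                        # drop the helper grouping columns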
-output
# A tibble: 10 x 3
# Fruit Date Price
# <chr> <date> <int>
# 1 Apple 2020-08-07 7
# 2 Apple 2020-09-20 10
# 3 Apple 2020-12-31 11
# 4 Banana 2019-01-31 3
# 5 Banana 2019-02-01 4
# 6 Banana 2019-03-04 5
# 7 Banana 2019-04-30 6
# 8 Berries 2018-01-30 9
# 9 Berries 2018-02-02 14
#10 Berries 2018-03-09 10
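One caveat, under an assumption the question does not state: if the same fruit had two rows on the last date of a month, `slice_max()` would return both, because its default is `with_ties = TRUE`. A minimal sketch that keeps exactly one row per group with `with_ties = FALSE`; the extra Banana row (31-01-2019, Price 2) is hypothetical and only there to create a tie, and base `as.Date()` with a year-month key stands in for `dmy()`/`year()`/`month()` so only dplyr is needed.

```r
library(dplyr)

# Question data plus one hypothetical duplicate date (Banana, 31-01-2019, Price 2)
df1 <- data.frame(
  Fruit = c(rep("Banana", 8), rep("Apple", 5), rep("Berries", 4)),
  Date = c("01-01-2019", "10-01-2019", "31-01-2019", "31-01-2019", "01-02-2019",
           "04-03-2019", "05-04-2019", "30-04-2019", "07-08-2020", "08-09-2020",
           "09-09-2020", "20-09-2020", "31-12-2020", "30-01-2018", "02-02-2018",
           "07-03-2018", "09-03-2018"),
  Price = c(1L, 1L, 3L, 2L, 4L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 11L, 9L, 14L, 11L, 10L)
)

res <- df1 %>%
  # Parse "dd-mm-yyyy" with base R and build a year-month key such as "2019-01"
  mutate(Date = as.Date(Date, format = "%d-%m-%Y"),
         ym = format(Date, "%Y-%m")) %>%
  group_by(Fruit, ym) %>%
  # with_ties = FALSE: exactly one row per group even when the latest date repeats
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-ym)

res
```

With the default `with_ties = TRUE` the same pipeline would return 11 rows here (both January Banana rows); with `with_ties = FALSE` it returns 10, one per fruit and month.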
Or another option is data.table
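library(data.table)
library(lubridate)  # for dmy(); the dates are day-month-year, so mdy() would return NA
# convert 'Date' by reference, then take the row index (.I) of the latest date per fruit/year/month
i1 <- setDT(df1)[, Date := dmy(Date)][, .I[which.max(Date)],
        .(Fruit, year(Date), month(Date))]$V1
df1[i1]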
-data
df1 <- structure(list(Fruit = c("Banana", "Banana", "Banana", "Banana",
"Banana", "Banana", "Banana", "Apple", "Apple", "Apple", "Apple",
"Apple", "Berries", "Berries", "Berries", "Berries"), Date = c("01-01-2019",
"10-01-2019", "31-01-2019", "01-02-2019", "04-03-2019", "05-04-2019",
"30-04-2019", "07-08-2020", "08-09-2020", "09-09-2020", "20-09-2020",
"31-12-2020", "30-01-2018", "02-02-2018", "07-03-2018", "09-03-2018"
), Price = c(1L, 1L, 3L, 4L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 11L,
9L, 14L, 11L, 10L)), class = "data.frame", row.names = c(NA,
-16L))
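If you'd rather avoid extra packages, here is a minimal base R sketch of the same idea (not part of the original answer): build a year-month key, use `ave()` to compute the latest date within each fruit/month group, and keep the rows that match it. Like `slice_max()` with its default settings, it keeps all rows if the latest date is tied. The `ym` and `last_in_month` names are just illustrative.

```r
# Question data (same values as the dput() above)
df1 <- data.frame(
  Fruit = rep(c("Banana", "Apple", "Berries"), times = c(7, 5, 4)),
  Date = c("01-01-2019", "10-01-2019", "31-01-2019", "01-02-2019", "04-03-2019",
           "05-04-2019", "30-04-2019", "07-08-2020", "08-09-2020", "09-09-2020",
           "20-09-2020", "31-12-2020", "30-01-2018", "02-02-2018", "07-03-2018",
           "09-03-2018"),
  Price = c(1L, 1L, 3L, 4L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 11L, 9L, 14L, 11L, 10L)
)

# Parse the "dd-mm-yyyy" strings into Date objects
df1$Date <- as.Date(df1$Date, format = "%d-%m-%Y")

# Year-month key, e.g. "2019-01"
ym <- format(df1$Date, "%Y-%m")

# Latest date (as a number) within each Fruit/year-month group, repeated on every row
last_in_month <- ave(as.numeric(df1$Date), df1$Fruit, ym, FUN = max)

# Keep the rows whose date equals their group's latest date (original row order is kept)
res <- df1[as.numeric(df1$Date) == last_in_month, ]
res
```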