R function to filter daily figures down to month-end values

Time: 2020-10-30 18:34:59

Tags: r dataframe

I have the following data frame:

Fruit    Date       Price
Banana   01-01-2019 1
Banana   10-01-2019 1
Banana   31-01-2019 3
Banana   01-02-2019 4
Banana   04-03-2019 5
Banana   05-04-2019 6
Banana   30-04-2019 6
Apple    07-08-2020 7
Apple    08-09-2020 9
Apple    09-09-2020 9
Apple    20-09-2020 10
Apple    31-12-2020 11
Berries  30-01-2018 9
Berries  02-02-2018 14
Berries  07-03-2018 11
Berries  09-03-2018 10

For each fruit, I want to keep only one figure per month, namely the last recorded one, i.e.:

Fruit    Date       Price
Banana   31-01-2019 3
Banana   01-02-2019 4
Banana   04-03-2019 5
Banana   30-04-2019 6
Apple    07-08-2020 7
Apple    20-09-2020 10
Apple    31-12-2020 11
Berries  30-01-2018 9
Berries  02-02-2018 14
Berries  09-03-2018 10

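This would give me the final price of each fruit at the end of each month.

Since I am a beginner in R, I am not sure which library/code to apply.

Thanks!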

1 Answer:

Answer 0: (score: 1)

We can convert 'Date' to Date class, group by 'Fruit' together with the year and month of 'Date', and keep the row with the max date in each group

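library(dplyr)
library(lubridate)
df1 %>%
    # parse the day-month-year strings into Date class
    mutate(Date = dmy(Date)) %>%
    # one group per fruit and calendar month
    group_by(Fruit, year = year(Date), month = month(Date)) %>%
    # keep the row with the latest date in each group
    slice_max(Date) %>%
    ungroup() %>%
    # drop the helper grouping columns
    select(-year, -month)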

-output

# A tibble: 10 x 3
#   Fruit   Date       Price
#   <chr>   <date>     <int>
# 1 Apple   2020-08-07     7
# 2 Apple   2020-09-20    10
# 3 Apple   2020-12-31    11
# 4 Banana  2019-01-31     3
# 5 Banana  2019-02-01     4
# 6 Banana  2019-03-04     5
# 7 Banana  2019-04-30     6
# 8 Berries 2018-01-30     9
# 9 Berries 2018-02-02    14
#10 Berries 2018-03-09    10

Or another option is data.table

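library(data.table)
library(lubridate)  # provides dmy(); the dates are day-month-year, so mdy() would fail to parse them
# .I gives the row index of the latest date within each fruit/year/month group
i1 <- setDT(df1)[, Date := dmy(Date)][, .I[which.max(Date)], 
       .(Fruit, year(Date), month(Date))]$V1
df1[i1]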
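If adding packages is not an option, the same grouping idea can probably be sketched in base R as well. This is only a minimal illustration, not part of the original answer: the names `fruit_df`, `d`, `key`, `keep` and `month_end` are made up for the example, and `fruit_df` is just a copy of the question's data so the snippet runs on its own.

```r
# Hypothetical standalone copy of the question's data (same values as df1)
fruit_df <- data.frame(
  Fruit = rep(c("Banana", "Apple", "Berries"), times = c(7, 5, 4)),
  Date = c("01-01-2019", "10-01-2019", "31-01-2019", "01-02-2019",
           "04-03-2019", "05-04-2019", "30-04-2019", "07-08-2020",
           "08-09-2020", "09-09-2020", "20-09-2020", "31-12-2020",
           "30-01-2018", "02-02-2018", "07-03-2018", "09-03-2018"),
  Price = c(1L, 1L, 3L, 4L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 11L,
            9L, 14L, 11L, 10L),
  stringsAsFactors = FALSE
)

# Parse the day-month-year strings into Date class
d <- as.Date(fruit_df$Date, format = "%d-%m-%Y")

# One key per fruit and calendar month, e.g. "Banana 2019-01"
key <- paste(fruit_df$Fruit, format(d, "%Y-%m"))

# ave() returns the group-wise maximum aligned with the original rows,
# so a row is kept when its date equals the latest date of its group
keep <- as.numeric(d) == ave(as.numeric(d), key, FUN = max)

# Keep those rows (in their original order) and store the parsed dates
month_end <- transform(fruit_df[keep, ], Date = d[keep])
month_end
```

Unlike the grouped versions above, this keeps the rows in their original order rather than sorting them by group, and it would keep every tied row if a fruit had two entries on the same last date of a month.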

data

df1 <- structure(list(Fruit = c("Banana", "Banana", "Banana", "Banana", 
"Banana", "Banana", "Banana", "Apple", "Apple", "Apple", "Apple", 
"Apple", "Berries", "Berries", "Berries", "Berries"), Date = c("01-01-2019", 
"10-01-2019", "31-01-2019", "01-02-2019", "04-03-2019", "05-04-2019", 
"30-04-2019", "07-08-2020", "08-09-2020", "09-09-2020", "20-09-2020", 
"31-12-2020", "30-01-2018", "02-02-2018", "07-03-2018", "09-03-2018"
), Price = c(1L, 1L, 3L, 4L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 11L, 
9L, 14L, 11L, 10L)), class = "data.frame", row.names = c(NA, 
-16L))