This is my data frame.
df<-data.frame(
Brand=c("Brand_1","Brand_2","Brand_3","Brand_4","Brand_4","Brand_1","Brand_4","Brand_4","Brand_1","Brand_2","Brand_3","Brand_2","Brand_3","Brand_4"),
M=c("2014-6-1","2014-7-1","2014-8-1","2014-9-1","2014-10-1","2014-11-1","2014-12-1","2015-1-1","2014-2-1","2015-3-1","2014-4-1","2014-5-1","2014-6-1","2014-7-1"),
Price=c(55,55,55,55,58,58,58,58,58,58,59,60,61,62),
Quantity=c(140,150,NA,NA,NA,200,NA,NA,100,100,NA,NA,NA,100)
)
df$M<-as.Date(df$M)
Brand M Price Quantity
------------------------------------------
1 Brand_1 2014-06-01 55 140
2 Brand_1 2014-11-01 58 200
3 Brand_1 2014-12-01 58 100
4 Brand_2 2014-07-01 55 150
5 Brand_2 2015-03-01 58 100
6 Brand_2 2014-05-01 60 NA
7 Brand_3 2014-08-01 55 NA
8 Brand_3 2014-04-01 59 NA
9 Brand_3 2014-06-01 61 NA
10 Brand_4 2014-09-01 55 NA
11 Brand_4 2014-10-01 58 NA
12 Brand_4 2014-12-01 58 NA
13 Brand_4 2015-01-01 58 NA
14 Brand_4 2014-07-01 62 100
-------------------------------------------
I want to reshape it with dplyr or a similar package. After the transformation I would like a table like the one below, with the following 4 changes:
1 Brand_1 2014-06-01 55 140 28
Brand_1 2014-07-01 55 NA 28
Brand_1 2014-08-01 55 NA 28
Brand_1 2014-09-01 55 NA 28
Brand_1 2014-10-01 55 NA 28
2 Brand_1 2014-11-01 58 200 200
3 Brand_1 2014-12-01 58 100 100
4 Brand_2 2014-07-01 55 150 150
The table above is only an example for Brand_1 and Brand_2; it does not include Brand_3 and Brand_4.
Answer 0 (score: 2)
I think this is what you are looking for. There may be a more streamlined way to do it, but this shows the logic.
library(dplyr)
library(tidyr)
First, clean up the data.frame() a little by converting M to a date and sorting by Brand and M. Then group by Brand and use tidyr::complete() to fill in the missing months.
df2 <- df %>%
  mutate(M = as.Date(as.character(M))) %>%
  arrange(Brand, M) %>%
  group_by(Brand) %>%
  # One row per month between each brand's first and last observation
  complete(M = seq.Date(min(M), max(M), by = '1 month'))
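To see what complete() does in isolation, here is a minimal sketch on a toy data frame. The values and the names toy and toy_full are made up for illustration and are not part of the question's data.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values): one brand observed in January and April only.
toy <- data.frame(
  Brand = c("A", "A"),
  M = as.Date(c("2014-01-01", "2014-04-01")),
  Quantity = c(10, 20)
)

# complete() adds a row for every month in the sequence that is missing;
# columns that were not observed for those months are filled with NA.
toy_full <- toy %>%
  group_by(Brand) %>%
  complete(M = seq.Date(min(M), max(M), by = "1 month")) %>%
  ungroup()

print(toy_full)
```

February and March appear as new rows with Quantity set to NA, which is what the later steps rely on.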
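Now we have some simple calculations. Create a Grouping variable that counts up by one at every row where Quantity is present, so each recorded Quantity and the empty months that follow it share one group number. The data is already sorted by M. Group by this variable and remove the NAs from Price by taking the min() within the group. Do something similar for Quantity1, but divide by n(), the group size.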
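df2 %>%
  ungroup() %>%
  # Counter goes up by one at every row that has a Quantity
  mutate(Grouping = cumsum(!is.na(Quantity))) %>%
  group_by(Grouping) %>%
  # Carry each group's lowest non-missing Price across the group
  mutate(Price = min(Price, na.rm = TRUE)) %>%
  # Spread the group's Quantity evenly over its n() rows
  mutate(Quantity1 = min(Quantity, na.rm = TRUE) / n())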
# Groups: Grouping [6]
Brand M Price Quantity Grouping Quantity1
<fct> <date> <dbl> <dbl> <int> <dbl>
1 Brand_1 2014-02-01 58 100 1 25
2 Brand_1 2014-03-01 58 NA 1 25
3 Brand_1 2014-04-01 58 NA 1 25
4 Brand_1 2014-05-01 58 NA 1 25
5 Brand_1 2014-06-01 55 140 2 28
6 Brand_1 2014-07-01 55 NA 2 28
7 Brand_1 2014-08-01 55 NA 2 28
8 Brand_1 2014-09-01 55 NA 2 28
9 Brand_1 2014-10-01 55 NA 2 28
10 Brand_1 2014-11-01 58 200 3 66.7
# ... with 23 more rows
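Row 10 shows 66.7 rather than 200 because the counter is built after ungroup(): Brand_1's last group also takes in Brand_2's first two months (May and June 2014), which have no Quantity, so 200 is divided by 3. If groups should not cross brands, one possible variant is sketched below. It is not the code from this answer; the column name Run, the stand-in data frame demo, and the choice to leave months before a brand's first Quantity as NA are all assumptions made for illustration.

```r
library(dplyr)

# Small stand-in for df2 (hypothetical values):
# brand B starts with a month that has no Quantity.
demo <- data.frame(
  Brand = c("A", "A", "A", "B", "B", "B"),
  M = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01",
                "2014-02-01", "2014-03-01", "2014-04-01")),
  Price = c(55, NA, 58, 60, 55, NA),
  Quantity = c(140, NA, 200, NA, 150, NA)
)

res <- demo %>%
  arrange(Brand, M) %>%
  group_by(Brand) %>%
  # The counter restarts for every brand, so a run never spills into the next one.
  mutate(Run = cumsum(!is.na(Quantity))) %>%
  group_by(Brand, Run) %>%
  mutate(
    # Run 0 holds months before a brand's first Quantity; keep those as NA.
    Quantity1 = if (all(is.na(Quantity))) NA_real_ else min(Quantity, na.rm = TRUE) / n(),
    Price = if (all(is.na(Price))) NA_real_ else min(Price, na.rm = TRUE)
  ) %>%
  ungroup()

print(res)
```

Guarding against all-NA groups also avoids the Inf (with a warning) that min(..., na.rm = TRUE) returns when a group has nothing left to take a minimum of.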
If you want, you can add ungroup() and then select(-Grouping) at the end to drop the Grouping variable.
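The order matters here: while the data is still grouped, select() keeps the grouping column even when asked to drop it. A minimal sketch, using a hypothetical three-row data frame tmp rather than the real result:

```r
library(dplyr)

# Hypothetical grouped result with the helper column still attached.
tmp <- data.frame(Grouping = c(1L, 1L, 2L), Quantity = c(100, NA, 140)) %>%
  group_by(Grouping) %>%
  mutate(Quantity1 = min(Quantity, na.rm = TRUE) / n())

# Still grouped: dplyr adds Grouping back (and prints a message saying so).
still_there <- suppressMessages(select(tmp, -Grouping))

# Ungrouped first: the column is really removed.
out <- tmp %>%
  ungroup() %>%
  select(-Grouping)

print(names(still_there))
print(names(out))
```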