Data frame transformation in R

Time: 2019-01-22 14:25:37

Tags: r dplyr

Here is my data frame:

df <- data.frame(
  Brand = c("Brand_1","Brand_2","Brand_3","Brand_4","Brand_4","Brand_1","Brand_4","Brand_4","Brand_1","Brand_2","Brand_3","Brand_2","Brand_3","Brand_4"),
  M = c("2014-6-1","2014-7-1","2014-8-1","2014-9-1","2014-10-1","2014-11-1","2014-12-1","2015-1-1","2014-2-1","2015-3-1","2014-4-1","2014-5-1","2014-6-1","2014-7-1"),
  Price = c(55,55,55,55,58,58,58,58,58,58,59,60,61,62),
  Quantity = c(140,150,NA,NA,NA,200,NA,NA,100,100,NA,NA,NA,100)
)

df$M <- as.Date(df$M)


     Brand          M Price Quantity
------------------------------------
1  Brand_1 2014-06-01    55      140
2  Brand_1 2014-11-01    58      200
3  Brand_1 2014-12-01    58      100
4  Brand_2 2014-07-01    55      150
5  Brand_2 2015-03-01    58      100
6  Brand_2 2014-05-01    60       NA
7  Brand_3 2014-08-01    55       NA
8  Brand_3 2014-04-01    59       NA
9  Brand_3 2014-06-01    61       NA
10 Brand_4 2014-09-01    55       NA
11 Brand_4 2014-10-01    58       NA
12 Brand_4 2014-12-01    58       NA
13 Brand_4 2015-01-01    58       NA
14 Brand_4 2014-07-01    62      100
------------------------------------

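I would like to use dplyr (or a similar package) to transform it into the table shown below, making the following four changes:

  1. For column M, I want to expand the dates between each pair of consecutive dates. For example, the dates between 2014-06-01 and 2014-11-01 should be filled in as in the table below (the four dates 2014-07-01, 2014-08-01, 2014-09-01 and 2014-10-01).
  2. For column Price, I want the same price value repeated on each added record.
  3. Column Quantity stays the same as in the first table.
  4. A new column Quantity1 should hold each Quantity value divided evenly over the rows it covers. For example, the first value, 140, covers five months, so Quantity1 is 28 = 140/5.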

     Brand          M Price Quantity Quantity1
1  Brand_1 2014-06-01    55      140        28
   Brand_1 2014-07-01    55       NA        28
   Brand_1 2014-08-01    55       NA        28
   Brand_1 2014-09-01    55       NA        28
   Brand_1 2014-10-01    55       NA        28
2  Brand_1 2014-11-01    58      200       200
3  Brand_1 2014-12-01    58      100       100
4  Brand_2 2014-07-01    55      150       150

The table above is only an example for Brand_1 and Brand_2; it does not include Brand_3 and Brand_4.
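For a single block, such as Brand_1 from 2014-06-01 up to the next recorded date 2014-11-01, the four requirements come down to one month sequence plus one division. A minimal base-R sketch of just that block (the names `start_date`, `next_date` and `block` are illustrative only):

```r
# Requirement 1: every month from one recorded date up to, but not
# including, the next recorded date of the same brand
start_date <- as.Date("2014-06-01")
next_date  <- as.Date("2014-11-01")
months <- seq(start_date, next_date, by = "month")
months <- months[months < next_date]

# Requirements 2-4: repeat the price, keep Quantity only on the first
# row, and spread that Quantity evenly over the block as Quantity1
block <- data.frame(
  Brand     = "Brand_1",
  M         = months,
  Price     = 55,
  Quantity  = c(140, rep(NA, length(months) - 1)),
  Quantity1 = 140 / length(months),
  stringsAsFactors = FALSE
)
print(block)
```

The block has five months, so Quantity1 is 140/5 = 28 on every row, matching the first five rows of the desired table.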

1 answer:

Answer 0 (score: 2)

I think this is what you are looking for. There may be a more streamlined way to do it, but this shows the logic.

library(dplyr)
library(tidyr)

First, clean up the data frame a little by converting M to a Date and sorting by Brand and M. Then group by Brand and use tidyr::complete() to fill in the missing months.

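df2 <- df %>%
  # Make sure M is a Date, then sort by brand and month
  mutate(M = as.Date(as.character(M))) %>%
  arrange(Brand, M) %>%
  group_by(Brand) %>%
  # Within each brand, add every missing month between its first and last date
  complete(M = seq.Date(min(M), max(M), by = '1 month'))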
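To see what group_by(Brand) followed by complete() does, here is a rough base-R equivalent. It is a swapped-in technique, not part of the answer: for each brand, build every month from its first to its last date and left-join the original rows onto that calendar. `expand_months()` is a hypothetical helper, and only the Brand_1 and Brand_2 rows of the question's data are used.

```r
# Brand_1 and Brand_2 rows from the question, with M already parsed
df <- data.frame(
  Brand    = c("Brand_1", "Brand_2", "Brand_1", "Brand_1", "Brand_2", "Brand_2"),
  M        = as.Date(c("2014-06-01", "2014-07-01", "2014-11-01",
                       "2014-02-01", "2015-03-01", "2014-05-01")),
  Price    = c(55, 55, 58, 58, 58, 60),
  Quantity = c(140, 150, 200, 100, 100, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: full monthly calendar for one brand, left-joined
# with that brand's rows; months not in the data get NA Price/Quantity
expand_months <- function(d) {
  full <- data.frame(
    Brand = d$Brand[1],
    M     = seq(min(d$M), max(d$M), by = "month"),
    stringsAsFactors = FALSE
  )
  # all.x = TRUE keeps every calendar month; merge() also sorts by Brand, M
  merge(full, d, by = c("Brand", "M"), all.x = TRUE)
}

df2 <- do.call(rbind, lapply(split(df, df$Brand), expand_months))
rownames(df2) <- NULL
print(df2)
```

Brand_1 runs from 2014-02-01 to 2014-11-01 (10 months) and Brand_2 from 2014-05-01 to 2015-03-01 (11 months), so the result has 21 rows with NA in Price and Quantity on the inserted months.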

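Now a few simple calculations remain. Create a Grouping variable that increases by one at every row that has a Quantity value; the data is already sorted by M within each Brand. Group by it, and fill in Price by taking the min() within each group, which drops the NAs. Do something similar for Quantity1, but divide by n(), the group size.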

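df2 %>%
  ungroup() %>%
  # A new group starts at every row that has a Quantity (rows are sorted by Brand, M)
  mutate(Grouping = cumsum(!is.na(Quantity))) %>%
  group_by(Grouping) %>%
  # Repeat the group's price on the filled-in rows
  mutate(Price = min(Price, na.rm = TRUE)) %>%
  # Spread the group's Quantity evenly over its rows
  mutate(Quantity1 = min(Quantity, na.rm = TRUE) / n())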

# Groups:   Grouping [6]
   Brand   M          Price Quantity Grouping Quantity1
   <fct>   <date>     <dbl>    <dbl>    <int>     <dbl>
 1 Brand_1 2014-02-01    58      100        1      25  
 2 Brand_1 2014-03-01    58       NA        1      25  
 3 Brand_1 2014-04-01    58       NA        1      25  
 4 Brand_1 2014-05-01    58       NA        1      25  
 5 Brand_1 2014-06-01    55      140        2      28  
 6 Brand_1 2014-07-01    55       NA        2      28  
 7 Brand_1 2014-08-01    55       NA        2      28  
 8 Brand_1 2014-09-01    55       NA        2      28  
 9 Brand_1 2014-10-01    55       NA        2      28  
10 Brand_1 2014-11-01    58      200        3      66.7
# ... with 23 more rows
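The Grouping trick can be checked on a plain vector. The base-R sketch below (with a hypothetical helper `spread()`) uses the Quantity values of the ten rows printed above, followed by the first three Brand_2 rows after complete(): 2014-05-01 (no Quantity), the inserted 2014-06-01, and 2014-07-01 (150). Because cumsum() runs after ungroup(), the block opened by Brand_1's 200 continues into those two Brand_2 rows, which is why row 10 shows 66.7 (200/3) rather than the 200 the question asks for. Restarting the counter for each brand avoids this.

```r
# Quantity of the ten Brand_1 rows shown above, then the first three
# Brand_2 rows after complete()
q     <- c(100, NA, NA, NA, 140, NA, NA, NA, NA, 200, NA, NA, 150)
brand <- c(rep("Brand_1", 10), rep("Brand_2", 3))

# Hypothetical helper: first value of a block divided by the block size
spread <- function(v) v[1] / length(v)

# As in the answer: one running counter over the whole ungrouped table
block_all <- cumsum(!is.na(q))
q1_all    <- ave(q, block_all, FUN = spread)

# Counter restarted per brand, so a block cannot cross into the next brand
block_brand <- ave(as.integer(!is.na(q)), brand, FUN = cumsum)
q1_brand    <- ave(q, brand, block_brand, FUN = spread)

print(data.frame(brand, q, block_all, q1_all, block_brand, q1_brand))
```

Both versions agree on the first nine rows (25 and 28). They differ from row 10 on: the per-brand counter gives 200 for Brand_1's last row and NA for the Brand_2 rows that come before that brand's first recorded Quantity.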

If you don't need the helper column, append ungroup() and then select(-Grouping) to the end of the pipeline to drop the Grouping variable.
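In the pipeline above, the Grouping counter runs over the whole ungrouped table, so a block can cross from one brand into the next, and min(Price) can then mix prices from two brands. A possible per-brand variant, assuming the intended behavior is that neither the price fill nor the Quantity1 split should cross brand boundaries: the counter is computed inside group_by(Brand), Price is carried down with tidyr::fill() instead of min(), and dplyr::first() replaces min() so that rows before a brand's first recorded Quantity get NA instead of Inf. This is a sketch on the Brand_1 and Brand_2 rows only, not a tested solution for the full data.

```r
library(dplyr)
library(tidyr)

# Brand_1 and Brand_2 rows from the question, with M already parsed
df <- data.frame(
  Brand    = c("Brand_1", "Brand_2", "Brand_1", "Brand_1", "Brand_2", "Brand_2"),
  M        = as.Date(c("2014-06-01", "2014-07-01", "2014-11-01",
                       "2014-02-01", "2015-03-01", "2014-05-01")),
  Price    = c(55, 55, 58, 58, 58, 60),
  Quantity = c(140, 150, 200, 100, 100, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Brand) %>%
  # Add every missing month between each brand's first and last date
  complete(M = seq.Date(min(M), max(M), by = "1 month")) %>%
  # Re-state the per-brand grouping and sort months within each brand
  group_by(Brand) %>%
  arrange(Brand, M) %>%
  # Carry each recorded price down into the inserted months (within brand)
  fill(Price, .direction = "down") %>%
  # New block at every recorded Quantity; the counter restarts per brand
  mutate(Grouping = cumsum(!is.na(Quantity))) %>%
  group_by(Brand, Grouping) %>%
  # first() is NA for rows before a brand's first recorded Quantity,
  # so those rows get NA instead of Inf
  mutate(Quantity1 = first(Quantity) / n()) %>%
  ungroup() %>%
  select(-Grouping)

print(result, n = Inf)
```

On this subset, Brand_1's 2014-11-01 row gets Quantity1 = 200, and Brand_2 keeps its own price of 60 on 2014-05-01 (carried into 2014-06-01); those two Brand_2 months get NA in Quantity1 because no Quantity has been recorded for that brand yet.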