Transform a data frame

Time: 2018-09-07 05:44:33

Tags: r dataframe dplyr

M     Product   Price
-------------------------
2014m1  Pepsi   55
2014m1  Coke    60
2014m2  Pepsi   55
2014m2  Coke    62
2014m3  Pepsi   55
2014m3  Coke    63
2014m4  Pepsi   55
2014m5  Pepsi   55
2014m6  Pepsi   55
2014m8  Pepsi   58
2014m9  Pepsi   58
2014m10 Pepsi   58
2014m11 Pepsi   58
2014m12 Pepsi   58

I have a time series for two products, Pepsi and Coke. I want to transform this table into the table below.

M     Product Price
--------------------------
2014m1  Coke    60
2014m2  Coke    62
2014m3  Coke    63
2014m4  Coke    NA
2014m5  Coke    NA
2014m6  Coke    NA
2014m7  Coke    NA
2014m8  Coke    NA
2014m9  Coke    NA
2014m10 Coke    NA
2014m11 Coke    NA
2014m12 Coke    NA
2014m1  Pepsi   55
2014m2  Pepsi   55
2014m3  Pepsi   55
2014m4  Pepsi   55
2014m5  Pepsi   55
2014m6  Pepsi   55
2014m7  Pepsi   58
2014m8  Pepsi   58
2014m9  Pepsi   58
2014m10 Pepsi   58
2014m11 Pepsi   58
2014m12 Pepsi   58

In this table, every product has a row for each month, along with its price. Can someone help me transform this table?

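3 answers:

Answer 0 (score: 2)

You can use complete() from tidyr for this. First turn M into a factor whose levels are all the months you want in the data, then use complete() to fill in the missing month/product combinations.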

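library(dplyr)
library(tidyr)
# Make M a factor listing all 12 months, so complete() also creates 2014m7 and the missing Coke months
my_df %>% 
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  complete(M, Product)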

# A tibble: 24 x 3
#    M      Product Price
#    <fct>  <chr>   <int>
#  1 2014m1 Coke       60
#  2 2014m1 Pepsi      55
#  3 2014m2 Coke       62
#  4 2014m2 Pepsi      55
#  5 2014m3 Coke       63
#  6 2014m3 Pepsi      55
#  7 2014m4 Coke       NA
#  8 2014m4 Pepsi      55
#  9 2014m5 Coke       NA
# 10 2014m5 Pepsi      55
# ... with 14 more rows

Data

my_df <- structure(list(M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3", 
                     "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10", 
                     "2014m11", "2014m12"), 
               Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke", 
                           "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
                           "Pepsi", "Pepsi"), 
               Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L, 
                         58L, 58L, 58L)), 
          class = "data.frame", row.names = c(NA, -14L))
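complete() returns the rows ordered by month, with Coke and Pepsi alternating, whereas the question lists all Coke months first and then all Pepsi months. The sketch below assumes dplyr and tidyr are installed and rebuilds `my_df` from the data above; it sorts with arrange(), and because M is a factor whose levels are in calendar order, the months sort chronologically rather than as text. The desired table also shows 58 rather than NA for Pepsi in 2014m7. If that is intentional, one assumption that reproduces it is that a missing month takes the product's next observed price, which tidyr::fill() with `.direction = "up"` does within each group (the Coke months after 2014m3 stay NA because no later Coke price exists). The names `completed` and `filled` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Rebuilt from the question's data (same values as my_df above)
my_df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              rep("Pepsi", 8)),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

completed <- my_df %>%
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  complete(M, Product) %>%
  # Levels are in calendar order, so months sort chronologically within each product
  arrange(Product, M)

# Assumption: a missing month takes the next observed price of the same product
filled <- completed %>%
  group_by(Product) %>%
  fill(Price, .direction = "up") %>%
  ungroup()

print(filled, n = 24)
```

Drop the `filled` step if gaps should simply stay NA; `completed` alone already has the question's row layout.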

Answer 1 (score: 1)

One way to do this is to create a new data frame with all possible combinations and then merge it with the original data frame.

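# All 12 months for every product (M is recycled across the 24 rows)
new_df <- data.frame(M = paste0(2014, "m", seq(12)), 
                     Product = rep(unique(df$Product), each = 12))

# Left join: keep every row of new_df; months without a price get NA
merge(new_df, df, all.x = TRUE)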


#         M  Product Price
#1   2014m1    Coke    60
#2   2014m1   Pepsi    55
#3   2014m10   Coke    NA
#4   2014m10  Pepsi    58
#5   2014m11   Coke    NA
#6   2014m11  Pepsi    58
#7   2014m12   Coke    NA
#8   2014m12  Pepsi    58
#9   2014m2    Coke    62
#10  2014m2   Pepsi    55
......

df is your original data frame.
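The merge() output above is sorted by M as text, so 2014m10 to 2014m12 come before 2014m2. A minimal base R sketch, assuming you want the question's layout (all Coke months first, in calendar order), reorders the merged result by product and then by the month number extracted with sub(). The data frame is rebuilt from the question so the snippet runs on its own; `res` and `month_num` are illustrative names.

```r
# Rebuilt from the question's data; `df` follows the answer's naming
df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              rep("Pepsi", 8)),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

# All 12 months for every product, as in the answer
new_df <- data.frame(M = paste0(2014, "m", seq(12)),
                     Product = rep(unique(df$Product), each = 12),
                     stringsAsFactors = FALSE)
res <- merge(new_df, df, all.x = TRUE)

# merge() sorts M as text ("2014m10" < "2014m2"); reorder by product, then numeric month
month_num <- as.integer(sub(".*m", "", res$M))
res <- res[order(res$Product, month_num), ]
rownames(res) <- NULL
print(res)
```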

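Answer 2 (score: 1)

Here is a more flexible solution using tidyr::expand. You don't have to specify the number of rows to add (12 in your case), because sub() extracts the month numbers and the largest one determines how many months are generated.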

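library(tidyverse)

my_df %>% 
  # largest month number present in M (12 here)
  mutate(val = max(as.integer(sub('.*m', '', M)))) %>% 
  group_by(Product) %>% 
  # one row per product for each month from 1 to that maximum
  expand(M = paste0('2014m', seq(val[1]))) %>% 
  # add the observed prices back; months without a price get NA
  left_join(., my_df)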

which gives:

# A tibble: 24 x 3
# Groups:   Product [?]
   Product M       Price
   <chr>   <chr>   <int>
 1 Coke    2014m1     60
 2 Coke    2014m10    NA
 3 Coke    2014m11    NA
 4 Coke    2014m12    NA
 5 Coke    2014m2     62
 6 Coke    2014m3     63
 7 Coke    2014m4     NA
 8 Coke    2014m5     NA
 9 Coke    2014m6     NA
10 Coke    2014m7     NA
# ... with 14 more rows