M Product Price
-------------------------
2014m1 Pepsi 55
2014m1 Coke 60
2014m2 Pepsi 55
2014m2 Coke 62
2014m3 Pepsi 55
2014m3 Coke 63
2014m4 Pepsi 55
2014m5 Pepsi 55
2014m6 Pepsi 55
2014m8 Pepsi 58
2014m9 Pepsi 58
2014m10 Pepsi 58
2014m11 Pepsi 58
2014m12 Pepsi 58
I have time series for two products, Pepsi and Coke. I want to transform this table into the one below.
M Product Price
--------------------------
2014m1 Coke 60
2014m2 Coke 62
2014m3 Coke 63
2014m4 Coke NA
2014m5 Coke NA
2014m6 Coke NA
2014m7 Coke NA
2014m8 Coke NA
2014m9 Coke NA
2014m10 Coke NA
2014m11 Coke NA
2014m12 Coke NA
2014m1 Pepsi 55
2014m2 Pepsi 55
2014m3 Pepsi 55
2014m4 Pepsi 55
2014m5 Pepsi 55
2014m6 Pepsi 55
2014m7 Pepsi 58
2014m8 Pepsi 58
2014m9 Pepsi 58
2014m10 Pepsi 58
2014m11 Pepsi 58
2014m12 Pepsi 58
In this table, every product has a row for each month, with its price. Can someone help me transform the table this way?
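Answer 0 (score: 2)
You can use complete from tidyr for this. First turn M into a factor whose levels are all the months you want in the data, then use complete to fill in the missing month/product combinations.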
library(dplyr)
library(tidyr)
my_df %>%
mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
complete(M, Product)
# A tibble: 24 x 3
# M Product Price
# <fct> <chr> <int>
# 1 2014m1 Coke 60
# 2 2014m1 Pepsi 55
# 3 2014m2 Coke 62
# 4 2014m2 Pepsi 55
# 5 2014m3 Coke 63
# 6 2014m3 Pepsi 55
# 7 2014m4 Coke NA
# 8 2014m4 Pepsi 55
# 9 2014m5 Coke NA
# 10 2014m5 Pepsi 55
# ... with 14 more rows
Data
my_df <- structure(list(M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
"2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
"2014m11", "2014m12"),
Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
"Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
"Pepsi", "Pepsi"),
Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
58L, 58L, 58L)),
class = "data.frame", row.names = c(NA, -14L))
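The complete() result above is ordered by month first, whereas the table requested in the question lists all months for one product and then the other. A minimal sketch of one way to get that layout, assuming dplyr's arrange() is acceptable here: because M is a factor with its levels in calendar order, sorting by Product and then M yields 2014m1 through 2014m12 within each product. The name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer above, rebuilt here so the sketch is self-contained
my_df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
              "Pepsi", "Pepsi"),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

result <- my_df %>%
  # Factor levels fix both the set of months and their calendar order
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  # Add a row for every month/product pair; missing prices become NA
  complete(M, Product) %>%
  # Factors sort by level order, so months come out m1, m2, ..., m12
  arrange(Product, M)
```

Note that 2014m7 is absent for Pepsi in the data, so that row comes out as NA as well.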
Answer 1 (score: 1)
One way to do this is to create a new data frame with all possible combinations and then merge it with the original data frame.
new_df <- data.frame(M = paste0(2014, "m", seq(12)),
Product = rep(unique(df$Product), each = 12))
merge(new_df, df, all.x = TRUE)
# M Product Price
#1 2014m1 Coke 60
#2 2014m1 Pepsi 55
#3 2014m10 Coke NA
#4 2014m10 Pepsi 58
#5 2014m11 Coke NA
#6 2014m11 Pepsi 58
#7 2014m12 Coke NA
#8 2014m12 Pepsi 58
#9 2014m2 Coke 62
#10 2014m2 Pepsi 55
......
where df is your original data frame.
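As the output above shows, merge sorts M as text, so 2014m10 lands right after 2014m1. A hedged base-R sketch of one way to restore calendar order within each product: extract the month number with sub and order on it. Here expand.grid is swapped in for the rep()-based construction purely for illustration, and the names res and month are hypothetical.

```r
# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
              "Pepsi", "Pepsi"),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

# All month/product combinations (expand.grid used instead of rep())
new_df <- expand.grid(M = paste0(2014, "m", seq(12)),
                      Product = unique(df$Product),
                      stringsAsFactors = FALSE)

# Left join: keep every combination, NA where no price was recorded
res <- merge(new_df, df, all.x = TRUE)

# Numeric month taken from the text after "m", used only for sorting
month <- as.integer(sub(".*m", "", res$M))
res <- res[order(res$Product, month), ]
rownames(res) <- NULL
```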
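Answer 2 (score: 1)
Here is a more flexible solution via tidyr::expand. You don't have to specify the number of months to generate (12 in your case), because sub extracts the highest month number from M.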
library(tidyverse)
my_df %>%
mutate(val = max(as.integer(sub('.*m', '', M)))) %>%
group_by(Product) %>%
expand(M = paste0('2014m', seq(val[1]))) %>%
left_join(., my_df)
which gives,
# A tibble: 24 x 3
# Groups:   Product [?]
   Product M       Price
   <chr>   <chr>   <int>
 1 Coke    2014m1     60
 2 Coke    2014m10    NA
 3 Coke    2014m11    NA
 4 Coke    2014m12    NA
 5 Coke    2014m2     62
 6 Coke    2014m3     63
 7 Coke    2014m4     NA
 8 Coke    2014m5     NA
 9 Coke    2014m6     NA
10 Coke    2014m7     NA
# ... with 14 more rows