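I want to reshape a dataset with the structure below into a form that helps me build a dataset for time series analysis.
The dataset below is an example: I have several variables as columns, and several brands as rows, each with its own time periods.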
Brand Period V1 V2 V3
A Week1 1 2 3
A Week2 1 2 3
A Week3 1 2 3
B Week1 1 2 3
B Week2 1 2 3
B Week3 1 2 3
C Week1 1 2 3
C Week2 1 2 3
C Week3 1 2 3
I want the dataset to look like this:
Period A_V1 A_V2 A_V3 B_V1 B_V2 B_V3 C_V1 C_V2 C_V3
Week1
Week2
Week3
Is there a function in the reshape package, or in any other package, that I could use for this?
Answer 0 (score: 2)
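The basic operation can be done in a single read.zoo call, which uses column 2 (Period) as the index, splits the data into separate series by column 1 (Brand), and keeps the index as character strings (FUN = as.character).
The result is the zoo series z. That series can be manipulated directly in that form, converted to a data frame with fortify.zoo(z), or converted to a ts series by first converting the index to numbers (as shown later) and then using as.ts(z).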
library(zoo)
# If you start from the data frame brands (see the note at the end) instead of a file:
# z <- read.zoo(brands, index = 2, split = 1, FUN = as.character, header = TRUE)
z <- read.zoo("brands.dat", index = 2, split = 1, FUN = as.character, header = TRUE)
which gives:
V1.A V2.A V3.A V1.B V2.B V3.B V1.C V2.C V3.C
Week1 1 2 3 1 2 3 1 2 3
Week2 1 2 3 1 2 3 1 2 3
Week3 1 2 3 1 2 3 1 2 3
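To turn z into an ordinary data frame, fortify.zoo can be applied directly, as mentioned above. A minimal sketch, assuming the zoo package is installed and reading the example data from a string rather than a file (wide_df is just an illustrative name):

```r
library(zoo)

# Example data from the question, read from a string instead of brands.dat
Lines <- "
Brand Period V1 V2 V3
A Week1 1 2 3
A Week2 1 2 3
A Week3 1 2 3
B Week1 1 2 3
B Week2 1 2 3
B Week3 1 2 3
C Week1 1 2 3
C Week2 1 2 3
C Week3 1 2 3"
brands <- read.table(text = Lines, header = TRUE)

# Period (column 2) becomes the index; Brand (column 1) splits the series
z <- read.zoo(brands, index = 2, split = 1, FUN = as.character)

# fortify.zoo puts the index back as the first column of a data frame
wide_df <- fortify.zoo(z)
wide_df
```

The first column holds the Week labels and the remaining nine columns hold the brand/variable combinations.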
If you want the column names exactly as shown in the question, add:
colnames(z) <- sub("(\\w+)[.](\\w+)", "\\2_\\1", colnames(z))
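To see what that substitution does, here it is applied to a few hypothetical column names of the variable.brand form, using base R only:

```r
# Hypothetical column names in the variable.brand form produced by read.zoo
old_names <- c("V1.A", "V2.A", "V3.C")

# Capture the text before and after the dot, then swap the two parts around "_"
new_names <- sub("(\\w+)[.](\\w+)", "\\2_\\1", old_names)
new_names
# [1] "A_V1" "A_V2" "C_V3"
```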
If you prefer a numeric time index, or want to convert to a ts series (which requires one), add:
time(z) <- 1:nrow(z)
or this:
time(z) <- as.numeric(gsub("\\D", "", time(z)))
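Putting these steps together: a sketch, assuming zoo is installed and that every Period label ends in its week number, which builds z from a data frame, renames the columns, switches to a numeric index and converts to ts (z_ts is an illustrative name):

```r
library(zoo)

# The question's example data built directly as a data frame
brands <- data.frame(
  Brand  = rep(c("A", "B", "C"), each = 3),
  Period = rep(c("Week1", "Week2", "Week3"), times = 3),
  V1 = 1, V2 = 2, V3 = 3
)

z <- read.zoo(brands, index = 2, split = 1, FUN = as.character)
colnames(z) <- sub("(\\w+)[.](\\w+)", "\\2_\\1", colnames(z))

# "Week1" -> 1, "Week2" -> 2, ...: drop the non-digit characters
time(z) <- as.numeric(gsub("\\D", "", time(z)))

# as.ts() needs a regular numeric index, which 1, 2, 3 is
z_ts <- as.ts(z)
z_ts
```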
Note: this creates the input file:
Lines <- "
Brand Period V1 V2 V3
A Week1 1 2 3
A Week2 1 2 3
A Week3 1 2 3
B Week1 1 2 3
B Week2 1 2 3
B Week3 1 2 3
C Week1 1 2 3
C Week2 1 2 3
C Week3 1 2 3"
cat(Lines, file = "brands.dat")
Or, if your starting point is a data frame, then:
brands <- read.table(text = Lines, header = TRUE)
Answer 1 (score: 1)
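We can use dcast from data.table, which accepts multiple value.var columns:
library(data.table)
# df1 is the question's data frame; columns 3:5 are V1, V2, V3
dcast(setDT(df1), Period ~ Brand, value.var = names(df1)[3:5])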
# Period V1_A V1_B V1_C V2_A V2_B V2_C V3_A V3_B V3_C
#1: Week1 1 1 1 2 2 2 3 3 3
#2: Week2 1 1 1 2 2 2 3 3 3
#3: Week3 1 1 1 2 2 2 3 3 3
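The dcast call above names the columns V1_A, V1_B and so on. If the Brand_Variable names from the question are wanted instead, one option, sketched here assuming data.table is installed, is to melt to long form first and put Brand before the variable in the formula (long and wide are illustrative names):

```r
library(data.table)

# The question's example data
df1 <- data.table(
  Brand  = rep(c("A", "B", "C"), each = 3),
  Period = rep(c("Week1", "Week2", "Week3"), times = 3),
  V1 = 1, V2 = 2, V3 = 3
)

# Long format: one row per Brand / Period / variable
long <- melt(df1, id.vars = c("Brand", "Period"))

# Brand + variable on the right-hand side gives names such as "A_V1" (sep = "_" is the default)
wide <- dcast(long, Period ~ Brand + variable, value.var = "value")
wide
```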
Answer 2 (score: 0)
If you are already used to the tidyverse, you can combine gather and spread from tidyr (similar to this answer):
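library(dplyr)  # %>%, mutate(), select()
library(tidyr)  # gather(), spread()

# Rebuild the example data; data.frame() keeps V1-V3 numeric
# (wrapping the vectors in cbind() first would coerce everything to character)
Brand  <- rep(c("A", "B", "C"), each = 3)
Period <- rep(c("Week1", "Week2", "Week3"), 3)
df <- data.frame(Brand, Period, V1 = 1, V2 = 2, V3 = 3)

df %>%
  gather(vars, value, -Brand, -Period) %>%                  # long: one row per Brand/Period/variable
  mutate(observation = paste(Brand, vars, sep = "_")) %>%   # new column names such as "A_V1"
  select(-Brand, -vars) %>%
  spread(observation, value)                                # wide: one row per Period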
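gather() and spread() still work, but since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape with a single pivot_wider() call, assuming a recent tidyr that supports the names_glue argument; the output column order may differ from the question's layout:

```r
library(tidyr)

# The question's example data
df <- data.frame(
  Brand  = rep(c("A", "B", "C"), each = 3),
  Period = rep(c("Week1", "Week2", "Week3"), times = 3),
  V1 = 1, V2 = 2, V3 = 3
)

# Period is left as the id column: one row per Period,
# one column per Brand/variable pair, named like "A_V1"
wide <- pivot_wider(
  df,
  names_from  = Brand,
  values_from = c(V1, V2, V3),
  names_glue  = "{Brand}_{.value}"
)
wide
```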