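I have the following dataset, where WEEK identifies a week of a given year (W03046 is week 46 of 2003) and each MSBRAND column holds the market share of one brand in that week: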
WEEK MSBRAND1 MSBRAND2 MSBRAND3
W03046 0.20 0.50 0.30
W03047 0.15 0.55 0.30
W03048 0.25 0.30 0.45
.... ... ... ...
I would like to create the following dataset:
WEEK BRAND SHARE weekdummy1 weekdummy2 weekdummy3 branddummy1 branddummy3
1 1 0.20 1 0 0 1 0
1 2 0.50 1 0 0 0 1
1 3 0.30 1 0 0 0 0
2 1 0.15 0 1 0 1 0
2 2 0.55 0 1 0 0 1
2 3 0.30 0 1 0 0 0
3 1 0.25 0 0 1 1 0
3 2 0.30 0 0 1 0 1
3 3 0.45 0 0 1 0 0
Does anyone know how to get from the first dataset to the second, in either R or Excel?
Thanks a lot in advance.
Answer 0 (score: 2)
Here is one approach, using melt from "reshape2" and then a little extra work with "data.table":
library(data.table)
library(reshape2)
DT <- as.data.table(mydf)
# Wide to long: one row per WEEK/brand pair
DTL <- melt(DT, id.vars = "WEEK",
            variable.name = "brand",
            value.name = "share")
# Add a constant to cast on, strip the "MSBRAND" prefix,
# and split WEEK (e.g. "W03046") into year and week
DTL[, `:=`(dummy = 1,
           brand = gsub("MSBRAND", "", brand),
           year = substr(WEEK, 2, 3),
           week = substr(WEEK, 4, 6),
           WEEK = NULL)]
# Row id used as the join key
DTL[, id := 1:nrow(DTL)]
setkey(DTL, id)
# One 0/1 column per week
weekDummy <- setnames(
  dcast.data.table(DTL, id ~ week, value.var = "dummy", fill = 0),
  c("id", paste0("wd", seq_along(unique(DTL$week)))))
# One 0/1 column per brand
brandDummy <- setnames(
  dcast.data.table(DTL, id ~ brand, value.var = "dummy", fill = 0),
  c("id", paste0("bd", seq_along(unique(DTL$brand)))))
# Join both dummy tables back onto the long data by id
DTL[weekDummy][brandDummy]
# brand share dummy year week id wd1 wd2 wd3 bd1 bd2 bd3
# 1: 1 0.20 1 03 046 1 1 0 0 1 0 0
# 2: 1 0.15 1 03 047 2 0 1 0 1 0 0
# 3: 1 0.25 1 03 048 3 0 0 1 1 0 0
# 4: 2 0.50 1 03 046 4 1 0 0 0 1 0
# 5: 2 0.55 1 03 047 5 0 1 0 0 1 0
# 6: 2 0.30 1 03 048 6 0 0 1 0 1 0
# 7: 3 0.30 1 03 046 7 1 0 0 0 0 1
# 8: 3 0.30 1 03 047 8 0 1 0 0 0 1
# 9: 3 0.45 1 03 048 9 0 0 1 0 0 1
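In later versions of "data.table", dcast.data.table will be able to handle multiple casts in one step. For now, the workaround is to create the dummy variables for "week" and "brand" separately and then merge them.
You may also want to order the result by year and week.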
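As a minimal sketch of that ordering step: assuming the joined result keeps the character columns year and week shown in the output above, setorder from "data.table" sorts it by reference. The small table res below is made-up stand-in data for illustration, not the actual joined result.

```r
library(data.table)

# Hypothetical stand-in for the joined result; only the columns needed here.
res <- data.table(
  brand = c("1", "1", "2", "2"),
  share = c(0.15, 0.20, 0.55, 0.50),
  year  = c("03", "03", "03", "03"),
  week  = c("047", "046", "047", "046")
)

# Sort in place by year, then week, then brand.
# The zero-padded week strings have equal width, so they sort correctly as text.
setorder(res, year, week, brand)
res
```

Because year is stored as two digits, this ordering is only safe while all weeks fall in the same century; converting to a full year first would avoid that.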
Answer 1 (score: 0)
In R, have a look at the reshape function, going from "wide" to "long". It would look something like the code below (not run):
# Let d be your data set
d$WEEK <- as.numeric(factor(d$WEEK))  # recode the week labels as 1, 2, 3, ...
new.d <- reshape(d, varying = names(d)[2:4], v.names = "SHARE",
                 timevar = "BRAND", times = 1:3, direction = "long", idvar = "WEEK")
new.d <- new.d[order(new.d$WEEK, new.d$BRAND), ]  # re-order
This should get you started.
For the dummy variables, see "Generate a dummy-variable".
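One possible way to build the dummy columns in base R is model.matrix. The sketch below is not taken from the linked question. It assumes the long data has the columns WEEK, BRAND and SHARE with WEEK already recoded to 1, 2, 3, and it types in the values from the question by hand. The object names week.dummies, brand.dummies and result are made up for illustration. The question's target table has only two brand columns, and its rows suggest they mark brands 1 and 2, so the last brand is dropped here as the reference level; that reading is an assumption.

```r
# Long-format data in the shape the reshape step is expected to produce
# (values copied from the question; WEEK already recoded to 1, 2, 3).
new.d <- data.frame(
  WEEK  = rep(1:3, each = 3),
  BRAND = rep(1:3, times = 3),
  SHARE = c(0.20, 0.50, 0.30, 0.15, 0.55, 0.30, 0.25, 0.30, 0.45)
)

# One indicator column per week; "- 1" drops the intercept so every level gets a column.
week.dummies <- model.matrix(~ factor(WEEK) - 1, data = new.d)
colnames(week.dummies) <- paste0("weekdummy", 1:3)

# One indicator column per brand, then keep brands 1 and 2 only
# (assumption: brand 3 is the reference level, as the question's rows suggest).
brand.dummies <- model.matrix(~ factor(BRAND) - 1, data = new.d)
colnames(brand.dummies) <- paste0("branddummy", 1:3)
brand.dummies <- brand.dummies[, 1:2]

result <- cbind(new.d, week.dummies, brand.dummies)
result
```

Dropping one brand avoids perfect collinearity if the dummies are later used together with an intercept in a regression; keep all three columns if that is not a concern.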