I have the following data, and I want to convert it to long format.
id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45
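[a1,b1,c1], [a2,b2,c2] and [a3,b3,c3] are the three sets of attributes faced by the person with a given id. A person may face several choice situations, and count indicates the i-th choice situation. I want to convert the data to long format while keeping the other variables, like this: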
id count a b c age
1 1 apple 2 3 50
1 1 orange 3 2 50
1 1 beer 2 1 50
1 2 orange 3 2 50
1 2 apple 2 2 50
1 2 beer 2 1 50
2 1 pear 3 2 45
2 1 apple 2 2 45
2 1 orange 2 2 45
I tried reshaping with the following command, but I am confused about how to handle `timevar` and `times`:
l <- reshape(df,
varying = df[,3:11],
v.names = c("a","b","c"),
timevar = "choice",
times = c("a","b","c"),
direction = "long")
With the command above I can't get the result I want. Any help is sincerely appreciated!
Answer 0 (score: 4)
Use the `melt` function from the `data.table` package:
library(data.table)
setDT(df)
melt(df, id.vars = c('id', 'count', 'age'),
     measure.vars = patterns('a\\d', 'b\\d', 'c\\d'),
     # this needs to be a regular expression to group `a1, a2, a3` etc. together, and
     # the `\\d` is necessary because you also have an `age` column.
     value.name = c('a', 'b', 'c'))[, variable := NULL][order(id, count, -age)]
# id count age a b c
# 1: 1 1 50 apple 2 3
# 2: 1 1 50 orange 3 2
# 3: 1 1 50 beer 2 1
# 4: 1 2 50 orange 3 2
# 5: 1 2 50 apple 2 2
# 6: 1 2 50 beer 2 1
# 7: 2 1 45 pear 3 2
# 8: 2 1 45 apple 2 2
# 9: 2 1 45 orange 2 2
Answer 1 (score: 3)
We can use `dplyr`/`tidyr`:
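library(dplyr)
library(tidyr)
# gather() stacks character (a*) and integer (b*, c*) values into one column, so Val
# becomes character; convert = TRUE in spread() turns b and c back into integers
gather(df, Var, Val, a1:c3) %>%
  extract(Var, into = c("Var1", "Var2"), "(.)(.)") %>%
  spread(Var1, Val, convert = TRUE) %>%
  select(-Var2)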
# id count age a b c
#1 1 1 50 apple 2 3
#2 1 1 50 orange 3 2
#3 1 1 50 beer 2 1
#4 1 2 50 orange 3 2
#5 1 2 50 apple 2 2
#6 1 2 50 beer 2 1
#7 2 1 45 pear 3 2
#8 2 1 45 apple 2 2
#9 2 1 45 orange 2 2
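In tidyr 1.0.0 and later, `gather()`/`spread()` are superseded by `pivot_longer()`, which can do this in a single step: the special `.value` entry in `names_to` tells it that part of each column name (here the letter) names an output column. A minimal sketch, assuming tidyr >= 1.0.0 is installed; the name `choice` for the new index column is just an illustrative choice.

```r
library(tidyr)

# Same example data as in the "Data" section
df <- read.table(header = TRUE, stringsAsFactors = FALSE,
                 text = "id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45")

long <- pivot_longer(df,
                     cols = a1:c3,
                     # first captured group (a, b or c) becomes the output column name,
                     # second group (the digit) goes into a new "choice" column
                     names_to = c(".value", "choice"),
                     names_pattern = "([abc])(\\d)")
long
```

Each `.value` group is stacked separately, so `a` stays character while `b` and `c` stay integer, and the rows come out in the order shown in the question. `choice` is stored as character ("1", "2", "3").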
Answer 2 (score: 2)
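To use the `reshape` function, you only need to adjust the `varying` argument. It can be a list in which each vector holds the variables that should be stacked into the same column: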
reshape(df,
idvar=c("id", "count", "age"),
varying = list(c(3,6,9), c(4,7,10), c(5,8,11)),
timevar="time",
v.names=c("a", "b", "c"),
direction = "long")
This returns:
id count age time a b c
1.1.50.1 1 1 50 1 apple 2 3
1.2.50.1 1 2 50 1 orange 3 2
2.1.45.1 2 1 45 1 pear 3 2
1.1.50.2 1 1 50 2 orange 3 2
1.2.50.2 1 2 50 2 apple 2 2
2.1.45.2 2 1 45 2 apple 2 2
1.1.50.3 1 1 50 3 beer 2 1
1.2.50.3 1 2 50 3 beer 2 1
2.1.45.3 2 1 45 3 orange 2 2
I also added `idvar`, because I think that is generally good practice, both for other readers and for yourself when rereading old code.
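The index vectors in `varying` can also be built from the column names, which is less fragile if the column order changes. A minimal base-R sketch, assuming the stems are exactly `a`, `b` and `c` followed by digits; the time variable name `choice` is an illustrative choice. Because `reshape()` stacks all time-1 rows first, the result is re-sorted to match the row order in the question.

```r
# Same example data as in the "Data" section
df <- read.table(header = TRUE, stringsAsFactors = FALSE,
                 text = "id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45")

stems <- c("a", "b", "c")
# One vector of column names per stem; requiring trailing digits
# keeps "age" (which also starts with "a") out of the "a" group
varying <- lapply(stems, function(s) grep(paste0("^", s, "\\d+$"), names(df), value = TRUE))

long <- reshape(df,
                idvar = c("id", "count", "age"),
                varying = varying,
                v.names = stems,
                timevar = "choice",
                direction = "long")

# Group the three choices of each (id, count) together, then drop the composite row names
long <- long[order(long$id, long$count, long$choice), ]
rownames(long) <- NULL
long
```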
Data
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45")