R: Reshaping data from wide to long - multiple conditions - getting an error

Date: 2016-07-19 01:56:38

Tags: r data-manipulation

I have the following data, which I want to convert to long format.

id count a1     b1 c1   a2     b2 c2  a3    b3  c3  age
1  1     apple  2  3    orange 3  2   beer   2   1   50
1  2     orange 3  2    apple  2  2   beer   2   1   50
2  1     pear   3  2    apple  2  2   orange 2   2   45

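[a1,b1,c1], [a2,b2,c2] and [a3,b3,c3] are sets of three attributes faced by the person with the given id. A person may face several choice situations, with count indicating the i-th choice situation. I want to change the data to long format while keeping the other variables, like this: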

id count    a    b  c  age
1  1      apple  2  3  50
1  1      orange 3  2  50
1  1      beer   2  1  50
1  2      orange 3  2  50
1  2      apple  2  2  50
1  2      beer   2  1  50
2  1      pear   3  2  45
2  1      apple  2  2  45
2  1      orange 2  2  45

I tried reshaping with the following command, but I am confused about how to handle timevar and times:

 l <- reshape(df, 
           varying = df[,3:11],
           v.names = c("a","b","c"),
           timevar = "choice", 
           times = c("a","b","c"), 
           direction = "long")

With the command above I cannot get the result I want. Any help is sincerely appreciated!

3 answers:

Answer 0: (score: 4)

Use the melt function from the data.table package:

library(data.table)
setDT(df)
melt(df, id.vars = c('id', 'count', 'age'),  
         measure = patterns('a\\d', 'b\\d', 'c\\d'), 
         # this needs to be regular expression to group `a1, a2, a3` etc together and 
         # the `\\d` is necessary because you have an age variable in the column.
         value.name = c('a', 'b', 'c'))[, variable := NULL][order(id, count, -age)]

#    id count age      a b c
# 1:  1     1  50  apple 2 3
# 2:  1     1  50 orange 3 2
# 3:  1     1  50   beer 2 1
# 4:  1     2  50 orange 3 2
# 5:  1     2  50  apple 2 2
# 6:  1     2  50   beer 2 1
# 7:  2     1  45   pear 3 2
# 8:  2     1  45  apple 2 2
# 9:  2     1  45 orange 2 2
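
The comment about `\\d` in the call above can be checked with base R alone. The following is a minimal sketch, assuming the column names from the question; the variable names `cols`, `loose` and `strict` are hypothetical and exist only for this illustration.

```r
# Column names as they appear in the question's data
cols <- c("id", "count", "a1", "b1", "c1", "a2", "b2", "c2",
          "a3", "b3", "c3", "age")

# A bare "a" also matches "age", which would break the grouping
loose <- grep("a", cols, value = TRUE)

# Requiring a digit after the letter keeps only a1, a2, a3
strict <- grep("a\\d", cols, value = TRUE)

print(loose)   # "a1" "a2" "a3" "age"
print(strict)  # "a1" "a2" "a3"
```

An anchored pattern such as `^a\\d` would be stricter still, although it is not needed for these particular column names.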

Answer 1: (score: 3)

We can use dplyr/tidyr:

library(dplyr)
library(tidyr)
gather(df, Var, Val, a1:c3) %>%
       extract(Var, into = c("Var1", "Var2"), "(.)(.)") %>%
       spread(Var1, Val) %>%
       select(-Var2)
#   id count age      a b c
#1  1     1  50  apple 2 3
#2  1     1  50 orange 3 2
#3  1     1  50   beer 2 1
#4  1     2  50 orange 3 2
#5  1     2  50  apple 2 2
#6  1     2  50   beer 2 1
#7  2     1  45   pear 3 2
#8  2     1  45  apple 2 2
#9  2     1  45 orange 2 2
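
In newer versions of tidyr (1.0.0 and later), `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The following is a sketch of what the same reshaping might look like with `pivot_longer()`, not part of the original answer. It assumes tidyr >= 1.0.0 is installed; the helper column name `set` and the result name `long` are hypothetical.

```r
library(tidyr)

# Data from the question; stringsAsFactors is set explicitly so the
# result does not depend on the R version's default.
df <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "id count a1     b1 c1   a2     b2 c2  a3    b3  c3  age
1  1     apple  2  3    orange 3  2   beer   2   1   50
1  2     orange 3  2    apple  2  2   beer   2   1   50
2  1     pear   3  2    apple  2  2   orange 2   2   45")

# ".value" tells pivot_longer() that the first captured group (a, b, c)
# names the output column; the second group (1, 2, 3) goes into `set`.
long <- pivot_longer(df,
                     cols = a1:c3,
                     names_to = c(".value", "set"),
                     names_pattern = "(.)(.)")

# `set` is a hypothetical helper column, dropped once reshaping is done
long$set <- NULL
print(long)
```

Because each of a, b and c becomes its own column, b and c keep their integer type instead of being coerced to character alongside the fruit names.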

Answer 2: (score: 2)

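To use the reshape function, you only need to adjust the varying argument. It can be a list, where each vector in the list holds the variables that should be stacked into the same column: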

reshape(df, 
        idvar=c("id", "count", "age"),
        varying = list(c(3,6,9), c(4,7,10), c(5,8,11)),
        timevar="time",
        v.names=c("a", "b", "c"), 
        direction = "long")

which returns

         id count age time      a b c
1.1.50.1  1     1  50    1  apple 2 3
1.2.50.1  1     2  50    1 orange 3 2
2.1.45.1  2     1  45    1   pear 3 2
1.1.50.2  1     1  50    2 orange 3 2
1.2.50.2  1     2  50    2  apple 2 2
2.1.45.2  2     1  45    2  apple 2 2
1.1.50.3  1     1  50    3   beer 2 1
1.2.50.3  1     2  50    3   beer 2 1
2.1.45.3  2     1  45    3 orange 2 2

I also specified idvar, because I think it is generally good practice for others reading the code, or for rereading your own old code.
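
The rows returned above are grouped by time rather than by id and count, and they still carry the time column and composite row names. The sketch below is one possible way to get from there to the layout requested in the question; it is not part of the original answer. It uses column names instead of positions in `varying`, and the result name `long` is hypothetical.

```r
# Data from the question; stringsAsFactors is set explicitly so the
# result does not depend on the R version's default.
df <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "id count a1     b1 c1   a2     b2 c2  a3    b3  c3  age
1  1     apple  2  3    orange 3  2   beer   2   1   50
1  2     orange 3  2    apple  2  2   beer   2   1   50
2  1     pear   3  2    apple  2  2   orange 2   2   45")

long <- reshape(df,
                idvar = c("id", "count", "age"),
                varying = list(c("a1", "a2", "a3"),
                               c("b1", "b2", "b3"),
                               c("c1", "c2", "c3")),
                timevar = "time",
                v.names = c("a", "b", "c"),
                direction = "long")

# Sort so that the three alternatives of each choice situation are adjacent
long <- long[order(long$id, long$count, long$time), ]

# Drop the helper column and the composite row names
long$time <- NULL
rownames(long) <- NULL
print(long)
```

Keeping `time` until after the sort matters here, since it is the only column that records the order of the three attribute sets within a choice situation.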

Data

df <- read.table(header=TRUE, text="id count a1     b1 c1   a2     b2 c2  a3    b3  c3  age
1  1     apple  2  3    orange 3  2   beer   2   1   50
1  2     orange 3  2    apple  2  2   beer   2   1   50
2  1     pear   3  2    apple  2  2   orange 2   2   45")