Wide to long with multiple measures at each time

Date: 2012-03-13 13:14:45

Tags: r

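I know wide to long has been asked many times on here, but I can't figure out how to turn the following into long format. Shoot, I even asked one of the wide-to-long-with-repeated-measures questions on SO myself. I'm getting frustrated with my inability to convert data. How can I turn this (the order of the variables doesn't matter):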

      id trt    work.T1   play.T1   talk.T1   total.T1    work.T2    play.T2   talk.T2  total.T2
1   x1.1 cnt 0.34434350 0.7841665 0.1079332 0.88803151 0.64836951 0.87954320 0.7233519 0.5630988
2   x1.2  tr 0.06132255 0.8426960 0.3338658 0.04685878 0.23478670 0.19711687 0.5164015 0.7617968
3   x1.3  tr 0.36897981 0.1834721 0.3241316 0.76904051 0.07629721 0.06945971 0.4118995 0.7452974
4   x1.4  tr 0.40759356 0.5285396 0.5654258 0.23022542 0.92309504 0.15733957 0.4132653 0.7078273
5   x1.5 cnt 0.91433676 0.7029476 0.2031782 0.31518412 0.14721669 0.33345678 0.7620444 0.9868082
6   x1.6  tr 0.88870525 0.9132728 0.2197045 0.28266959 0.82239037 0.18006177 0.2591765 0.4516309
7   x1.7 cnt 0.98373218 0.2591739 0.6331153 0.71319565 0.41351839 0.14648269 0.7631898 0.1182174
8   x1.8  tr 0.47719528 0.7926248 0.3525205 0.86213792 0.61252061 0.29057544 0.9824048 0.2386353
9   x1.9  tr 0.69350823 0.6144696 0.8568732 0.10632352 0.06812050 0.93606889 0.6701190 0.4705228
10 x1.10 cnt 0.42574646 0.7006205 0.9507216 0.55032776 0.90413220 0.10246047 0.5899279 0.3523231

into this:

      id trt time       work       play      talk      total
1   x1.1 cnt    1 0.34434350 0.78416653 0.1079332 0.88803151
2   x1.2  tr    1 0.06132255 0.84269599 0.3338658 0.04685878
3   x1.3  tr    1 0.36897981 0.18347215 0.3241316 0.76904051
4   x1.4  tr    1 0.40759356 0.52853960 0.5654258 0.23022542
5   x1.5 cnt    1 0.91433676 0.70294755 0.2031782 0.31518412
6   x1.6  tr    1 0.88870525 0.91327276 0.2197045 0.28266959
7   x1.7 cnt    1 0.98373218 0.25917392 0.6331153 0.71319565
8   x1.8  tr    1 0.47719528 0.79262477 0.3525205 0.86213792
9   x1.9  tr    1 0.69350823 0.61446955 0.8568732 0.10632352
10 x1.10 cnt    1 0.42574646 0.70062053 0.9507216 0.55032776
11  x1.1 cnt    2 0.64836951 0.87954320 0.7233519 0.56309884
12  x1.2  tr    2 0.23478670 0.19711687 0.5164015 0.76179680
13  x1.3  tr    2 0.07629722 0.06945971 0.4118995 0.74529740
14  x1.4  tr    2 0.92309504 0.15733957 0.4132653 0.70782726
15  x1.5 cnt    2 0.14721669 0.33345678 0.7620444 0.98680824
16  x1.6  tr    2 0.82239038 0.18006177 0.2591765 0.45163091
17  x1.7 cnt    2 0.41351839 0.14648269 0.7631898 0.11821741
18  x1.8  tr    2 0.61252061 0.29057544 0.9824048 0.23863532
19  x1.9  tr    2 0.06812050 0.93606889 0.6701190 0.47052276
20 x1.10 cnt    2 0.90413220 0.10246047 0.5899279 0.35232307

Data set:

id <- paste('x', "1.", 1:10, sep="")
set.seed(10)
DF <- data.frame(id, trt=sample(c('cnt', 'tr'), 10, replace=TRUE), work.T1=runif(10),
    play.T1=runif(10), talk.T1=runif(10), total.T1=runif(10),
    work.T2=runif(10), play.T2=runif(10), talk.T2=runif(10), 
    total.T2=runif(10))

Thanks in advance!

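EDIT: Something tricky happened when I used set.seed (surely something I did wrong). The actual data above is not what you get with set.seed(10). I'm leaving the mistake in for historical accuracy; it doesn't affect the solutions people gave.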

5 Answers:

Answer 0 (score: 9)

This gets very close; changing the column names should be within your skill set:

reshape(DF, 
       varying=c(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
       direction="long")

EDIT: Adding a version that is almost a complete solution:

reshape(DF, varying=list(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
        v.names=c("Work", "Play", "Talk", "Total"), 
          # that was needed after changed 'varying' arg to a list to allow 'times' 
        direction="long",  
        times=1:2,        # substitutes number for T1 and T2
        timevar="times")  # to name the time col
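A minimal sketch of that last renaming step, assuming base R's stats::reshape and the DF built by the question's data-set code: passing lower-case names to v.names and timevar = "time" gives the requested column names directly, and clearing the row names drops the "x1.1.1"-style labels that reshape() generates. The object name long is only illustrative.

```r
set.seed(10)
DF <- data.frame(id = paste0("x1.", 1:10),
                 trt = sample(c("cnt", "tr"), 10, replace = TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

long <- reshape(DF,
                # each list element holds the T1 and T2 columns of one measure
                varying   = list(c(3, 7), c(4, 8), c(5, 9), c(6, 10)),
                v.names   = c("work", "play", "talk", "total"),
                direction = "long",
                times     = 1:2,     # numeric time instead of "T1"/"T2"
                timevar   = "time")  # name of the new time column
rownames(long) <- NULL               # replace "x1.1.1"-style row names with 1..20
```

reshape() stacks all time-1 rows before the time-2 rows, so the result is already in the order shown in the question.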

Answer 1 (score: 7)

The neatest way is to use tidyr together with the dplyr library.

library(tidyr)
library(dplyr)
result <- DF %>%
  # transfer to 'long' format
  gather(loc, value, work.T1:total.T2) %>%
  # separate the column into location and time
  separate(loc, into = c('loc', 'time'), '\\.') %>%
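  # transfer back to 'wide' format: one column per measure
  spread(loc, value) %>%
  # turn "T1"/"T2" into 1/2 (sub() also copes with multi-digit times such as "T10")
  mutate(time = as.numeric(sub("^T", "", time))) %>%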
  arrange(time)

tidyr is designed specifically for tidying data.
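In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with pivot_longer(), assuming a tidyr version that provides it: the special ".value" entry in names_to keeps the part of each column name before ".T" (work, play, talk, total) as an output column, and the part after it becomes time. The base R sorting at the end is a choice made here, not part of the original answer.

```r
library(tidyr)

set.seed(10)
DF <- data.frame(id = paste0("x1.", 1:10),
                 trt = sample(c("cnt", "tr"), 10, replace = TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# ".value": the piece before ".T" names the output column; the piece after goes to "time"
long <- pivot_longer(DF, cols = -c(id, trt),
                     names_to = c(".value", "time"),
                     names_sep = "\\.T")
long <- as.data.frame(long)
long$time <- as.integer(long$time)
# pivot_longer emits rows id by id; order() is stable, so this puts all
# time-1 rows first while keeping the original id order within each time
long <- long[order(long$time), ]
rownames(long) <- NULL
```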

Answer 2 (score: 3)

Oddly, I don't seem to get the same numbers as you (we're both using set.seed(10), so shouldn't I?), but otherwise this seems to do the trick:

library(reshape)  #this might work with reshape2 as well, I haven't tried ...
DF2 <- melt(DF,id.vars=1:2)
## split 'activity.time' label into two separate variables
DF3 <- cbind(DF2,
             colsplit(as.character(DF2$variable),"\\.",
                      names=c("activity","time")))
## rename time, reorder factors:
DF4 <- transform(DF3,
                 time=as.numeric(gsub("^T","",time)),
                 activity=factor(activity,
                   levels=c("work","play","talk","total")),
                 id=factor(id,levels=paste("x1",1:10,sep=".")))
## reshape back to wide
DF5 <- cast(subset(DF4,select=-variable),id+trt+time~activity)
## reorder
DF6 <- with(DF5,DF5[order(time,id),])

It's more complicated than @DWin's answer, but maybe (?) more general.

Answer 3 (score: 3)

If you really don't want the "T" in the "time" variable of the output, can't you simply do the following?

names(DF) = sub("T", "", names(DF))
reshape(DF, direction="long", varying=3:10)

Or, without changing names(DF), you can simply set the sep= argument to include the "T":

reshape(DF, direction="long", varying=3:10, sep=".T")
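
However, I'm a bit confused. As Ben Bolker pointed out in his comment, your "data set code" doesn't produce the numbers you show. Also, DWin's output and mine match perfectly, but neither matches the "into this" output in the original question.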

I checked this by creating a data frame called "DWin" holding his result and one called "mine" holding my result, and then comparing them with DWin == mine.
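A minimal sketch of that check in base R, rebuilding DF from the question's data-set code. The object names DWin and mine follow the answer; adding all.equal() next to == is a suggestion, not part of the original answer. Because DWin's result has capitalised measure names and a time column called "times", names and attributes are ignored in the all.equal() comparison (check.attributes = FALSE, use.names = FALSE).

```r
set.seed(10)
DF <- data.frame(id = paste0("x1.", 1:10),
                 trt = sample(c("cnt", "tr"), 10, replace = TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# DWin's second version: explicit column groups, capitalised v.names
DWin <- reshape(DF, direction = "long",
                varying = list(work = c(3, 7), play = c(4, 8),
                               talk = c(5, 9), total = c(6, 10)),
                v.names = c("Work", "Play", "Talk", "Total"),
                times = 1:2, timevar = "times")
# this answer: let reshape() guess the measures by splitting names at ".T"
mine <- reshape(DF, direction = "long", varying = 3:10, sep = ".T")

# == compares cell by cell and returns a logical matrix
same_cells <- all(DWin == mine)
# all.equal() gives a single TRUE or a description of the differences;
# columns are matched by position because the names differ
same_all <- isTRUE(all.equal(DWin, mine,
                             check.attributes = FALSE, use.names = FALSE))
```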

Can you verify that the output we get is actually what you need?

Answer 4 (score: 0)

Another way to solve the problem, which needs very little code but may be slower:

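DF.1 <- DF[, 1:2]    # id columns
DF.2 <- DF[, 3:6]    # T1 measures
DF.3 <- DF[, 7:10]   # T2 measures

# strip the ".T1"/".T2" suffixes so both blocks share the names work, play, talk, total
names(DF.2) <- names(DF.3) <- unlist(strsplit(names(DF.2), ".", fixed=TRUE))[c(TRUE, FALSE)]
time <- rep(1:2, each=nrow(DF.1))
# stack the two blocks and put the time index between the ids and the measures
data.frame(rbind(DF.1, DF.1), time, rbind(DF.2, DF.3))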
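The same idea extends to any number of time points. A minimal sketch, assuming every measure column is named <measure>.T<time> and the id columns come first; stack_times() is a hypothetical helper, not part of the original answer.

```r
set.seed(10)
DF <- data.frame(id = paste0("x1.", 1:10),
                 trt = sample(c("cnt", "tr"), 10, replace = TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Hypothetical helper: one block per time point, then rbind() the blocks
stack_times <- function(df, id_cols = 1:2, sep = ".T") {
  measure <- names(df)[-id_cols]
  # each row of parts is c(measure, time), e.g. c("work", "1")
  parts <- do.call(rbind, strsplit(measure, sep, fixed = TRUE))
  times <- unique(parts[, 2])
  blocks <- lapply(times, function(tm) {
    at <- parts[, 2] == tm
    block <- df[measure[at]]
    names(block) <- parts[at, 1]   # "work.T1" -> "work", etc.
    data.frame(df[id_cols], time = as.numeric(tm), block)
  })
  # rbind.data.frame matches columns by name, so the measure order may differ per time
  do.call(rbind, blocks)
}

long <- stack_times(DF)
```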