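I know wide-to-long reshaping has been asked about far too many times here, but I can't figure out how to turn the following into long format. Shoot, I even asked one of the wide-to-long repeated-measures questions on SO myself. I'm getting frustrated with my inability to convert my data. How can I turn this (variable order doesn't matter):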
id trt work.T1 play.T1 talk.T1 total.T1 work.T2 play.T2 talk.T2 total.T2
1 x1.1 cnt 0.34434350 0.7841665 0.1079332 0.88803151 0.64836951 0.87954320 0.7233519 0.5630988
2 x1.2 tr 0.06132255 0.8426960 0.3338658 0.04685878 0.23478670 0.19711687 0.5164015 0.7617968
3 x1.3 tr 0.36897981 0.1834721 0.3241316 0.76904051 0.07629721 0.06945971 0.4118995 0.7452974
4 x1.4 tr 0.40759356 0.5285396 0.5654258 0.23022542 0.92309504 0.15733957 0.4132653 0.7078273
5 x1.5 cnt 0.91433676 0.7029476 0.2031782 0.31518412 0.14721669 0.33345678 0.7620444 0.9868082
6 x1.6 tr 0.88870525 0.9132728 0.2197045 0.28266959 0.82239037 0.18006177 0.2591765 0.4516309
7 x1.7 cnt 0.98373218 0.2591739 0.6331153 0.71319565 0.41351839 0.14648269 0.7631898 0.1182174
8 x1.8 tr 0.47719528 0.7926248 0.3525205 0.86213792 0.61252061 0.29057544 0.9824048 0.2386353
9 x1.9 tr 0.69350823 0.6144696 0.8568732 0.10632352 0.06812050 0.93606889 0.6701190 0.4705228
10 x1.10 cnt 0.42574646 0.7006205 0.9507216 0.55032776 0.90413220 0.10246047 0.5899279 0.3523231
into this:
id trt time work play talk total
1 x1.1 cnt 1 0.34434350 0.78416653 0.1079332 0.88803151
2 x1.2 tr 1 0.06132255 0.84269599 0.3338658 0.04685878
3 x1.3 tr 1 0.36897981 0.18347215 0.3241316 0.76904051
4 x1.4 tr 1 0.40759356 0.52853960 0.5654258 0.23022542
5 x1.5 cnt 1 0.91433676 0.70294755 0.2031782 0.31518412
6 x1.6 tr 1 0.88870525 0.91327276 0.2197045 0.28266959
7 x1.7 cnt 1 0.98373218 0.25917392 0.6331153 0.71319565
8 x1.8 tr 1 0.47719528 0.79262477 0.3525205 0.86213792
9 x1.9 tr 1 0.69350823 0.61446955 0.8568732 0.10632352
10 x1.10 cnt 1 0.42574646 0.70062053 0.9507216 0.55032776
11 x1.1 cnt 2 0.64836951 0.87954320 0.7233519 0.56309884
12 x1.2 tr 2 0.23478670 0.19711687 0.5164015 0.76179680
13 x1.3 tr 2 0.07629722 0.06945971 0.4118995 0.74529740
14 x1.4 tr 2 0.92309504 0.15733957 0.4132653 0.70782726
15 x1.5 cnt 2 0.14721669 0.33345678 0.7620444 0.98680824
16 x1.6 tr 2 0.82239038 0.18006177 0.2591765 0.45163091
17 x1.7 cnt 2 0.41351839 0.14648269 0.7631898 0.11821741
18 x1.8 tr 2 0.61252061 0.29057544 0.9824048 0.23863532
19 x1.9 tr 2 0.06812050 0.93606889 0.6701190 0.47052276
20 x1.10 cnt 2 0.90413220 0.10246047 0.5899279 0.35232307
The data set:
id <- paste('x', "1.", 1:10, sep="")
set.seed(10)
DF <- data.frame(id, trt=sample(c('cnt', 'tr'), 10, TRUE), work.T1=runif(10),
                 play.T1=runif(10), talk.T1=runif(10), total.T1=runif(10),
                 work.T2=runif(10), play.T2=runif(10), talk.T2=runif(10),
                 total.T2=runif(10))
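Thanks in advance!
EDIT: Something funky happened when I used set.seed (I certainly did something wrong). The actual data shown above is not the data you get when you use set.seed(10). I'm leaving the error in for historical accuracy; it really doesn't affect the solutions people gave.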
Answer 0 (score: 9)
This gets pretty close, and changing the column names should be within your skill set:
reshape(DF,
        varying = c(work = c(3, 7), play = c(4, 8), talk = c(5, 9), total = c(6, 10)),
        direction = "long")
EDIT: adding a version that is almost the complete solution:
reshape(DF,
        varying = list(work = c(3, 7), play = c(4, 8), talk = c(5, 9), total = c(6, 10)),
        v.names = c("Work", "Play", "Talk", "Total"),  # needed once 'varying' became a list, to allow 'times'
        direction = "long",
        times = 1:2,        # substitutes numbers for T1 and T2
        timevar = "times")  # names the time column
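For completeness, here is a minimal sketch of how the remaining renaming could be folded into the same call. It assumes the DF from the question; the lowercase v.names, timevar = "time", idvar = "id" and the row-name reset are my additions to match the requested layout, not part of the answer above.

```r
# Rebuild the example data (the numbers will differ from the printed table,
# as the question's edit already notes).
set.seed(10)
DF <- data.frame(id = paste("x", "1.", 1:10, sep = ""),
                 trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

long <- reshape(DF,
                varying = list(work = c(3, 7), play = c(4, 8),
                               talk = c(5, 9), total = c(6, 10)),
                v.names = c("work", "play", "talk", "total"),  # lowercase, as requested
                timevar = "time",   # name of the new time column
                times = 1:2,        # numeric codes instead of T1/T2
                idvar = "id",       # reuse the existing id column
                direction = "long")

# reshape() builds row names like "x1.1.1"; drop them to get plain 1..20.
rownames(long) <- NULL
names(long)
```

The columns should come out in the order id, trt, time, work, play, talk, total, with the ten time-1 rows first.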
Answer 1 (score: 7)
The most concise way is to use tidyr together with dplyr.
library(tidyr)
library(dplyr)
result <- DF %>%
  # stack all measurement columns into 'long' format
  gather(loc, value, work.T1:total.T2) %>%
  # split the column name into the activity (loc) and the time point
  separate(loc, into = c('loc', 'time'), '\\.') %>%
  # spread the activities back out into one column each
  spread(loc, value) %>%
  # turn "T1"/"T2" into 1/2
  mutate(time = as.numeric(substr(time, 2, 2))) %>%
  arrange(time)
tidyr is designed specifically for making data tidy.
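In newer tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a rough sketch, and assuming tidyr 1.1.0 or later is installed (names_transform needs it), the same reshape can be written as a single step. This is a different technique from the pipeline above, not a correction of it.

```r
library(tidyr)
library(dplyr)

# Rebuild the example data from the question.
set.seed(10)
DF <- data.frame(id = paste("x", "1.", 1:10, sep = ""),
                 trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

result <- DF %>%
  pivot_longer(
    cols = work.T1:total.T2,
    # ".value" keeps the part before the separator as the output column name;
    # the part after it goes into a new "time" column.
    names_to = c(".value", "time"),
    names_sep = "\\.T",
    names_transform = list(time = as.integer)
  ) %>%
  arrange(time)
```

The ".value" sentinel is what removes the need for the gather/separate/spread round trip: the activity part of each name becomes a column directly.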
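Answer 2 (score: 3)
Oddly enough, I don't seem to get the same numbers as you (which I should, since we're both using set.seed(10)?), but otherwise this seems to do the trick: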
library(reshape) #this might work with reshape2 as well, I haven't tried ...
DF2 <- melt(DF, id.vars=1:2)
## split 'activity.time' label into two separate variables
DF3 <- cbind(DF2,
             colsplit(as.character(DF2$variable), "\\.",
                      names=c("activity", "time")))
## rename time, reorder factors:
DF4 <- transform(DF3,
                 time=as.numeric(gsub("^T", "", time)),
                 activity=factor(activity,
                                 levels=c("work", "play", "talk", "total")),
                 id=factor(id, levels=paste("x1", 1:10, sep=".")))
## reshape back to wide
DF5 <- cast(subset(DF4, select=-variable), id+trt+time~activity)
## reorder
DF6 <- with(DF5, DF5[order(time, id), ])
It's more complicated than @DWin's answer, but maybe (?) more general.
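On the "might work with reshape2" remark: here is an untested-by-the-answerer sketch of the same melt / split / cast idea using reshape2, assuming that package is installed. Splitting on "\\.T" instead of "\\." is my shortcut so that colsplit() returns the time as a number directly.

```r
library(reshape2)

# Rebuild the example data from the question.
set.seed(10)
DF <- data.frame(id = paste("x", "1.", 1:10, sep = ""),
                 trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Fully long: one row per id x activity x time.
molten <- melt(DF, id.vars = c("id", "trt"))

# Split "work.T1" into activity = "work" and time = 1.
parts <- colsplit(as.character(molten$variable), "\\.T",
                  names = c("activity", "time"))
molten <- cbind(molten[c("id", "trt", "value")], parts)

# Fix the column order before casting (dcast sorts character values alphabetically).
molten$activity <- factor(molten$activity,
                          levels = c("work", "play", "talk", "total"))

# Back to one column per activity, one row per id x time.
wide <- dcast(molten, id + trt + time ~ activity, value.var = "value")
wide <- wide[order(wide$time), ]
```

The main differences from the reshape version are dcast() in place of cast() and the explicit value.var argument.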
Answer 3 (score: 3)
If you really don't want the "T" in the "time" variable of the output, couldn't you simply do the following?
names(DF) = sub("T", "", names(DF))
reshape(DF, direction="long", varying=3:10)
Or, without changing names(DF), you can just set the sep= argument so that it includes the "T":
reshape(DF, direction="long", varying=3:10, sep=".T")
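However, I'm a bit confused. As Ben Bolker pointed out in his comment, your "data set" code doesn't produce the same numbers as the ones you show. Also, DWin's output and mine match perfectly, but neither matches the "into this" output in your original question.
I checked this by creating one data frame named "DWin" with his results and one named "mine" with my results, and then comparing them with DWin == mine.
Can you verify that the output we're getting is actually what you need?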
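A minimal sketch of that kind of check, assuming the DF from the question. For simplicity it compares the two reshape() calls from this answer with each other; the object names a and b are just placeholders.

```r
# Rebuild the example data from the question.
set.seed(10)
DF <- data.frame(id = paste("x", "1.", 1:10, sep = ""),
                 trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Variant 1: keep the names, let sep swallow the "T".
a <- reshape(DF, direction = "long", varying = 3:10, sep = ".T")

# Variant 2: strip the "T" from a copy of the names first.
DF2 <- DF
names(DF2) <- sub("T", "", names(DF2))
b <- reshape(DF2, direction = "long", varying = 3:10)

# Element-wise comparison, collapsed to a single logical.
all(a == b)

# all.equal() is more forgiving about metadata; reshape() stores the
# original column names in an attribute, which differs between a and b.
all.equal(a, b, check.attributes = FALSE)
```

Using all() around the == comparison saves scanning a 20 x 7 matrix of TRUEs by eye.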
Answer 4 (score: 0)
Another way to approach the problem, which needs very little code but may be slower:
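DF.1 <- DF[, 1:2]    # id and trt
DF.2 <- DF[, 3:6]    # the four T1 measurements
DF.3 <- DF[, 7:10]   # the four T2 measurements
# keep only the part of each name before the "." (every other element of the split)
names(DF.2) <- names(DF.3) <- unlist(strsplit(names(DF.2), ".", fixed=TRUE))[c(TRUE, FALSE)]
time <- rep(1:2, each=nrow(DF.1))
# stack the T1 block on top of the T2 block, repeating id/trt for each
data.frame(rbind(DF.1, DF.1), time, rbind(DF.2, DF.3))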
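The name-stripping line is the least obvious part, so here is a small sketch of what it does in isolation. The variable names nms, parts, base_names and base_names2 are only for illustration, and the sub() call is an alternative I'm suggesting, not part of the answer.

```r
nms <- c("work.T1", "play.T1", "talk.T1", "total.T1")

# Splitting on a literal "." gives alternating pieces:
# name, suffix, name, suffix, ...
parts <- unlist(strsplit(nms, ".", fixed = TRUE))

# A logical index shorter than the vector is recycled, so c(TRUE, FALSE)
# keeps positions 1, 3, 5, ... which are the names without the suffix.
base_names <- parts[c(TRUE, FALSE)]

# The same result with a regular expression that drops the ".T<digits>" suffix.
base_names2 <- sub("\\.T[0-9]+$", "", nms)

identical(base_names, base_names2)
```

The recycling trick only works because every name contains exactly one "."; the sub() version does not depend on that.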