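I'm very new to R, so please bear with me (and I'll try to be descriptive to respect your time).
The data I've been trying to format correctly for a while now describes 8 measurements taken every hour for a little over a year. Because of how I had to retrieve the data, my spreadsheet currently lists it in tabular form, with the 8 variable names as repeating rows and each hour of the day as a separate column, like this: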
var1[0] var1[1] var1[2] var1[3] var1[4] var1[5] var1[6] var1[7] var1[8] var1[9] var1[10] var1[11] var1[12] var1[13] var1[14] var1[15] var1[16] var1[17] var1[18] var1[19] var1[20] var1[21] var1[22] var1[23]
var2[0] var2[1] var2[2] var2[3] var2[4] var2[5] var2[6] var2[7] var2[8] var2[9] var2[10] var2[11] var2[12] var2[13] var2[14] var2[15] var2[16] var2[17] var2[18] var2[19] var2[20] var2[21] var2[22] var2[23]
var3[0] var3[1] var3[2] var3[3] var3[4] var3[5] var3[6] var3[7] var3[8] var3[9] var3[10] var3[11] var3[12] var3[13] var3[14] var3[15] var3[16] var3[17] var3[18] var3[19] var3[20] var3[21] var3[22] var3[23]
var4[0] var4[1] var4[2] var4[3] var4[4] var4[5] var4[6] var4[7] var4[8] var4[9] var4[10] var4[11] var4[12] var4[13] var4[14] var4[15] var4[16] var4[17] var4[18] var4[19] var4[20] var4[21] var4[22] var4[23]
var5[0] var5[1] var5[2] var5[3] var5[4] var5[5] var5[6] var5[7] var5[8] var5[9] var5[10] var5[11] var5[12] var5[13] var5[14] var5[15] var5[16] var5[17] var5[18] var5[19] var5[20] var5[21] var5[22] var5[23]
var6[0] var6[1] var6[2] var6[3] var6[4] var6[5] var6[6] var6[7] var6[8] var6[9] var6[10] var6[11] var6[12] var6[13] var6[14] var6[15] var6[16] var6[17] var6[18] var6[19] var6[20] var6[21] var6[22] var6[23]
var7[0] var7[1] var7[2] var7[3] var7[4] var7[5] var7[6] var7[7] var7[8] var7[9] var7[10] var7[11] var7[12] var7[13] var7[14] var7[15] var7[16] var7[17] var7[18] var7[19] var7[20] var7[21] var7[22] var7[23]
var8[0] var8[1] var8[2] var8[3] var8[4] var8[5] var8[6] var8[7] var8[8] var8[9] var8[10] var8[11] var8[12] var8[13] var8[14] var8[15] var8[16] var8[17] var8[18] var8[19] var8[20] var8[21] var8[22] var8[23]
var1[24] var1[25] var1[26] var1[27] var1[28] var1[29] var1[30] var1[31] var1[32] var1[33] var1[34] var1[35] var1[36] var1[37] var1[38] var1[39] var1[40] var1[41] var1[42] var1[43] var1[44] var1[45] var1[46] var1[47]
var2[24] var2[25] var2[26] var2[27] var2[28] var2[29] var2[30] var2[31] var2[32] var2[33] var2[34] var2[35] var2[36] var2[37] var2[38] var2[39] var2[40] var2[41] var2[42] var2[43] var2[44] var2[45] var2[46] var2[47]
var3[24] var3[25] var3[26] var3[27] var3[28] var3[29] var3[30] var3[31] var3[32] var3[33] var3[34] var3[35] var3[36] var3[37] var3[38] var3[39] var3[40] var3[41] var3[42] var3[43] var3[44] var3[45] var3[46] var3[47]
var4[24] var4[25] var4[26] var4[27] var4[28] var4[29] var4[30] var4[31] var4[32] var4[33] var4[34] var4[35] var4[36] var4[37] var4[38] var4[39] var4[40] var4[41] var4[42] var4[43] var4[44] var4[45] var4[46] var4[47]
var5[24] var5[25] var5[26] var5[27] var5[28] var5[29] var5[30] var5[31] var5[32] var5[33] var5[34] var5[35] var5[36] var5[37] var5[38] var5[39] var5[40] var5[41] var5[42] var5[43] var5[44] var5[45] var5[46] var5[47]
var6[24] var6[25] var6[26] var6[27] var6[28] var6[29] var6[30] var6[31] var6[32] var6[33] var6[34] var6[35] var6[36] var6[37] var6[38] var6[39] var6[40] var6[41] var6[42] var6[43] var6[44] var6[45] var6[46] var6[47]
var7[24] var7[25] var7[26] var7[27] var7[28] var7[29] var7[30] var7[31] var7[32] var7[33] var7[34] var7[35] var7[36] var7[37] var7[38] var7[39] var7[40] var7[41] var7[42] var7[43] var7[44] var7[45] var7[46] var7[47]
var8[24] var8[25] var8[26] var8[27] var8[28] var8[29] var8[30] var8[31] var8[32] var8[33] var8[34] var8[35] var8[36] var8[37] var8[38] var8[39] var8[40] var8[41] var8[42] var8[43] var8[44] var8[45] var8[46] var8[47]
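Originally there was a lot more to this data, but I've stripped it down to get to the heart of the problem I've been having. (In the example above, what I'm trying to convey is that the variables (var1, var2, var3, etc.) were recorded at each hour; the number in brackets is the hour index.)
My goal is to reformat it so that it looks like: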
var1[0] var2[0] var3[0] var4[0] var5[0] var6[0] var7[0] var8[0]
var1[1] var2[1] var3[1] var4[1] var5[1] var6[1] var7[1] var8[1]
var1[2] var2[2] var3[2] var4[2] var5[2] var6[2] var7[2] var8[2]
var1[3] var2[3] var3[3] var4[3] var5[3] var6[3] var7[3] var8[3]
. . . . . . . .
. . . . . . . .
. . . . . . . .
[all the way to 9216, which is the number of hours in 384 days]
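So far I've tried doing this in Excel, but couldn't find a way to accomplish it. I've also considered writing a C++ script as I have in the past, but I feel there might be an easier way. My most recent efforts have turned to R, since I've been trying to learn it and I've heard it's well suited to this kind of data manipulation. Using R, I tried following an example I found that lets you recast data into a matrix of a different length (found here), but this produced strangely incorrect data. (I'm sure I was probably misusing the method.) I also looked into the solution discussed here, but I couldn't modify the code to fit my situation. Perhaps I'm overlooking something simple?
Does anyone have any suggestions? As I said, at this point I'm trying to do this in R, but I'm open to suggestions in Excel, C or Python. (I'm definitely open to suggestions in other languages too, but those might need a more thorough explanation :))
Thanks!
[Edit:]
The data sample above is only descriptive. Below is what the actual first 25 rows of the data look like; the only change I've made is replacing the variable names for confidentiality reasons: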
Metric,Year,Month,Day,DOW,12am,1am,2am,3am,4am,5am,6am,7am,8am,9am,10am,11am,12pm,1pm,2pm,3pm,4pm,5pm,6pm,7pm,8pm,9pm,10pm,11pm
varA,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,22,10,18,24,26,11,21,24,10,0,0,0,0,0
varB,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,10,13,18,28,26,25,25,21,23,13,0,0,0,0,0
varC,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,0,1,7,9,5,1,4,4,1,7,1,0,0,0,0
varD,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,23,17,27,29,27,15,25,25,17,1,0,0,0,0
varE,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,44,32,33,65,37,42,62,75,71,50,0,0,0,0,0
varF,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,89,82,83,94,37,77,100,100,90,60,0,0,0,0,0
varG,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,100,100,100,100,95,100,100,100,100,100,0,0,0,0,0
varH,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,10,92,12,101,34,14,64,29,86,0,0,0,0,0
varA,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,5,12,23,20,22,24,9,19,15,12,13,9,0,0,0
varB,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,6,14,21,27,26,23,19,22,16,16,16,12,0,0,0
varC,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,2,5,4,10,6,10,2,7,7,4,5,5,0,0,0
varD,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,7,18,27,30,28,34,12,26,22,16,18,14,0,0,0
varE,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,0,50,20,15,67,33,71,47,36,64,58,67,0,0,0
varF,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,60,70,45,70,90,67,100,100,79,91,92,89,0,0,0
varG,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,100,100,100,100,100,94,100,100,100,91,100,100,0,0,0
varH,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,20,12,31,20,29,16,12,12,16,16,34,41,0,0,0
varA,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,9,14,18,25,16,20,22,11,23,13,9,4,0,0,0
varB,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,20,23,17,28,14,18,30,17,27,17,17,6,0,0,0
varC,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,4,8,2,3,2,6,7,2,4,1,2,1,0,0,0
varD,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,13,22,20,29,18,26,29,13,27,14,11,5,0,0,0
varE,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,83,90,43,30,29,17,32,60,71,54,89,100,0,0,0
varF,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,100,100,86,65,43,56,74,90,90,73,100,100,0,0,0
varG,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,100,100,100,100,100,100,100,100,100,100,100,100,0,0,0
varH,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,14,23,17,30,16,14,12,8,9,13,14,6,0,0,0
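As you can see, in the full data set there are five additional columns at the beginning, holding the variable name and the date information.
Answer 0: (score: 1)
Assuming your data is in a matrix M (8 rows per day, 24 hourly columns), this should work: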
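output <- NULL
# M has 8 rows per day, so the number of days is nrow(M)/8
# (384 for the full data set, i.e. 9216 hours / 24)
last.count <- nrow(M)/8 - 1
for (i in 0:last.count) {
  # take the 8 variable rows of day i and transpose them to 24 hours x 8 variables
  output <- rbind(output, t(M[8*i + 1:8,]))
}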
P.S.: rbind can be slow (depending on the size of the data); in that case you can preallocate the output matrix.
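A minimal sketch of the preallocation idea, assuming M holds 8 variable rows per day and 24 hourly columns. The toy M below (3 days of made-up values) and the names n.vars, n.days, rows.in and rows.out are illustrative assumptions, not part of the original data:

```r
# Toy stand-in for M (an assumption): 3 days x 8 variables = 24 rows, 24 hourly columns
n.vars <- 8
n.days <- 3
M <- matrix(seq_len(n.vars * n.days * 24), nrow = n.vars * n.days, ncol = 24, byrow = TRUE)

# Preallocate the result: one row per hour, one column per variable
output <- matrix(NA_real_, nrow = n.days * ncol(M), ncol = n.vars)

for (i in 0:(n.days - 1)) {
  rows.in  <- n.vars * i + 1:n.vars            # the 8 variable rows of day i
  rows.out <- ncol(M) * i + seq_len(ncol(M))   # the 24 hour rows of day i
  output[rows.out, ] <- t(M[rows.in, ])        # fill in place instead of growing with rbind
}

dim(output)   # hours x variables
```

Filling a matrix of known size avoids copying the whole result on every iteration, which is what repeated rbind calls end up doing.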
Answer 1: (score: 1)
You can do this easily with Hadley's reshape2 package! First let's make some data, since you haven't given us any. I'm using this post as a guide from here on.
foo <- matrix(rnorm(8*9216), nrow=8)         # matrix with 8 rows (8 variables)
                                             # and 9216 (= 384 x 24) columns
rownames(foo) <- paste0("V", 1:nrow(foo))    # giving rownames,
                                             # you can use "var" here if you want
foo <- data.frame(foo)                       # making it a data.frame
names(foo)[1:9216] <- paste0("t", 0:(ncol(foo)-1))  # time points,
                                             # starting at 0: t0, t1, ... t9215
foo <- data.frame(id=rownames(foo), foo)     # making sure id column is first
#load the reshape2 library
library(reshape2)
foo.wide <- recast(foo,id ~ variable) #we use the variable id as the id column,
#play with melt and cast to understand what's going on here
#do ?melt, ? cast and look at the examples
#foo.wide is a list with data and labels.
#code below to transform the list in foo.wide to a data.frame
foo.wide.df <-foo.wide$data
names(foo.wide.df)<-unlist(foo.wide$labels[[2]])
row.names(foo.wide.df)<-unlist(foo.wide$labels[[1]])
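Since recast is a shortcut for a melt followed by a cast, it may help to see the two steps spelled out. The sketch below is an illustration rather than the exact code above: it assumes a much smaller table (6 time points instead of 9216), and the names n.hours, molten and foo.long are made up for the example. Which side of the formula each variable goes on determines the orientation; here the time points become rows and the 8 variables become columns:

```r
library(reshape2)

set.seed(1)
n.hours <- 6   # assumption: a handful of time points to keep the example small
foo <- data.frame(id = paste0("V", 1:8), matrix(rnorm(8 * n.hours), nrow = 8))
names(foo)[-1] <- paste0("t", 0:(n.hours - 1))   # t0, t1, ... t5

# Step 1: melt to long format, one row per (id, time point) pair
molten <- melt(foo, id.vars = "id")              # columns: id, variable, value

# Step 2: cast so that time points are rows and the ids are columns
foo.long <- dcast(molten, variable ~ id, value.var = "value")

head(foo.long)
```

The first column of foo.long holds the time labels (t0, t1, ...), followed by one column per variable.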
Hope this helps.
Update: just saw that you posted sample data.
You can treat the 5 extra columns as id columns by calling recast like this:
foo.wide.df <-recast(foo, id ~ variable, id.var=1:5)
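As a rough illustration on the posted sample, here is a sketch using melt and dcast directly. It assumes the file has the column names shown in the question; only two metrics, two days and three of the hour columns are copied here to keep it short, and the names csv, dat, molten, tidy and Hour are made up for the example:

```r
library(reshape2)

# A small slice of the sample data from the question (varA and varB, 9am to 11am only)
csv <- "Metric,Year,Month,Day,DOW,9am,10am,11am
varA,2013,1,20,Sun,9,22,10
varB,2013,1,20,Sun,10,13,18
varA,2013,1,21,Mon,5,12,23
varB,2013,1,21,Mon,6,14,21"

# check.names = FALSE keeps the hour headers as "9am" instead of "X9am"
dat <- read.csv(text = csv, check.names = FALSE)

# Melt: the five leading columns identify a row, the hour columns are measurements
molten <- melt(dat, id.vars = c("Metric", "Year", "Month", "Day", "DOW"),
               variable.name = "Hour")

# Cast: one row per date and hour, one column per metric
tidy <- dcast(molten, Year + Month + Day + DOW + Hour ~ Metric, value.var = "value")

tidy
```

Keeping the date columns on the left-hand side of the formula means each output row stays labelled with its day and hour, which makes it easier to spot-check the result against the original spreadsheet.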