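I've just downloaded a large amount of temperature data from one of our data loggers. The data frame holds mean hourly temperature observations from 87 temperature sensors over 1691 hours (so there is a lot of data here). It looks like this: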
D1_A  D1_B  D1_C
13.43 14.39 12.33
12.62 13.53 11.56
11.67 12.56 10.36
10.83 11.62  9.47
I would like to reshape this data set into a matrix that looks like this:
#create a blank matrix 5 columns 131898 rows
matrix1<-matrix(nrow=131898, ncol=5)
colnames(matrix1)<- c("year", "ID", "Soil_Layer", "Hour", "Temperature")
where:
year is always "2012"
ID corresponds to the header ID (e.g. D1)
Soil_Layer corresponds to the second bit of the header (e.g. A, B, or C)
Hour= 1:1691 for each sensor
and Temperature= the observed values in the original dataframe.
Can this be done with the reshape package in R? Does it need to be done as a loop? Any input on how to handle this data set would be very helpful. Cheers!
Answer 0 (score: 2)
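I think this does what you want. You can take advantage of the melt() and colsplit() functions from the reshape2 package. It isn't clear where Hour is identified in your data, so I've assumed the rows of the original data set are already in hour order. If that's not the case, please update your question: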
library(reshape2)
#read in your data
x <- read.table(text = "
D1_A D1_B D1_C
13.43 14.39 12.33
12.62 13.53 11.56
11.67 12.56 10.36
10.83 11.62 9.47
9.98 10.77 9.04
9.24 10.06 8.65
8.89 9.55 8.78
9.01 9.39 9.88
", header = TRUE)
#add hour index, if data isn't ordered, replace this with whatever
#tells you which hour goes where
x$hour <- 1:nrow(x)
#Melt into long format
x.m <- melt(x, id.vars = "hour")
#Split into two columns
x.m[, c("ID", "Soil_Layer")] <- colsplit(x.m$variable, "_", c("ID", "Soil_Layer"))
#Add the year
x.m$year <- 2012
#Return the first 6 rows
head(x.m[, c("year", "ID", "Soil_Layer", "hour", "value")])
#----
  year ID Soil_Layer hour value
1 2012 D1          A    1 13.43
2 2012 D1          A    2 12.62
3 2012 D1          A    3 11.67
4 2012 D1          A    4 10.83
5 2012 D1          A    5  9.98
6 2012 D1          A    6  9.24
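On the loop question: no loop is needed, and the same reshaping can be sketched in base R without reshape2. This is only a minimal sketch on a three-row slice of the sample data. It assumes every column name has exactly one underscore (as in D1_A) and that rows are already in hour order; the names n_hours, long, parts and result are made up for illustration.

```r
# Toy data copied from the example above; the real data frame would have
# one column per sensor and 1691 rows.
x <- read.table(text = "
D1_A D1_B D1_C
13.43 14.39 12.33
12.62 13.53 11.56
11.67 12.56 10.36
", header = TRUE)

# Assumption: row order is hour order, so the row index is the hour.
n_hours <- nrow(x)

# stack() concatenates the columns one after another into 'values' and
# records the source column name in 'ind'.
long <- stack(x)

# Split names such as "D1_A" at the underscore into a two-column matrix.
# Assumption: every header contains exactly one underscore.
parts <- do.call(rbind, strsplit(as.character(long$ind), "_", fixed = TRUE))

result <- data.frame(
  year = 2012,
  ID = parts[, 1],
  Soil_Layer = parts[, 2],
  # Hours restart at 1 for each sensor column.
  Hour = rep(seq_len(n_hours), times = ncol(x)),
  Temperature = long$values,
  stringsAsFactors = FALSE
)

head(result)
```

The result is kept as a data frame rather than the matrix asked for, because a matrix can hold only one type: as.matrix(result) would turn the temperatures into character strings alongside ID and Soil_Layer.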