Question

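I've just downloaded a large amount of temperature data from one of our data loggers. The data frame gives mean hourly temperature observations from 87 temperature sensors over 1691 hours (so there is a lot of data here). It looks like this: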

D1_A     D1_B     D1_C
13.43    14.39    12.33
12.62    13.53    11.56
11.67    12.56    10.36
10.83    11.62    9.47

I'd like to reshape this dataset into a matrix that looks like this:

#create a blank matrix 5 columns 131898 rows 
matrix1<-matrix(nrow=131898, ncol=5)
colnames(matrix1)<- c("year", "ID", "Soil_Layer", "Hour", "Temperature")

Where:

year is always "2012"
ID corresponds to the header ID (e.g. D1)
Soil_Layer corresponds to the second bit of the header (e.g. A, B, or C)
Hour= 1:1691 for each sensor 
and Temperature= the observed values in the original dataframe.

Can this be done with the reshape package in R? Does it need to be done as a loop? Any input on how to handle this dataset would be really helpful. Cheers!

Answer 1

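I think this does what you want... you can make use of the melt() and colsplit() functions in the reshape2 package. It isn't clear where Hour is identified in your data, so I've assumed it follows the row order of the original dataset. If that's not the case, please update your question: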

library(reshape2)
#read in your data
x <- read.table(text = "

    D1_A    D1_B  D1_C
    13.43 14.39   12.33
    12.62 13.53   11.56
    11.67 12.56   10.36
    10.83 11.62   9.47
    9.98  10.77   9.04
    9.24  10.06   8.65
    8.89  9.55    8.78
    9.01  9.39    9.88
", header = TRUE)

#add hour index, if data isn't ordered, replace this with whatever 
#tells you which hour goes where
x$hour <- 1:nrow(x)
#Melt into long format
x.m <- melt(x, id.vars = "hour")
#Split into two columns
x.m[, c("ID", "Soil_Layer")] <- colsplit(x.m$variable, "_", c("ID", "Soil_Layer"))
#Add the year
x.m$year <- 2012

#Return the first 6 rows
head(x.m[, c("year", "ID", "Soil_Layer", "hour", "value")])
#----
  year ID Soil_Layer hour value
1 2012 D1          A    1 13.43
2 2012 D1          A    2 12.62
3 2012 D1          A    3 11.67
4 2012 D1          A    4 10.83
5 2012 D1          A    5  9.98
6 2012 D1          A    6  9.24
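
If you'd rather not depend on reshape2, here is a rough base R sketch of the same reshape. It is not the approach above, just an alternative under the same assumptions: rows are already in hour order, and every column header has the form ID_Layer with exactly one underscore. The names long, parts and result are only illustrative.

```r
# Small stand-in for the logger data (first four rows from the question)
x <- data.frame(
  D1_A = c(13.43, 12.62, 11.67, 10.83),
  D1_B = c(14.39, 13.53, 12.56, 11.62),
  D1_C = c(12.33, 11.56, 10.36, 9.47)
)

# Assumption: row order is hour order
n_hours <- nrow(x)

# stack() puts every column into one 'values' column, column by column,
# and records the source column name in the 'ind' factor
long <- stack(x)

# Split headers such as "D1_A" into "D1" and "A"
# (assumes exactly one underscore per header)
parts <- do.call(rbind, strsplit(as.character(long$ind), "_", fixed = TRUE))

result <- data.frame(
  year        = 2012,
  ID          = parts[, 1],
  Soil_Layer  = parts[, 2],
  Hour        = rep(seq_len(n_hours), times = ncol(x)),
  Temperature = long$values,
  stringsAsFactors = FALSE
)

# Sanity check: one row per sensor per hour
stopifnot(nrow(result) == n_hours * ncol(x))

head(result)
```

I kept the result as a data frame rather than a matrix because a matrix holds a single type, so mixing the ID strings with the temperatures would coerce everything to character. as.matrix(result) will still give you that if a matrix is really required.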
