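I'm working on a problem: I'm trying to reproduce a formula in R. I have already done this in Mathematica, but now I want to reproduce it in R for my students. It is a clever way to compute the "average day" of a year, called the representative day. The method is described here.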
Here is part of my data:
date temp Hour DayCount
01/01/17 -2 0 1
01/01/17 -2 1 1
01/01/17 -2 2 1
01/01/17 -3 3 1
01/01/17 -4 4 1
01/01/17 -4 5 1
01/01/17 -5 6 1
01/01/17 -6 7 1
01/01/17 -4 8 1
01/01/17 -2 9 1
01/01/17 -1 10 1
01/01/17 0 11 1
01/01/17 1 12 1
01/01/17 2 13 1
01/01/17 1 14 1
01/01/17 -1 15 1
01/01/17 -2 16 1
01/01/17 -1 17 1
01/01/17 -2 18 1
01/01/17 -3 19 1
01/01/17 -2 20 1
01/01/17 -3 21 1
01/01/17 -2 22 1
01/01/17 -1 23 1
02/01/17 -1 0 2
02/01/17 -1 1 2
02/01/17 -1 2 2
02/01/17 -1 3 2
02/01/17 -1 4 2
02/01/17 -1 5 2
02/01/17 -1 6 2
02/01/17 -1 7 2
02/01/17 -1 8 2
02/01/17 -1 9 2
02/01/17 0 10 2
02/01/17 0 11 2
02/01/17 1 12 2
02/01/17 1 13 2
02/01/17 1 14 2
02/01/17 1 15 2
02/01/17 1 16 2
02/01/17 1 17 2
02/01/17 -1 18 2
02/01/17 -3 19 2
02/01/17 -2 20 2
02/01/17 -2 21 2
02/01/17 -2 22 2
02/01/17 -1 23 2
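So I want to reproduce this formula:
$$A_j = \sum_{i=1}^{N} \sum_{k=1}^{24} \left(c_{ki} - c_{kj}\right)^2$$
where $N$ is the number of days in the period (currently 2), and $c_{ki}$ and $c_{kj}$ are the temperatures at hour $k$ on day $i$ and on day $j$, respectively.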
What I expect is a symmetric matrix whose diagonal is all zeros.
Then I have to sum each row.
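To make the formula concrete, here is a small sketch that evaluates it literally on a hypothetical 3-day × 4-hour matrix (the numbers are invented for illustration and are not taken from the data above). Rows are days and columns are hours, so `M[i, k]` plays the role of $c_{ki}$. Each entry `A[i, j]` is the sum over hours of the squared differences between day i and day j, and the per-day total $A_j$ is a column sum (equal to the row sum, since `A` is symmetric).

```r
# Hypothetical toy data: 3 days (rows) x 4 hours (columns); values are made up
M <- rbind(c(0, 1, 2, 1),
           c(1, 1, 1, 1),
           c(0, 2, 2, 0))
N <- nrow(M)

# A[i, j] = sum over hours k of (c_ki - c_kj)^2
A <- matrix(0, nrow = N, ncol = N)
for (i in seq_len(N)) {
  for (j in seq_len(N)) {
    A[i, j] <- sum((M[i, ] - M[j, ])^2)
  }
}

# A is symmetric with a zero diagonal, as expected
stopifnot(isSymmetric(A), all(diag(A) == 0))

# A_j: total for day j (column sum; same as the row sum because A is symmetric)
Aj <- colSums(A)
print(A)
print(Aj)  # 4 6 6
```

Here day 1 has the smallest total, so under the usual reading of the method (pick the day whose $A_j$ is smallest, which is an assumption on my part) it would be the representative day of this toy set.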
Here is my code:
data$DayCount <- as.factor(data$DayCount)
datasplit <- split(data, data$DayCount) #Split my data for each day
distance=matrix() #Create an empty matrix
for (k in 1:24) {
  for (i in 1:2) {
    for (j in 1:2) {
      distance[i,j]= ((datasplit[[i]][k,2]-datasplit[[j]][k,2])^2)
      sum=sum(distance)
    }
  }
}
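A minimal sketch of how the loop above could be repaired while keeping its structure. Three things go wrong in it: `matrix()` creates a 1×1 matrix holding `NA`, so `distance[1,2] = ...` stops with "subscript out of bounds"; each hour `k` overwrites `distance[i,j]` instead of adding to it, so only the last hour would survive; and `sum=sum(distance)` is recomputed on every iteration and is not the per-row total. The first lines rebuild the question's two days so the block runs on its own; the column names are taken from the question.

```r
# Rebuild the question's data (2 days x 24 hours) so the block is self-contained
temp1 <- c(-2, -2, -2, -3, -4, -4, -5, -6, -4, -2, -1, 0,
            1,  2,  1, -1, -2, -1, -2, -3, -2, -3, -2, -1)
temp2 <- c(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0,
            1,  1,  1,  1,  1,  1, -1, -3, -2, -2, -2, -1)
data <- data.frame(date = rep(c("01/01/17", "02/01/17"), each = 24),
                   temp = c(temp1, temp2),
                   Hour = rep(0:23, times = 2),
                   DayCount = rep(1:2, each = 24))

data$DayCount <- as.factor(data$DayCount)
datasplit <- split(data, data$DayCount)  # one data frame per day
N <- length(datasplit)                   # number of days

# N x N matrix filled with 0 (matrix() alone would be 1 x 1 and NA)
distance <- matrix(0, nrow = N, ncol = N)
for (k in 1:24) {
  for (i in 1:N) {
    for (j in 1:N) {
      # accumulate over the hours instead of overwriting
      distance[i, j] <- distance[i, j] +
        (datasplit[[i]]$temp[k] - datasplit[[j]]$temp[k])^2
    }
  }
}

# one total per day; a separate name avoids masking sum()
Aj <- rowSums(distance)
print(distance)
print(Aj)
```

This assumes the rows of each day are already sorted by `Hour`, as they are in the sample.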
Any suggestions? I know you can do it. Please help me!
Answer (score: 0)
First, let's create a data frame object so that we can easily manipulate the data:
df <- read.csv(stringsAsFactors = TRUE, text = 'date, temp, Hour, DayCount
01/01/17, -2, 0 , 1
01/01/17, -2, 1 , 1
01/01/17, -2, 2 , 1
01/01/17, -3, 3 , 1
01/01/17, -4, 4 , 1
01/01/17, -4, 5 , 1
01/01/17, -5, 6 , 1
01/01/17, -6, 7 , 1
01/01/17, -4, 8 , 1
01/01/17, -2, 9 , 1
01/01/17, -1, 10, 1
01/01/17, 0 , 11, 1
01/01/17, 1 , 12, 1
01/01/17, 2 , 13, 1
01/01/17, 1 , 14, 1
01/01/17, -1, 15, 1
01/01/17, -2, 16, 1
01/01/17, -1, 17, 1
01/01/17, -2, 18, 1
01/01/17, -3, 19, 1
01/01/17, -2, 20, 1
01/01/17, -3, 21, 1
01/01/17, -2, 22, 1
01/01/17, -1, 23, 1
02/01/17, -1, 0 , 2
02/01/17, -1, 1 , 2
02/01/17, -1, 2 , 2
02/01/17, -1, 3 , 2
02/01/17, -1, 4 , 2
02/01/17, -1, 5 , 2
02/01/17, -1, 6 , 2
02/01/17, -1, 7 , 2
02/01/17, -1, 8 , 2
02/01/17, -1, 9 , 2
02/01/17, 0 , 10, 2
02/01/17, 0 , 11, 2
02/01/17, 1 , 12, 2
02/01/17, 1 , 13, 2
02/01/17, 1 , 14, 2
02/01/17, 1 , 15, 2
02/01/17, 1 , 16, 2
02/01/17, 1 , 17, 2
02/01/17, -1, 18, 2
02/01/17, -3, 19, 2
02/01/17, -2, 20, 2
02/01/17, -2, 21, 2
02/01/17, -2, 22, 2
02/01/17, -1, 23, 2')
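Now let's follow your instructions. I'm not trying to do this in the most efficient way; instead I'm sticking as closely as possible to your original idea, so I'll use a couple of nested loops: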
# get the different days
days <- levels(df$date)
# create the A matrix, empty
A <- matrix(nrow = length(days), ncol = length(days))
# iterate
for(i in 1:length(days)) {
  for(j in 1:length(days)) {
    # get all the temperatures available for each day
    ci <- df[df$date == days[i],]$temp
    cj <- df[df$date == days[j],]$temp
    # update the A matrix
    A[i, j] <- sum((ci - cj)^2)
  }
}
# finally the last sum
Aj <- unlist(lapply(1:length(days), function(i) sum(A[i, ])))
The result is:
> A
[,1] [,2]
[1,] 0 97
[2,] 97 0
> Aj
[1] 97 97
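This should work for any number of days and any number of temperature measurements per day (not necessarily 24), as long as every day has the same number of measurements listed in the same hour order; otherwise `ci - cj` would recycle the shorter vector.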
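If a loop-free version is wanted, the same A matrix can be obtained with `stats::dist()`, which returns Euclidean distances between rows; squaring them gives exactly the sum over hours of $(c_{ki} - c_{kj})^2$ (up to floating-point rounding). The sketch below is my own alternative, not part of the answer above. It rebuilds a small long-format `df` like the one above, reshapes it into a days × hours matrix with `split()` and `rbind`, and assumes each day's rows are in hour order. The final `which.min()` step assumes the representative day is the one with the smallest $A_j$; with only two days both totals are equal, so it simply returns the first day.

```r
# Rebuild a long-format data frame like the answer's df (2 days x 24 hours)
df <- data.frame(
  date = rep(c("01/01/17", "02/01/17"), each = 24),
  temp = c(-2, -2, -2, -3, -4, -4, -5, -6, -4, -2, -1, 0,
            1,  2,  1, -1, -2, -1, -2, -3, -2, -3, -2, -1,
           -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0,
            1,  1,  1,  1,  1,  1, -1, -3, -2, -2, -2, -1)
)

# One row per day, one column per hour (assumes rows are in hour order)
temps <- do.call(rbind, split(df$temp, df$date))

# Squared Euclidean distances between days = the A matrix from the loop
A <- as.matrix(dist(temps))^2
Aj <- rowSums(A)

# Assumption: the representative day minimises A_j (ties -> first day)
rep_day <- names(which.min(Aj))
print(round(A, 10))
print(rep_day)
```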