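This is hard to reproduce, but here goes:
I have a data frame with 107 columns of monthly wind speed from weather stations (monthly data since 1961). I want to standardize each column with respect to the breakpoints (BPs) in its time series. For example, if a column has its first BP at 1971-04, the data from the first record (1961-01) to the first BP (1971-04) should be standardized using the mean and standard deviation of that period. If the second BP is at 1989-05, the mean and sd must be taken from the first BP to the second BP. I then replace the original data with the newly obtained values.
The code I'm running is as follows: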
library(strucchange)
df <- data.frame(date = seq(as.Date('1961-01-01'), length.out = 700, by = 'months'), A = rnorm(700, 0, 8.5),
                 B = rnorm(700, 0, 9.5), C = rnorm(700, 0, 12.4), D = rnorm(700, 0, 5.5)) # create a time series
df[c(2,3,4)][340:560,] <- df[c(2,3,4)][340:560,] + rnorm(12, 87.4, 121.4) # insert some breakpoints for the first 4 columns
bp <- breakpoints(df[,5] ~ 1)
bp <- bp$breakpoints
for (a in names(df[,2:ncol(df)])){
  print(a)
  stat <- df[,c('date',a)]
  bp <- breakpoints(stat[,2] ~ 1)
  bp <- bp$breakpoints
  dates <- stat[bp,] # create a df with the breakpoints
  if(nrow(dates==0)){ # condition if a column does not have any BP
    stat[,2] <- (stat[,2] - mean(stat[,2], na.rm = T))/sd(stat[,2], na.rm = T)
    df[,a] <- stat[,2]
  } else { # if there are BP in the data ...
    for (b in 1:nrow(dates)){
      print(b)
      if(b==1){ # calculate the mean and sd from the first row
        substr <- stat[stat$date >= min(stat$date) & stat$date < dates$date[b],]
        substr[,2] <- (substr[,2] - mean(substr[,2], na.rm = T))/sd(substr[,2], na.rm = T)
        df[,a][df$date >= min(df$date) & df$date < dates$date[b]] <- substr[,2]
      } else if (b == nrow(dates)){ # calculate the mean and sd till the last
        substr <- stat[stat$date >= dates$date[b-1] & stat$date <= max(stat$date),]
        substr[,2] <- (substr[,2] - mean(substr[,2], na.rm = T))/sd(substr[,2], na.rm = T)
        df[,a][df$date >= dates$date[b-1] & df$date < max(stat$date)] <- substr[,2]
      } else if (b > 1) { # if the BP are neither the first or the last one
        substr <- stat[stat$date >= dates$date[b-1] & stat$date < dates$date[b],]
        substr[,2] <- (substr[,2] - mean(substr[,2], na.rm = T))/sd(substr[,2], na.rm = T)
        df[,a][df$date >= dates$date[b-1] & df$date < dates$date[b]] <- substr[,2]
      }
    }
  }
}
However, when I verify the results manually, the values are wrong. Does anyone have any tips to simplify this code (and make it work)? Thanks.
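
One possible simplification, offered as a sketch rather than a definitive fix: treat the breakpoints as row indices, build a segment id for every observation, and standardize within each segment using `ave()`. The helper names `standardize_by_segment` and `standardize_columns` are hypothetical, introduced here for illustration. The sketch assumes the strucchange convention that each breakpoint is the last observation of its segment, and that the columns contain no missing values (otherwise the breakpoint indices may refer to the NA-dropped series rather than to rows of `df`). In the loop as posted, `nrow(dates==0)` appears to return the row count of a comparison instead of testing whether the row count is zero, which would send columns with breakpoints into the whole-column branch; working with indices avoids that test and the date comparisons altogether.

```r
# Standardize a numeric vector separately within each segment.
# `bp`: row indices of the breakpoints, sorted and smaller than length(x).
# Assumption: each breakpoint is the LAST observation of its segment.
standardize_by_segment <- function(x, bp = integer(0)) {
  bp <- bp[!is.na(bp)]  # an NA here is taken to mean "no breakpoints found"
  # Segment id per observation: 1 for rows 1..bp[1], 2 for bp[1]+1..bp[2], ...
  seg <- rep(seq_len(length(bp) + 1), times = diff(c(0, bp, length(x))))
  # ave() applies FUN within each segment and keeps the original order
  ave(x, seg, FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
}

# Apply the helper to every data column of a data frame.
# Requires the strucchange package only when this function is called.
standardize_columns <- function(df, cols = setdiff(names(df), "date")) {
  for (col in cols) {
    x <- df[[col]]
    bp <- strucchange::breakpoints(x ~ 1)$breakpoints
    df[[col]] <- standardize_by_segment(x, bp)
  }
  df
}

# Small check that does not need strucchange: one breakpoint at row 3
z <- standardize_by_segment(c(1, 2, 3, 10, 20, 30), bp = 3)
print(z)
```

With the example data frame from the question, `df_std <- standardize_columns(df)` would return a copy in which every column except `date` is standardized segment by segment. If the breakpoint observation should instead open the next segment, passing `bp - 1` shifts the boundary by one row.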