我有一个数据框df
structure(list(x = c(49, 50, 51, 52, 53, 54, 55, 56, 1, 2, 3,
4, 5, 14, 15, 16, 17, 163, 164, 165, 153, 154, 72, 38, 39, 40,
23, 13, 14, 15, 5, 6, 74, 75, 76, 77, 78, 79, 80, 81, 82, 127,
128, 129, 130, 131, 132, 71, 72, 73, 74, 75, 76, 2, 3, 4, 5,
6, 99, 100, 101, 10, 11, 3, 30, 50, 51, 52, 53, 54, 56, 64, 66,
67, 68, 69, 34, 35, 37, 39, 2, 46, 47, 17, 18, 99, 100, 102,
103, 84, 85, 86, 87, 88, 67, 70, 72), y = c(2268.14043972082,
2147.62290922552, 2269.1387550775, 2247.31983098201, 1903.39138268307,
2174.78291538358, 2359.51909126411, 2488.39004804939, 212.851575751527,
461.398994384333, 567.150629704352, 781.775113821961, 918.303706148872,
1107.37695799186, 1160.80594193377, 1412.61328924168, 1689.48879626486,
685.154353165934, 574.088067465695, 650.30821636616, 494.185166497016,
436.312162090908, 641.231373456365, 494.374217984441, 201.745910386788,
486.030122926459, 483.045965372879, 265.693628564011, 285.116345260642,
291.023782284086, 229.606221692753, 230.952761338012, 1089.06303295676,
1255.88808925333, 1087.75402177912, 1068.248897182, 1212.17254891642,
884.222588171535, 938.887718005513, 863.582247020735, 1065.91969416523,
790.570635510725, 834.500908313203, 710.755041345197, 814.002362551197,
726.814950022846, 828.559687148314, 611.564698476112, 603.238720579422,
524.322001078981, 565.296378873638, 532.431853589369, 597.174114277044,
260.737164468854, 306.72700499362, 283.410379620422, 366.813913489692,
387.570173754128, 606.075737104722, 686.408686154056, 705.914347674276,
388.602676983443, 477.858510450125, 128.198042456082, 535.519377609133,
1893.38468471169, 1819.83262739703, 1827.31409981102, 1640.5816780664,
1689.0365549922, 2112.67917439342, 1028.8780498564, 1098.54431357711,
1265.26965941035, 1129.58344809909, 820.922447928053, 749.343583476846,
779.678206156474, 646.575242339517, 733.953282899613, 461.156280127354,
1184.81825619942, 1281.2920902365, 906.813018662913, 798.186995701282,
831.365377249207, 764.519073183124, 672.076289062505, 669.879217186302,
1265.48484068702, 1193.29000986667, 1156.81486114406, 1199.7373066445,
1116.24029749935, 1341.47673353751, 1401.44881976186, 1640.27575962036
), ID = 1:97), .Names = c("x", "y", "ID"), row.names = c(NA,
-97L), class = "data.frame")
我现在想要根据交叉验证计算一个非线性模型的模型效率,留出一个ID。我实现了这行代码。
library(stats)
library (hydroGOF)
id <- unique(df$ID)
for (i in id){
fit1 <- try(nls(y~A*x^3+B*x^2+C*x+D, data = df[df$ID != i,], start = list(A=0.02, B=-0.6, C= 50, D=200)), silent=TRUE)
Out <- if (inherits(fit1, "nls")) NSE(sim = predict(fit1, newdatadata=df[df$ID==i,]), obs = df$y, na.rm=T)
}
但是,我有此错误消息:
Error in valindex.default(sim, obs) :
Invalid argument: 'length(sim) != length(obs)' !! (96!=97) !!
有人可以帮我解决这个问题吗?
答案 0 :(得分:1)
您在上面的代码中有一些小错误和一个很大的逻辑错误,我将在下面解释:
首先,代码应该是这样的:
library(stats)
library (hydroGOF)
id <- unique(df$ID)
for (i in id){
fit1 <- try(nls(y~A*x^3+B*x^2+C*x+D, data = df[df$ID != i,], start = list(A=0.02, B=-0.6, C= 50, D=200)), silent=TRUE)
Out <- if (inherits(fit1, "nls")) NSE(sim = predict(fit1, newdata=df[df$ID==i,]), obs = df$y[df$ID==i], na.rm=T)
}
在您的问题中,在NSE
函数中,您将参数设置为newdatadata=df[df$ID==i,])
而不是newdata=df[df$ID==i,])
,即其中有一个额外的data
会导致运行时出现问题功能(你拼错了参数:))。此外,obs
参数的长度不正确,因为将使用整个y
列,但您只需要df$y[df$ID==i]
长度为1(以便与现在的预测相匹配)长度也是一样的。)
现在纠正后,上面的代码将会运行。
但是,运行后,您会看到它会产生警告,指出NSE
时您无法使用'sum((obs - mean(obs))^2)=0' => it is not possible to compute 'NSE'
。在您的情况下,由于您只有一个障碍,因此计算'sum((obs - mean(obs))^2)=0'
将始终为零。
因此,您不能将此技术用于NSE
,因为它会因定义而失败(因为您尝试在单个观察中计算NSE)。您应该收集所有留一法的预测,将它们存储在变量中,然后对NSE
的变量使用df$y
。这将有效。
我的意思是以下(完成了留一个cv):
Out <- c()
id <- unique(df$ID)
for (i in id){
fit1 <- try(nls(y~A*x^3+B*x^2+C*x+D, data = df[df$ID != i,], start = list(A=0.02, B=-0.6, C= 50, D=200)), silent=TRUE)
Out[i] <- if (inherits(fit1, "nls")) sim = predict(fit1, newdata=df[df$ID==i,])
}
现在可以使用:
> NSE(Out, df$y)
[1] 0.3440862