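I developed a system in R for plotting large data sets obtained from wind turbines. I am now porting the process to Java, and the results I get from the two systems are inconsistent.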
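As the graphs below show:
I can explain the difference between the (red) calculated lines; it comes from the different calculation methods I use.
In R, the data is processed as follows. I wrote this code with a little help and have no idea what is going on in it (but, hey, it works).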
df <- data.frame(pwr = pwr, spd = spd)
require(mgcv)
mod <- gam(pwr ~ s(spd, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
x_grid <- with(df, data.frame(spd = seq(min(spd) + 0.0001, maxi, length=100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
lines(fit ~ spd, data = x_grid, col = "red", lwd = thickLineWidth)
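In Java (SQL, in fact), I use a binning approach: I compute the mean of y for every 0.5 step along the x-axis. I plot the resulting data with org.jfree.chart.renderer.xy.XYSplineRenderer, and I don't really understand how that line is rendered.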
SELECT
ROUND( ROUND( x_data * 2 ) / 2, 1) AS x_axis, # See https://stackoverflow.com/questions/5230647/mysql-rounding-functions
AVG( y_data ) AS y_axis
FROM
table
GROUP BY
x_axis
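For comparison, the binning step can also be reproduced outside the database. The following is a minimal Java sketch, assuming the speeds and powers are already loaded into two parallel `double[]` arrays; the class and method names (`BinMeans`, `binMeans`) and the toy values in `main` are hypothetical. It snaps each x value to the nearest multiple of 0.5, as `ROUND(x_data * 2) / 2` does, and averages y within each bin.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper mirroring the SQL query:
// SELECT ROUND(ROUND(x * 2) / 2, 1) AS x_axis, AVG(y) AS y_axis ... GROUP BY x_axis
class BinMeans {

    /** Maps each bin centre (a multiple of 0.5) to the mean y of the points in that bin, sorted by x. */
    static TreeMap<Double, Double> binMeans(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        TreeMap<Double, double[]> acc = new TreeMap<>(); // bin centre -> {sum of y, count}
        for (int i = 0; i < x.length; i++) {
            // Equivalent of ROUND(x * 2) / 2: snap x to the nearest multiple of 0.5.
            double bin = Math.round(x[i] * 2) / 2.0;
            double[] sumCount = acc.computeIfAbsent(bin, k -> new double[2]);
            sumCount[0] += y[i];
            sumCount[1] += 1;
        }
        TreeMap<Double, Double> means = new TreeMap<>();
        for (Map.Entry<Double, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        double[] spd = {3.1, 3.2, 3.6, 4.0, 4.1}; // toy values
        double[] pwr = {10, 14, 20, 30, 34};
        System.out.println(binMeans(spd, pwr)); // prints {3.0=12.0, 3.5=20.0, 4.0=32.0}
    }
}
```

One caveat: `Math.round` rounds ties upward, while MySQL's `ROUND()` on floating-point values may round ties differently, so a value lying exactly on a bin boundary such as 3.25 could land in a different bin.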
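I took the difference between the two graphs:
These differences are what I want to eliminate.
So, to understand the difference between the two graphs, I have a few questions: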
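Part of the difference is likely to come from how the binned line is drawn. JFreeChart's documentation describes `XYSplineRenderer` as connecting data points with natural cubic splines, which means the curve is forced through every bin mean: noise in the bin means shows up as wiggles, whereas the GAM smooths across it. The sketch below implements textbook natural cubic spline interpolation (second derivative zero at both ends) to illustrate that behaviour; it is not JFreeChart's source code, and the class name `NaturalCubicSpline` is hypothetical.

```java
import java.util.Arrays;

// Textbook natural cubic spline interpolation (illustration only, not JFreeChart's code).
class NaturalCubicSpline {
    private final double[] x;
    private final double[] y;
    private final double[] m; // second derivative of the spline at each knot

    /** x must be strictly increasing; the curve will pass exactly through every (x[i], y[i]). */
    NaturalCubicSpline(double[] x, double[] y) {
        int n = x.length;
        if (n < 2 || y.length != n) {
            throw new IllegalArgumentException("need at least 2 points and equal-length arrays");
        }
        this.x = x.clone();
        this.y = y.clone();
        this.m = new double[n]; // natural end conditions: m[0] = m[n-1] = 0
        int k = n - 2;          // number of interior knots
        if (k == 0) {
            return;             // two points: the spline is a straight line
        }
        // Tridiagonal system: h0*m[i-1] + 2(h0+h1)*m[i] + h1*m[i+1] = 6*(slope difference)
        double[] sub = new double[k], diag = new double[k], sup = new double[k], rhs = new double[k];
        for (int i = 1; i <= k; i++) {
            double h0 = x[i] - x[i - 1];
            double h1 = x[i + 1] - x[i];
            sub[i - 1] = h0;
            diag[i - 1] = 2 * (h0 + h1);
            sup[i - 1] = h1;
            rhs[i - 1] = 6 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0);
        }
        // Thomas algorithm: forward elimination ...
        for (int j = 1; j < k; j++) {
            double w = sub[j] / diag[j - 1];
            diag[j] -= w * sup[j - 1];
            rhs[j] -= w * rhs[j - 1];
        }
        // ... then back substitution into the interior second derivatives m[1..n-2].
        m[k] = rhs[k - 1] / diag[k - 1];
        for (int j = k - 2; j >= 0; j--) {
            m[j + 1] = (rhs[j] - sup[j] * m[j + 2]) / diag[j];
        }
    }

    /** Evaluates the spline at t (outside the data range it extends the end pieces). */
    double value(double t) {
        int pos = Arrays.binarySearch(x, t);
        if (pos >= 0) {
            return y[pos]; // exactly on a knot
        }
        int i = -pos - 2; // index of the last knot below t
        i = Math.max(0, Math.min(x.length - 2, i));
        double h = x[i + 1] - x[i];
        double a = (x[i + 1] - t) / h;
        double b = (t - x[i]) / h;
        return a * y[i] + b * y[i + 1]
                + ((a * a * a - a) * m[i] + (b * b * b - b) * m[i + 1]) * h * h / 6.0;
    }

    public static void main(String[] args) {
        NaturalCubicSpline s = new NaturalCubicSpline(new double[] {0, 1, 2}, new double[] {0, 1, 0});
        System.out.println(s.value(0.5)); // prints 0.6875
    }
}
```

Because an interpolating spline has no smoothing parameter at all, the only way to make the Java curve less wiggly with this renderer is to smooth the data before plotting it.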
Answer:
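In the R code you are (as I showed when I came up with this example) fitting an additive model to the power and speed data, where the relationship between the variables is determined by the data themselves. These models use splines to estimate the response function. In particular, here I used an adaptive smoother, with k = 20 setting the maximum complexity of the fitted smoother. The more complex the smoother, the more wiggly the fitted function can be.
Why does this matter? Going by your data, there are ranges of speed where the response does not change with speed, and ranges where the response changes rapidly as speed changes. We have a "wiggliness allowance" to spend on the curve. With an ordinary spline, the wiggliness (or smoothness) is the same across the whole function. With an adaptive smoother, we can spend more of the allowance in the parts of the function where the response changes most, and little or none of it in the parts where the response does not change.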
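To make the "wiggliness allowance" idea concrete, here is a minimal Java sketch of a different, simpler technique: a Whittaker-style penalised smoother on evenly spaced values (for example, bin means) with a separate penalty weight for each position. A large weight makes the curve stiff at that position; a small one lets it bend. This is only an analogy to mgcv's adaptive smoother (`bs = "ad"`), which constructs its smoother differently and estimates how the penalty varies from the data by REML; here the weights are chosen by hand, and the class name `AdaptiveWhittaker`, the toy data and the weights in `main` are hypothetical.

```java
import java.util.Arrays;

// Whittaker-style penalised smoother with per-position penalty weights (hypothetical helper).
// Minimises  sum_i (y_i - z_i)^2 + sum_j lambda_j * (z_j - 2*z_{j+1} + z_{j+2})^2
// by solving (I + D^T diag(lambda) D) z = y, where D takes second differences.
class AdaptiveWhittaker {

    /** y: evenly spaced values; lambda: n - 2 non-negative weights, larger = stiffer locally. */
    static double[] smooth(double[] y, double[] lambda) {
        int n = y.length;
        if (n < 3 || lambda.length != n - 2) {
            throw new IllegalArgumentException("need n >= 3 values and n - 2 weights");
        }
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) {
            a[i][i] = 1.0;
        }
        double[] d = {1, -2, 1}; // one row of D, starting at column j
        for (int j = 0; j < n - 2; j++) {
            if (lambda[j] < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            for (int r = 0; r < 3; r++) {
                for (int c = 0; c < 3; c++) {
                    a[j + r][j + c] += lambda[j] * d[r] * d[c];
                }
            }
        }
        return solve(a, y.clone());
    }

    /** Sum of squared second differences: a simple measure of wiggliness. */
    static double roughness(double[] z) {
        double s = 0;
        for (int j = 0; j + 2 < z.length; j++) {
            double dd = z[j] - 2 * z[j + 1] + z[j + 2];
            s += dd * dd;
        }
        return s;
    }

    // Dense Gaussian elimination; the matrix is symmetric positive definite, so no pivoting is needed.
    private static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int p = 0; p < n; p++) {
            for (int r = p + 1; r < n; r++) {
                double f = a[r][p] / a[p][p];
                for (int c = p; c < n; c++) {
                    a[r][c] -= f * a[p][c];
                }
                b[r] -= f * b[p];
            }
        }
        double[] z = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) {
                s -= a[r][c] * z[c];
            }
            z[r] = s / a[r][r];
        }
        return z;
    }

    public static void main(String[] args) {
        // Toy bin means: flat, then a steep ramp, then flat again.
        double[] y = {0, 1, 0, 2, 8, 20, 35, 50, 60, 61, 59, 60};
        // Hypothetical weights: stiff at the flat ends, flexible in the ramp.
        double[] lambda = {50, 50, 1, 1, 1, 1, 1, 1, 50, 50};
        System.out.println(Arrays.toString(smooth(y, lambda)));
    }
}
```

Uniform weights give an ordinary, non-adaptive smoother. The example uses larger weights at the two ends, where a power curve is typically flat (below cut-in and at rated power), and smaller weights in the steep middle section. The dense solver is fine for a few hundred bins; for many more points a banded solver would be the natural choice.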
Below I annotate the code to explain what each step does:
## here we create a data frame with the pwr and spd variables
df <- data.frame(pwr = pwr, spd = spd)
## we load the package containing the code to fit the additive model
require(mgcv)
## This is the model itself, saying pwr is modelled as a smooth function of spd
## and the smooth function of spd is generated using an adaptive smoother with
## an "allowance" of 20. This allowance is a starting point and the actual
## smoothness of the curve will be estimated as part of the model fitting,
## here using a REML criterion
mod <- gam(pwr ~ s(spd, bs = "ad", k = 20), data = df, method = "REML")
## This just summarises the model fit
summary(mod)
## In this line we are creating a new spd vector (in a data frame) that contains
## 100 equally spaced spd values over the entire range of the observed spd
x_grid <- with(df, data.frame(spd = seq(min(spd) + 0.0001, maxi, length=100)))
## we will use those data to get predictions of the response pwr at each
## of the 100 values of spd we just created
## I did this so we had enough data to plot a nice smooth curve, but without
## having to predict for all the observed values of spd
pred <- predict(mod, x_grid, se.fit = TRUE)
## This line stores the 100 predicted values in the prediction data object
x_grid <- within(x_grid, fit <- pred$fit)
## This line draws the fitted smooth on to a plot of the data
## this assumes there is already a plot on the active device.
lines(fit ~ spd, data = x_grid, col = "red", lwd = thickLineWidth)
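When porting this step, the prediction grid is easy to reproduce. Below is a minimal Java sketch mirroring `seq(from, to, length = 100)` and the point predictions from `predict(mod, x_grid)` (no standard errors). The fitted curve is passed in as any `DoubleUnaryOperator`; the class and method names, the toy curve and the speed limits in `main` are hypothetical placeholders, with 25.0 standing in for the `maxi` value defined elsewhere in the original code.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helpers for building a prediction grid and evaluating a curve on it.
class PredictionGrid {

    /** Like R's seq(from, to, length.out = n): n evenly spaced values, both ends included. */
    static double[] seq(double from, double to, int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        double[] out = new double[n];
        double step = (to - from) / (n - 1);
        for (int i = 0; i < n; i++) {
            out[i] = from + i * step;
        }
        out[n - 1] = to; // pin the last point exactly, avoiding floating-point drift
        return out;
    }

    /** Evaluates a curve at every grid point (point predictions only, no standard errors). */
    static double[] predict(DoubleUnaryOperator curve, double[] grid) {
        double[] fit = new double[grid.length];
        for (int i = 0; i < grid.length; i++) {
            fit[i] = curve.applyAsDouble(grid[i]);
        }
        return fit;
    }

    public static void main(String[] args) {
        double[] grid = seq(3.0 + 0.0001, 25.0, 100);                  // placeholder limits
        double[] fit = predict(s -> Math.min(1.0, s / 12.0), grid);   // toy curve, not a fitted model
        System.out.println(grid[0] + " .. " + grid[99] + ", last fit = " + fit[99]);
    }
}
```

As in the R code, evaluating on a fixed grid of 100 points keeps the plotted curve smooth without predicting at every observed speed.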
If you are not familiar with additive models and smoothers/splines, I recommend Ruppert, Wand and Carroll (2003) Semiparametric Regression. Cambridge University Press.