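I am doing model selection with an AIC / BIC / MDL type of criterion, which rewards low-error models but also penalizes high-complexity models (we are looking for the simplest yet most convincing explanation of the data, so to speak, à la Occam's razor).
Visually, you can easily see the elbow shape and pick a parameter value somewhere in that region. The problem is that I am doing this for a large number of experiments, and I need a way to find this value without manual intervention.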
curve = [8.4663 8.3457 5.4507 5.3275 4.8305 4.7895 4.6889 4.6833 4.6819 4.6542 4.6501 4.6287 4.6162 4.585 4.5535 4.5134 4.474 4.4089 4.3797 4.3494 4.3268 4.3218 4.3206 4.3206 4.3203 4.2975 4.2864 4.2821 4.2544 4.2288 4.2281 4.2265 4.2226 4.2206 4.2146 4.2144 4.2114 4.1923 4.19 4.1894 4.1785 4.178 4.1694 4.1694 4.1694 4.1556 4.1498 4.1498 4.1357 4.1222 4.1222 4.1217 4.1192 4.1178 4.1139 4.1135 4.1125 4.1035 4.1025 4.1023 4.0971 4.0969 4.0915 4.0915 4.0914 4.0836 4.0804 4.0803 4.0722 4.065 4.065 4.0649 4.0644 4.0637 4.0616 4.0616 4.061 4.0572 4.0563 4.056 4.0545 4.0545 4.0522 4.0519 4.0514 4.0484 4.0467 4.0463 4.0422 4.0392 4.0388 4.0385 4.0385 4.0383 4.038 4.0379 4.0375 4.0364 4.0353 4.0344];
plot(1:100, curve)
Answer 0: (score: 39)
curve = [8.4663 8.3457 5.4507 5.3275 4.8305 4.7895 4.6889 4.6833 4.6819 4.6542 4.6501 4.6287 4.6162 4.585 4.5535 4.5134 4.474 4.4089 4.3797 4.3494 4.3268 4.3218 4.3206 4.3206 4.3203 4.2975 4.2864 4.2821 4.2544 4.2288 4.2281 4.2265 4.2226 4.2206 4.2146 4.2144 4.2114 4.1923 4.19 4.1894 4.1785 4.178 4.1694 4.1694 4.1694 4.1556 4.1498 4.1498 4.1357 4.1222 4.1222 4.1217 4.1192 4.1178 4.1139 4.1135 4.1125 4.1035 4.1025 4.1023 4.0971 4.0969 4.0915 4.0915 4.0914 4.0836 4.0804 4.0803 4.0722 4.065 4.065 4.0649 4.0644 4.0637 4.0616 4.0616 4.061 4.0572 4.0563 4.056 4.0545 4.0545 4.0522 4.0519 4.0514 4.0484 4.0467 4.0463 4.0422 4.0392 4.0388 4.0385 4.0385 4.0383 4.038 4.0379 4.0375 4.0364 4.0353 4.0344];
%# get coordinates of all the points
nPoints = length(curve);
allCoord = [1:nPoints;curve]'; %'# SO formatting
%# pull out first point
firstPoint = allCoord(1,:);
%# get vector between first and last point - this is the line
lineVec = allCoord(end,:) - firstPoint;
%# normalize the line vector
lineVecN = lineVec / sqrt(sum(lineVec.^2));
%# find the distance from each point to the line:
%# vector between all points and first point
vecFromFirst = bsxfun(@minus, allCoord, firstPoint);
%# To calculate the distance to the line, we split vecFromFirst into two
%# components, one that is parallel to the line and one that is perpendicular
%# Then, we take the norm of the part that is perpendicular to the line and
%# get the distance.
%# We find the vector parallel to the line by projecting vecFromFirst onto
%# the line. The perpendicular vector is vecFromFirst - vecFromFirstParallel
%# We project vecFromFirst by taking the scalar product of the vector with
%# the unit vector that points in the direction of the line (this gives us
%# the length of the projection of vecFromFirst onto the line). If we
%# multiply the scalar product by the unit vector, we have vecFromFirstParallel
scalarProduct = dot(vecFromFirst, repmat(lineVecN,nPoints,1), 2);
vecFromFirstParallel = scalarProduct * lineVecN;
vecToLine = vecFromFirst - vecFromFirstParallel;
%# distance to line is the norm of vecToLine
distToLine = sqrt(sum(vecToLine.^2,2));
%# plot the distance to the line
figure('Name','distance from curve to line'), plot(distToLine)
%# now all you need is to find the maximum
[maxDist,idxOfBestPoint] = max(distToLine);
%# plot
figure, plot(curve)
hold on
plot(allCoord(idxOfBestPoint,1), allCoord(idxOfBestPoint,2), 'or')
Answer 1: (score: 17)
In case anyone needs a Python version of the Matlab code posted by Jonas above:
import numpy as np
curve = [8.4663, 8.3457, 5.4507, 5.3275, 4.8305, 4.7895, 4.6889, 4.6833, 4.6819, 4.6542, 4.6501, 4.6287, 4.6162, 4.585, 4.5535, 4.5134, 4.474, 4.4089, 4.3797, 4.3494, 4.3268, 4.3218, 4.3206, 4.3206, 4.3203, 4.2975, 4.2864, 4.2821, 4.2544, 4.2288, 4.2281, 4.2265, 4.2226, 4.2206, 4.2146, 4.2144, 4.2114, 4.1923, 4.19, 4.1894, 4.1785, 4.178, 4.1694, 4.1694, 4.1694, 4.1556, 4.1498, 4.1498, 4.1357, 4.1222, 4.1222, 4.1217, 4.1192, 4.1178, 4.1139, 4.1135, 4.1125, 4.1035, 4.1025, 4.1023, 4.0971, 4.0969, 4.0915, 4.0915, 4.0914, 4.0836, 4.0804, 4.0803, 4.0722, 4.065, 4.065, 4.0649, 4.0644, 4.0637, 4.0616, 4.0616, 4.061, 4.0572, 4.0563, 4.056, 4.0545, 4.0545, 4.0522, 4.0519, 4.0514, 4.0484, 4.0467, 4.0463, 4.0422, 4.0392, 4.0388, 4.0385, 4.0385, 4.0383, 4.038, 4.0379, 4.0375, 4.0364, 4.0353, 4.0344]
nPoints = len(curve)
# coordinates of all points as an (nPoints, 2) array
allCoord = np.vstack((np.arange(nPoints), curve)).T
firstPoint = allCoord[0]
# unit vector along the line from the first to the last point
lineVec = allCoord[-1] - allCoord[0]
lineVecNorm = lineVec / np.sqrt(np.sum(lineVec**2))
vecFromFirst = allCoord - firstPoint
# row-wise dot product with the unit vector (NumPy broadcasts lineVecNorm over the rows)
scalarProduct = np.sum(vecFromFirst * lineVecNorm, axis=1)
vecFromFirstParallel = np.outer(scalarProduct, lineVecNorm)
vecToLine = vecFromFirst - vecFromFirstParallel
# perpendicular distance of every point to the line
distToLine = np.sqrt(np.sum(vecToLine ** 2, axis=1))
# 0-based index of the point farthest from the line
idxOfBestPoint = np.argmax(distToLine)
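Answer 2: (score: 8)
Answer 3: (score: 7)
Answer 4: (score: 5)
One way of solving this would be to fit two lines to the L of your elbow. But since one portion of the curve contains only a few points (as I mentioned in the comment), line fitting will suffer unless you detect which points are spaced out and interpolate between them to manufacture a more uniform series, and then use RANSAC to find two lines that fit the L. That is a little convoluted, but not impossible.
So here is a simpler solution: the graphs you have put up look the way they do because of MATLAB's scaling (obviously). So all I did was minimize the distance from the "origin" to your points, using the scale information.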
%% Order
curve = [8.4663 8.3457 5.4507 5.3275 4.8305 4.7895 4.6889 4.6833 4.6819 4.6542 4.6501 4.6287 4.6162 4.585 4.5535 4.5134 4.474 4.4089 4.3797 4.3494 4.3268 4.3218 4.3206 4.3206 4.3203 4.2975 4.2864 4.2821 4.2544 4.2288 4.2281 4.2265 4.2226 4.2206 4.2146 4.2144 4.2114 4.1923 4.19 4.1894 4.1785 4.178 4.1694 4.1694 4.1694 4.1556 4.1498 4.1498 4.1357 4.1222 4.1222 4.1217 4.1192 4.1178 4.1139 4.1135 4.1125 4.1035 4.1025 4.1023 4.0971 4.0969 4.0915 4.0915 4.0914 4.0836 4.0804 4.0803 4.0722 4.065 4.065 4.0649 4.0644 4.0637 4.0616 4.0616 4.061 4.0572 4.0563 4.056 4.0545 4.0545 4.0522 4.0519 4.0514 4.0484 4.0467 4.0463 4.0422 4.0392 4.0388 4.0385 4.0385 4.0383 4.038 4.0379 4.0375 4.0364 4.0353 4.0344];
x_axis = 1:numel(curve);
points = [x_axis ; curve ]'; %' - SO formatting
%% Get the scaling info (the curve must already be plotted in figure 1)
f = figure(1);
ticks = get(get(f,'CurrentAxes'),'YTick'); % numeric tick positions
aspect = get(get(f,'CurrentAxes'),'DataAspectRatio');
aspect = [aspect(2) aspect(1)];
%% Get the "origin"
O = [x_axis(1) ticks(1)];
%% Scale the data - now the scaled values look like MATLAB's idea of
% what a good plot should look like
scaled_O = O.*aspect;
scaled_points = bsxfun(@times,points,aspect);
%% Find the closest point
del = sum((bsxfun(@minus,scaled_points,scaled_O).^2),2);
[val ind] = min(del);
best_ROC = [ind curve(ind)];
%% Display
hold on;
For curves whose elbow lies toward the top-left corner instead, you have to change the origin to [x_axis(1) ticks(end)].
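The idea in this answer, picking the point closest to a corner of the plot once both axes are on a comparable scale, can be sketched in Python. This is only a rough sketch and not the MATLAB approach itself: it assumes min-max normalization of both axes to [0, 1] as a stand-in for the `DataAspectRatio` scaling, and the function name `closest_to_corner` and its `corner` parameter are hypothetical.

```python
import numpy as np

def closest_to_corner(curve, corner=(0.0, 0.0)):
    """Return the 0-based index of the point closest to `corner`.

    Assumption: both axes are min-max normalized to [0, 1], which
    replaces the plot-based scaling used in the MATLAB code.
    `corner` is given in normalized coordinates: (0, 0) is the
    bottom-left corner, (0, 1) the top-left corner.
    """
    y = np.asarray(curve, dtype=float)
    x = np.arange(len(y), dtype=float)
    if len(y) < 2 or y.max() == y.min():
        raise ValueError("need at least two points and a non-constant curve")
    # put both axes on the same [0, 1] scale
    xs = (x - x.min()) / (x.max() - x.min())
    ys = (y - y.min()) / (y.max() - y.min())
    # squared Euclidean distance of every point to the chosen corner
    d2 = (xs - corner[0]) ** 2 + (ys - corner[1]) ** 2
    return int(np.argmin(d2))
```

For a decreasing curve like the one in the question, the default bottom-left corner applies; passing `corner=(0.0, 1.0)` corresponds to moving the origin to the top-left.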
Answer 5: (score: 5)
A solution implemented in R:
elbow_finder <- function(x_values, y_values) {
  # Max values to create line
  max_x_x <- max(x_values)
  max_x_y <- y_values[which.max(x_values)]
  max_y_y <- max(y_values)
  max_y_x <- x_values[which.max(y_values)]
  max_df <- data.frame(x = c(max_y_x, max_x_x), y = c(max_y_y, max_x_y))
  # Creating straight line between the max values
  fit <- lm(max_df$y ~ max_df$x)
  # Distance from point to line
  distances <- c()
  for(i in 1:length(x_values)) {
    distances <- c(distances, abs(coef(fit)[2]*x_values[i] - y_values[i] + coef(fit)[1]) / sqrt(coef(fit)[2]^2 + 1^2))
  }
  # Max distance point
  x_max_dist <- x_values[which.max(distances)]
  y_max_dist <- y_values[which.max(distances)]
  return(c(x_max_dist, y_max_dist))
}
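Answer 6: (score: 3)
Answer 7: (score: 2)
I have been working on knee/elbow point detection for some time. I am by no means an expert. Here are some methods that may be relevant to this problem.
DFDT stands for Dynamic First Derivative Threshold. It computes the first derivative and uses a thresholding algorithm to detect the knee/elbow point. DSDT is similar but uses the second derivative; my evaluation shows that the two perform similarly.
The S-method is an extension of the L-method. The L-method fits two straight lines to the curve, and the intersection of the two lines is the knee/elbow point. The best fit is found by looping over all points, fitting the lines, and evaluating the MSE (mean squared error). The S-method fits three straight lines, which improves accuracy but also requires more computation.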
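The L-method described above can be sketched in Python. This is a minimal sketch under stated assumptions: the name `l_method_elbow` is hypothetical, the two segments are scored by the overall mean squared error of the piecewise fit (the published method may weight the segments differently), and each segment must contain at least two points.

```python
import numpy as np

def l_method_elbow(x, y):
    """Fit two straight lines and return (split_index, x_intersection).

    Points x[:split_index] belong to the left line and the rest to
    the right line. Scoring by overall MSE is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four points")
    best = None
    # try every split that leaves at least two points on each side
    for c in range(2, n - 1):
        left = np.polyfit(x[:c], y[:c], 1)    # [slope, intercept]
        right = np.polyfit(x[c:], y[c:], 1)
        sse = (np.sum((np.polyval(left, x[:c]) - y[:c]) ** 2)
               + np.sum((np.polyval(right, x[c:]) - y[c:]) ** 2))
        mse = sse / n
        if best is None or mse < best[0]:
            best = (mse, c, left, right)
    _, c, left, right = best
    if np.isclose(left[0], right[0]):
        # parallel lines never intersect; fall back to the split position
        return c, float(x[c])
    # solve left[0]*x + left[1] == right[0]*x + right[1] for x
    x_cross = (right[1] - left[1]) / (left[0] - right[0])
    return c, float(x_cross)
```

For the curve in the question, one would call `l_method_elbow(np.arange(1, 101), curve)` and read the elbow off the returned intersection.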
Answer 8: (score: 1)
elbow_finder <- function(x_values, y_values) {
  i_max <- length(x_values) - 1
  # First and second derivatives
  first_derived <- list()
  second_derived <- list()
  # First derivative: average of the slopes on either side of point i
  for(i in 2:i_max){
    slope1 <- (y_values[i+1] - y_values[i]) / (x_values[i+1] - x_values[i])
    slope2 <- (y_values[i] - y_values[i-1]) / (x_values[i] - x_values[i-1])
    slope_avg <- (slope1 + slope2) / 2
    first_derived[[i]] <- slope_avg
  }
  first_derived[[1]] <- NA
  first_derived[[i_max+1]] <- NA
  first_derived <- unlist(first_derived)
  # Second derivative: same scheme applied to the first derivative
  # (parentheses needed: 3:i_max-1 would be evaluated as (3:i_max)-1)
  for(i in 3:(i_max-1)){
    d1 <- (first_derived[i+1] - first_derived[i]) / (x_values[i+1] - x_values[i])
    d2 <- (first_derived[i] - first_derived[i-1]) / (x_values[i] - x_values[i-1])
    d_avg <- (d1 + d2) / 2
    second_derived[[i]] <- d_avg
  }
  second_derived[[1]] <- NA
  second_derived[[2]] <- NA
  second_derived[[i_max]] <- NA
  second_derived[[i_max+1]] <- NA
  second_derived <- unlist(second_derived)
  return(list(d1 = first_derived, d2 = second_derived))
}
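A rough Python counterpart of the derivative computation above, for readers following the NumPy answer. It is a sketch under assumptions, not a line-by-line port: `np.gradient` uses one-sided differences at the two ends instead of `NA`, and taking the largest second derivative as the elbow candidate is an added assumption that the R function itself does not make. The names `curve_derivatives` and `elbow_by_second_derivative` are hypothetical.

```python
import numpy as np

def curve_derivatives(x_values, y_values):
    """Return (first, second) numerical derivatives of y with respect to x."""
    x = np.asarray(x_values, dtype=float)
    y = np.asarray(y_values, dtype=float)
    # central differences in the interior, one-sided at the ends
    first = np.gradient(y, x)
    second = np.gradient(first, x)
    return first, second

def elbow_by_second_derivative(x_values, y_values):
    """Assumed heuristic: the 0-based index with the largest second derivative."""
    _, second = curve_derivatives(x_values, y_values)
    return int(np.argmax(second))
```

On noisy curves the second derivative fluctuates strongly, so some smoothing before differentiating is probably needed in practice.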
Answer 9: (score: 0)
In case you want it, I have translated this to R as an exercise for myself (pardon my non-optimized coding style). I applied it to finding the optimal number of clusters in k-means, and it worked pretty well.
elbow.point = function(x){
  elbow.curve = c(x)
  nPoints = length(elbow.curve)
  allCoord = cbind(c(1:nPoints), c(elbow.curve))
  # pull out first point
  firstPoint = allCoord[1,]
  # get vector between first and last point - this is the line
  lineVec = allCoord[nPoints,] - firstPoint
  # normalize the line vector
  lineVecN = lineVec / sqrt(sum(lineVec^2))
  # find the distance from each point to the line:
  # vector between all points and first point
  vecFromFirst = lapply(c(1:nPoints), function(x){
    allCoord[x,] - firstPoint
  })
  vecFromFirst = do.call(rbind, vecFromFirst)
  # scalar product of each row with the unit line vector
  scalarProduct = matrix(nrow = nPoints, ncol = 2)
  scalarProduct[,1] = vecFromFirst[,1] * lineVecN[1]
  scalarProduct[,2] = vecFromFirst[,2] * lineVecN[2]
  scalarProduct = as.matrix(rowSums(scalarProduct))
  vecFromFirstParallel = matrix(nrow = nPoints, ncol = 2)
  vecFromFirstParallel[,1] = scalarProduct * lineVecN[1]
  vecFromFirstParallel[,2] = scalarProduct * lineVecN[2]
  vecToLine = lapply(c(1:nPoints), function(x){
    vecFromFirst[x,] - vecFromFirstParallel[x,]
  })
  vecToLine = do.call(rbind, vecToLine)
  # distance to line is the norm of vecToLine
  distToLine = as.matrix(sqrt(rowSums(vecToLine^2)))
  # assumed return value: index of the point farthest from the line,
  # mirroring the max(distToLine) step of the MATLAB code
  which.max(distToLine)
}
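Answer 10: (score: 0)
Don't overlook k-fold cross-validation when selecting models; it is an excellent alternative to AIC / BIC. Also, think about the underlying situation being modeled, and allow yourself to use domain knowledge to help select a model.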
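As an illustration of the k-fold idea, here is a minimal Python sketch. It assumes, purely for the example, that the candidate models are polynomials of different degree fitted with `np.polyfit`; the function name `kfold_mse` and its parameters are hypothetical and would need adapting to the actual model family.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a degree-`degree` polynomial over k folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # shuffle the indices once, then cut them into k roughly equal folds
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # fit on the training folds only, score on the held-out fold
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[val])
        errors.append(np.mean((pred - y[val]) ** 2))
    return float(np.mean(errors))

# Usage: score several complexities and keep the one with the lowest error.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
y = x ** 2 + rng.normal(scale=0.1, size=x.size)
scores = {d: kfold_mse(x, y, d) for d in range(1, 6)}
best_degree = min(scores, key=scores.get)
```

Unlike AIC / BIC, no explicit complexity penalty is needed here: overly complex models are penalized through their error on the held-out folds.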