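Example: https://i.stack.imgur.com/G1T4f.png (this image was found at random on Google.)
Is there an existing regression algorithm that can fit multiple lines to data, as in the figure, even when the data points are mixed together (unlabeled)? I think it could be done by iteratively increasing the number of lines and clustering the points to the lines.
Thanks.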
Answer 0 (score: 3)
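The model you are looking for is called RANSAC, which is a good way to find multiple lines in noisy point data. The standard use of RANSAC is to select the single best hypothesis (a line, in this case), but you can just as easily pick the best 2 or 4 lines, depending on your data.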
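To make the idea concrete, here is a minimal NumPy-only sketch of the RANSAC loop for a single line. It is not taken from any library: the function name `ransac_line`, its parameters and the synthetic data are assumptions made for illustration, and it only handles non-vertical lines of the form y = a*x + b.

```python
import numpy as np


def ransac_line(points, n_trials=200, threshold=1.0, rng=None):
    """Return (slope, intercept, inlier_mask) of the best line found.

    Hypothetical helper for illustration; assumes non-vertical lines.
    """
    rng = np.random.default_rng(rng)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_trials):
        # 1. hypothesize: pass a line through two randomly chosen points
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical candidate, not representable as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2. score: points whose vertical residual is below the threshold
        mask = np.abs(points[:, 1] - (a * points[:, 0] + b)) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 2:
        raise ValueError("no valid line found")
    # 3. refine: least-squares fit on the consensus set only
    a, b = np.polyfit(points[best_mask, 0], points[best_mask, 1], deg=1)
    return a, b, best_mask


# synthetic data (assumed): y = 2x + 1 with small noise and 20 gross outliers
data_rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + 0.1 * data_rng.standard_normal(x.size)
y[:20] = data_rng.uniform(-20, 40, size=20)
points = np.column_stack([x, y])

slope, intercept, inlier_mask = ransac_line(points, rng=1)
print(slope, intercept, inlier_mask.sum())
```

Finding several lines then amounts to running this repeatedly and removing the inliers found on each pass.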
Here is an example from skimage (it also exists in sklearn):
import numpy as np
from matplotlib import pyplot as plt
from skimage.measure import LineModelND, ransac
np.random.seed(seed=1)
# generate coordinates of line
x = np.arange(-200, 200)
y = 0.2 * x + 20
data = np.column_stack([x, y])
# add gaussian noise to coordinates
noise = np.random.normal(size=data.shape)
data += 0.5 * noise
data[::2] += 5 * noise[::2]
data[::4] += 20 * noise[::4]
# add faulty data
faulty = np.array(30 * [(180., -100)])
faulty += 10 * np.random.normal(size=faulty.shape)
data[:faulty.shape[0]] = faulty
# fit line using all data
model = LineModelND()
model.estimate(data)
# robustly fit line only using inlier data with RANSAC algorithm
model_robust, inliers = ransac(data, LineModelND, min_samples=2,
                               residual_threshold=1, max_trials=1000)
outliers = ~inliers
# generate coordinates of estimated models
line_x = np.arange(-250, 250)
line_y = model.predict_y(line_x)
line_y_robust = model_robust.predict_y(line_x)
fig, ax = plt.subplots()
ax.plot(data[inliers, 0], data[inliers, 1], '.b', alpha=0.6,
        label='Inlier data')
ax.plot(data[outliers, 0], data[outliers, 1], '.r', alpha=0.6,
        label='Outlier data')
ax.plot(line_x, line_y, '-k', label='Line model from all data')
ax.plot(line_x, line_y_robust, '-b', label='Robust line model')
ax.legend(loc='lower left')
plt.show()
Here is a version adapted to your specific problem:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
MIN_SAMPLES = 3
x = np.linspace(0, 2, 100)
xs, ys = [], []
# generate points for three lines described by a and b,
# we also add some noise:
for a, b in [(1.0, 2), (0.5, 1), (1.2, -1)]:
    xs.extend(x)
    ys.extend(a * x + b + .1 * np.random.randn(len(x)))
xs = np.array(xs)
ys = np.array(ys)
plt.plot(xs, ys, "r.")
colors = "rgbky"
idx = 0
while len(xs) > MIN_SAMPLES:
    # build design matrix for linear regressor
    X = np.ones((len(xs), 2))
    X[:, 1] = xs
    ransac = linear_model.RANSACRegressor(
        residual_threshold=.3, min_samples=MIN_SAMPLES
    )
    ransac.fit(X, ys)
    # vector of boolean values, describes which points belong
    # to the fitted line:
    inlier_mask = ransac.inlier_mask_
    # points assigned to the current line:
    xinlier = xs[inlier_mask]
    yinlier = ys[inlier_mask]
    # cycle through colors:
    color = colors[idx % len(colors)]
    idx += 1
    plt.plot(xinlier, yinlier, color + "*")
    # only keep the outliers for the next pass:
    xs = xs[~inlier_mask]
    ys = ys[~inlier_mask]
plt.show()
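If you also need the parameters of each line rather than just a plot, the same peel-off loop can be wrapped in a function. This is a rough sketch, not a definitive implementation: `fit_multiple_lines`, `min_inliers` and `max_lines` are hypothetical names introduced here, and the stopping rule (stop once fewer than `min_inliers` points agree on a line) is an assumption you would tune for your data.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def fit_multiple_lines(xs, ys, residual_threshold=0.3, min_inliers=30,
                       max_lines=10, random_state=0):
    """Peel off lines one at a time; return a list of (slope, intercept).

    Hypothetical helper: names and the stopping rule are assumptions.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    lines = []
    while len(xs) >= min_inliers and len(lines) < max_lines:
        # default estimator is LinearRegression, which fits the intercept
        ransac = RANSACRegressor(residual_threshold=residual_threshold,
                                 random_state=random_state)
        try:
            ransac.fit(xs.reshape(-1, 1), ys)
        except ValueError:
            break  # RANSAC could not find a valid consensus set
        mask = ransac.inlier_mask_
        if mask.sum() < min_inliers:
            break  # what is left looks like noise rather than a line
        lines.append((float(ransac.estimator_.coef_[0]),
                      float(ransac.estimator_.intercept_)))
        # drop the points explained by this line and look for the next one
        xs, ys = xs[~mask], ys[~mask]
    return lines


# same three lines as above, generated with a fixed seed (assumed data)
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 100)
true_lines = [(1.0, 2), (0.5, 1), (1.2, -1)]
xs = np.concatenate([x for _ in true_lines])
ys = np.concatenate([a * x + b + 0.1 * rng.standard_normal(x.size)
                     for a, b in true_lines])

lines = fit_multiple_lines(xs, ys)
for slope, intercept in lines:
    print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Note that points missed on an early pass can still lie on a line that was already found, so the returned list may contain near-duplicates; raising `min_inliers` or merging similar lines afterwards is one way to deal with that.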