Finding the slope trend from lines of best fit

Time: 2017-04-21 21:20:04

Tags: python python-2.7 matplotlib plot linear-regression

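I am trying to work out how to determine the trend in slope from lines of best fit through points. Basically, once I have the trend in the slopes, I want to plot several other lines that follow that trend in the same figure. For example:

[image: example plot]

This plot is basically what I want to produce, but I don't know how to go about it. As you can see, it has several lines of best fit through points; each has its own slope, and they all intersect at x = 6. Beyond those lines, it has several more lines based on the trend of the other slopes. I assume I can do something similar with the code below, but I'm not sure how to adapt it to do what I want.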

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )

df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000

# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])

# Now add on a line with a fixed slope of 0.03
slope = 0.03

# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0

# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0

# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')

# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')    

plt.show()

2 answers:

Answer 0 (score: 4)

The value of y_1 can be found from the equation of a straight line, given slope and y_0: y_1 = slope * x_1 + y_0.

This produces the following plot:

[image: resulting plot]

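To draw multiple lines, first create an array/list of the gradients to use, then repeat the same steps for each one. The steps for a single slope are: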

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Age': np.random.rand(25) * 160})
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000

fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])

slope = 0.03
x_0 = 0
y_0 = 0
x_1 = 5000
y_1 = (slope * x_1) + y_0  # equation of a straight line: y = mx + c

ax.plot([x_0, x_1], [y_0, y_1], marker='^', markersize=10, c='r')

plt.show()

This produces the following plot:

[image: resulting plot]
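The listing above draws a single line. As a minimal sketch of the multi-line case the answer describes, the helpers below apply y = mx + c once per gradient. The names `line_endpoints` and `draw_lines` and the gradient values are assumptions made for illustration; they are not part of the original answer.

```python
def line_endpoints(slopes, x_0=0.0, y_0=0.0, x_1=5000.0):
    """Return ((x_0, y_0), (x_1, y_1)) for each slope, with y_1 from y = m*x + c."""
    return [((x_0, y_0), (x_1, slope * (x_1 - x_0) + y_0)) for slope in slopes]


def draw_lines(ax, slopes, **line_kwargs):
    """Draw one line per slope on an existing matplotlib Axes."""
    for (x_a, y_a), (x_b, y_b) in line_endpoints(slopes):
        ax.plot([x_a, x_b], [y_a, y_b], **line_kwargs)


# Hypothetical gradients; substitute whatever values you want to show.
slopes = [0.01, 0.02, 0.03, 0.04]
for start, end in line_endpoints(slopes):
    print(start, end)
```

With the `fig, ax` from the listing above, calling `draw_lines(ax, slopes, marker='^', markersize=10, c='r')` before `plt.show()` should add one line per gradient, all starting from the same point.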

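Answer 1 (score: 1)

I just modified your code. Basically, what you need is a piecewise function. Below a certain x value the lines have different slopes, but they all end at x = 3000, after which the slope is simply 0.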

The plot looks like this:

[image: resulting plot]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )

df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000

# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])

# Use a range of slopes for the first segment instead of one fixed slope
#slope1 = -0.03
slope1 = np.arange(-0.05, 0, 0.01)
slope2 = 0

# Every sloped line starts at x_0 and they all meet at (x_1, y_1),
# so y_0 is worked backwards from that shared end point
x_0 = 0
y_1 = 0

# Draw each sloped segment from x = 0 to the shared point at x = 3000
for slope in slope1:
    x_1 = 3000
    y_0 = y_1 - slope * (x_1 - x_0)
    ax.plot([x_0, x_1], [y_0, y_1], c='r')

x_2 = 5000
y_2 = slope2 * (x_2 - x_1) + y_1

# Mark the start of the last sloped line and the shared end point
# with big triangles
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')

# Add the flat (slope 0) segment from x = 3000 to x = 5000
ax.plot([x_1, x_2], [y_1, y_2], c='r')

plt.show()
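Neither listing estimates a slope from the data; both hard-code it. As a rough sketch of that missing step from the question, the code below fits a least-squares slope for each group of points, forcing every line through a shared intersection point, and then extrapolates further slopes. The function names, the sample points, the intersection at (6, 0), and the assumption that the slopes change by a constant step are all illustrative guesses, not something the original post specifies.

```python
def slope_through_point(xs, ys, x_c, y_c):
    """Least-squares slope of a line forced through (x_c, y_c).

    Minimising sum((y - y_c - m * (x - x_c))**2) over m gives
    m = sum((x - x_c) * (y - y_c)) / sum((x - x_c)**2).
    """
    num = sum((x - x_c) * (y - y_c) for x, y in zip(xs, ys))
    den = sum((x - x_c) ** 2 for x in xs)
    if den == 0:
        raise ValueError("every x value coincides with x_c")
    return num / den


def extend_slope_trend(slopes, n_extra):
    """Extrapolate n_extra slopes, ASSUMING a constant step between slopes."""
    if len(slopes) < 2:
        raise ValueError("need at least two fitted slopes to see a trend")
    step = (slopes[-1] - slopes[0]) / (len(slopes) - 1)
    return [slopes[-1] + step * k for k in range(1, n_extra + 1)]


# Hypothetical groups of points lying on lines through (6, 0).
x_c, y_c = 6.0, 0.0
xs = [0.0, 2.0, 4.0]
groups = [
    [-6.0, -4.0, -2.0],    # slope 1
    [-12.0, -8.0, -4.0],   # slope 2
    [-18.0, -12.0, -6.0],  # slope 3
]

fitted = [slope_through_point(xs, ys, x_c, y_c) for ys in groups]
extra = extend_slope_trend(fitted, 2)
print(fitted)  # [1.0, 2.0, 3.0]
print(extra)   # [4.0, 5.0]
```

If the lines do not need to share a point, `np.polyfit(x, y, 1)` returns the ordinary least-squares slope and intercept instead. Each extrapolated slope can then be drawn as in the listings above, using y = slope * (x - x_c) + y_c.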