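I have a set of points along X and Y. I want to create bins over small ranges of X and compute percentiles in each bin, then fit a polynomial regression through all the bins so that I get a continuous percentile curve. The problem is at the edges: the edge bins contain few points, so the percentile values there are distorted.
The image and code below show how everything is done. The percentile values you can see are computed with 99.8 defining the maximum and 4.0 defining the minimum: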
import numpy as np
import matplotlib.pyplot as plt
###############################
degree = 8
step = 0.05
numPercUp = 99.8
numPercDown = 4.0
###############################
fig = plt.figure(figsize=(8, 6))
dataX = np.random.uniform(low=0.14, high=2.06, size=(1000))
dataY = np.random.uniform(low=50, high=550, size=(1000))
plt.scatter(dataX, dataY, c='b', s=5, marker="+", label="data")
xMin = np.min(dataX)
xMax = np.max(dataX)
print('xMin: ', xMin)
print('xMax: ', xMax)
xMin = (int(xMin / step)+1) * step
xMax = (int(xMax / step)+1) * step
print('xMin: ', xMin)
print('xMax: ', xMax)
bins = np.arange(xMin, xMax, step)
inds = np.digitize(dataX, bins) # http://stackoverflow.com/questions/2275924/how-to-get-data-in-a-histogram-bin
print('bins: ', bins, bins[0], bins[-1], len(bins))
print('inds: ', np.min(inds), np.max(inds), np.sum(inds == 0))
# Percentile coordinates
percX = np.arange(xMin, xMax+step, step) - step/2 # All bin X position centered on the bin
percUp = np.zeros(len(bins)+1)
percDown = np.zeros(len(bins)+1)
for i in range(len(bins)+1):
    dataBin = dataY[inds == i]
    percUp[i] = np.percentile(dataBin, numPercUp)
    percDown[i] = np.percentile(dataBin, numPercDown)
print('percX: ', percX)
print('percUp: ', percUp)
plt.plot(percX, percUp, color='green', linestyle='-', linewidth=2, marker="o", markerfacecolor='red', markersize=5, label="Up perc.")
plt.plot(percX, percDown, color='green', linestyle='-', linewidth=2, marker="o", markerfacecolor='red', markersize=5, label="Down perc.")
# Polynomial Regression
z = np.polyfit(percX, percUp, degree)
f = np.poly1d(z)
x_new = np.linspace(0.1, 2.1, 50)
y_new = f(x_new)
plt.plot(x_new, y_new, 'r--')
z = np.polyfit(percX, percDown, degree)
f = np.poly1d(z)
x_new = np.linspace(0.1, 2.1, 50)
y_new = f(x_new)
plt.plot(x_new, y_new, 'r--')
# Frame specification
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Perc. Min/Max')
plt.grid()
plt.legend()
plt.axis([0, 2.2, 0, 600])
plt.xticks(np.arange(0, 2.21, 0.1))
plt.yticks(np.arange(0, 600+1, 50))
plt.show()
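To make the edge problem concrete, here is a minimal sketch (not part of the original code) that compares the 99.8th percentile estimated from sparsely and densely populated bins. It assumes Y is uniform on [50, 550] as in the script above; the bin sizes 5 and 500, the trial count, and the helper name `mean_percentile` are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
q = 99.8                          # upper percentile used in the question
true_value = 50 + 500 * q / 100   # exact percentile of uniform(50, 550)

def mean_percentile(n_points, n_trials=2000):
    """Average the q-th percentile estimate over many bins of n_points points.

    Hypothetical helper for illustration only.
    """
    samples = rng.uniform(50, 550, size=(n_trials, n_points))
    return np.percentile(samples, q, axis=1).mean()

sparse = mean_percentile(5)     # a thinly populated edge bin
dense = mean_percentile(500)    # a well populated interior bin
print("true:", true_value, "sparse bins:", sparse, "dense bins:", dense)
```

With only a handful of points, the estimate cannot exceed the largest sample in the bin, so it tends to sit well below the true upper percentile. That is the kind of distortion that pulls the fitted curve inward at the edges.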
I found three possibilities, but none of them convinced me:
Answer 0 (score: 4)
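Transform your X so that more weight is put on the values you care about. For example, taking -log(2.1-X) gives an essentially linear response near 0 and a rapidly growing one near 2. Using this function to determine the bins gives better estimates for values near both ends.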
Let's start by generating some dummy data:
X = np.linspace(0,2,50000)
Y = np.random.gamma(4, 0.1+(X**2)*(2-X)/(0.01+(2-X)))
plt.plot(X,Y,'.')
plt.margins(0.04)
Define a function to transform the Xs, and its inverse:
def xfrm(X): return -np.log(2.05-np.array(X))
def ivrt(Y): return 2.05-np.exp(-np.array(Y))
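As a quick sanity check on these two functions, the following sketch verifies that they are inverses of each other and shows how equal-width bins in the transformed coordinate map back to X. The number of bins (5) and the names `edges_t`, `edges_x`, and `widths_x` are illustrative assumptions, not part of the answer.

```python
import numpy as np

def xfrm(X): return -np.log(2.05 - np.array(X))
def ivrt(Y): return 2.05 - np.exp(-np.array(Y))

# Round trip: ivrt should undo xfrm on the range of the data.
x = np.linspace(0, 2, 11)
round_trip_ok = np.allclose(ivrt(xfrm(x)), x)

# Equal-width bin edges in transformed space, mapped back to X.
edges_t = np.linspace(float(xfrm(0.0)), float(xfrm(2.0)), 6)
edges_x = ivrt(edges_t)
widths_x = np.diff(edges_x)

print("round trip ok:", round_trip_ok)
print("bin edges in X:", edges_x)
print("bin widths in X:", widths_x)
```

The bin widths in X shrink steadily toward X = 2, so this particular transform places its narrowest bins where Y changes fastest in the dummy data.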
Then we can bin the transformed values and compute the percentiles in each bin:
Xi = xfrm(X)
bins = np.linspace(np.min(Xi),np.max(Xi)+1e-5,201)
ii = np.digitize(Xi,bins)
pcts = np.array([np.percentile(Y[ii==i],[4,95]) for i in range(1,len(bins))])
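The list comprehension above assumes every bin contains data, because `np.percentile` cannot produce a meaningful value for an empty bin. A possible variation, sketched here under the assumption that skipping sparse bins is acceptable, wraps the same steps in a helper. The function name `binned_percentiles`, the `min_count` threshold, and the demo data are all hypothetical.

```python
import numpy as np

def binned_percentiles(x, y, edges, qs=(4, 95), min_count=20):
    """Return bin midpoints and percentiles, skipping bins with too few points.

    Hypothetical helper: min_count is an arbitrary illustrative threshold.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    idx = np.digitize(x, edges)          # bin i covers edges[i-1] <= x < edges[i]
    mids, pcts = [], []
    for i in range(1, len(edges)):
        in_bin = y[idx == i]
        if in_bin.size < min_count:      # too sparse to trust the percentile
            continue
        mids.append(0.5 * (edges[i - 1] + edges[i]))
        pcts.append(np.percentile(in_bin, qs))
    return np.array(mids), np.array(pcts)

# Demo: dense data on [0, 1), only a few points on [1, 2).
rng = np.random.default_rng(1)
x_demo = np.concatenate([rng.uniform(0, 1, 2000), rng.uniform(1, 2, 10)])
y_demo = rng.uniform(50, 550, x_demo.size)
edges_demo = np.linspace(0, 2, 11)
mids_demo, pcts_demo = binned_percentiles(x_demo, y_demo, edges_demo)
print(mids_demo)
print(pcts_demo)
```

Dropping sparse bins trades coverage for reliability: the fit then has no anchor points where the data are thin, so it should not be extrapolated far beyond the retained midpoints.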
And generate some plots to make sure it behaves as expected:
fig,axs = plt.subplots(2,figsize=(8,10))
mids = bins[1:] - np.diff(bins)/2
axs[0].plot(X,Y,'.',zorder=1)
axs[0].vlines(ivrt(mids),pcts[:,0],pcts[:,1],lw=1);
axs[0].margins(0.04)
axs[1].plot(Xi,Y,'.',zorder=1)
axs[1].vlines(mids,pcts[:,0],pcts[:,1],lw=1);
axs[1].margins(0.04)
f = np.poly1d(np.polyfit(mids, pcts[:,1], 8))
axs[0].plot(ivrt(mids), f(mids),lw=3)
axs[1].plot(mids, f(mids))
f = np.poly1d(np.polyfit(mids, pcts[:,0], 8))
axs[0].plot(ivrt(mids), f(mids),lw=3)
axs[1].plot(mids, f(mids));
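The top plot shows the original values, the bottom plot the transformed values. The vertical lines show the values used to generate the fits.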
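Because the polynomial is fitted against the transformed coordinate, it has to be evaluated through the transform when used on raw X values. The sketch below shows one way to package that. The helper `fit_in_transformed_space` and the synthetic `values` are hypothetical and serve only to check the round trip; unlike `mids` above, the helper takes bin centres given in original X units.

```python
import numpy as np

def xfrm(X): return -np.log(2.05 - np.array(X))

def fit_in_transformed_space(x_centres, values, degree):
    """Fit a polynomial in transformed X and return a callable taking raw X.

    Hypothetical helper for illustration only.
    """
    poly = np.poly1d(np.polyfit(xfrm(x_centres), values, degree))
    return lambda x: poly(xfrm(x))

# Synthetic check: values that are exactly linear in the transformed coordinate.
x_centres = np.linspace(0.05, 1.95, 40)
values = 3.0 * xfrm(x_centres) + 1.0
envelope = fit_in_transformed_space(x_centres, values, degree=1)
print(envelope([0.5, 1.5]))
```

The returned callable can be applied directly to new X values, as long as they stay below 2.05, where the logarithm in the transform is defined.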
This may not be exactly what you were hoping for, but I hope it is interesting!