Question

问题：

我有一个类似以下的数据集：

import numpy as np 
x = np.arange(0,10000,0.5)
y = np.arange(x.size)/x.size

在日志-日志空间中进行绘制，如下所示：

import matplotlib.pyplot as plt
plt.loglog(x, y)
plt.show()

显然，此日志日志图中有很多冗余信息。我不需要10000点代表这个趋势。

我的问题是：如何对这些数据进行分类，以便在对数刻度的每个数量级上显示均匀数量的点？在每个数量级上，我想得到大约10分。因此，我需要将“ x”与按指数增长的bin大小进行bin，然后取与每个bin相对应的y所有元素的平均值。

尝试：

首先，我生成要用于x的垃圾箱。

# need a nicer way to do this.
# what if I want more than 10 bins per order of magnitude? 
bins = 10**np.arange(1,int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1,1)*np.arange(0,11)).flatten())

#array([    0,     10,    20,    30,    40,    50,    60,    70,    80,
#          90,   100,   200,   300,   400,   500,   600,   700,   800,
#         900,  1000,  2000,  3000,  4000,  5000,  6000,  7000,  8000,
#        9000, 10000])

第二，我找到x的每个元素对应的bin的索引：

digits = np.digitize(x, bins)

现在我可以真正使用帮助的部分。我想获取y中与每个bin对应的每个元素的平均值，然后绘制这些平均值与bin中点的关系：

# need a nicer way to do this.. is there an np.searchsorted() solution?
# this way is quick and dirty, but it does not scale with acceptable speed
averages = []
for d in np.unique(digits):
    mask = digits==d
    y_mean = np.mean(y[mask])
    averages.append(y_mean)
del mask, y_mean, d    

# now plot the averages within each bin against the center of each bin 
plt.loglog((bins[1:]+bins[:-1])/2.0, averages)
plt.show()

摘要： 有没有更流畅的方法可以做到这一点？如何在每个数量级而不是10个数量级上生成任意n点？

合并一列，并将（2，N）数组的另一列相加

0 个答案: