Binning random data into groups with an equal number of data points, based on value

Time: 2017-09-15 14:42:17

Tags: python

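I have a DataFrame with 2 columns (volume and price), and I want to create 20 bins based on the Volume column, with an equal number of data points in each bin.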

I.e., if I have volume = [1,6,8,2,6,9,3,6] and 4 bins, I want to cut the data into bin 1: 1:2, bin 2: 3:6, bin 3: 6:8, bin 4: 8:9
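One way to read the small example above is "sort the values, then split them into consecutive chunks of equal length". The sketch below does that with NumPy; the names `n_bins`, `chunks`, `bin_ranges` and `bin_sizes` are only illustrative. Under this reading the third and fourth bins come out as 6:6 and 8:9, which differs slightly from the ranges written in the question, so treat it as one possible interpretation rather than the intended one.

```python
import numpy as np

volume = [1, 6, 8, 2, 6, 9, 3, 6]
n_bins = 4  # illustrative name, not from the question

# Sort the values, then split them into n_bins consecutive chunks
# whose lengths differ by at most one element.
chunks = np.array_split(np.sort(volume), n_bins)

# Range (min, max) and number of points in each chunk.
bin_ranges = [(int(c.min()), int(c.max())) for c in chunks]
bin_sizes = [len(c) for c in chunks]

print(bin_ranges)
print(bin_sizes)
```

Because the split is done on positions in the sorted array rather than on values, repeated values (the three 6s here) can end up in neighbouring bins.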

Then I want to find the average volume and price within each bin, and plot volume (x-axis) against price (y-axis).

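The intervals do not need to be equally spaced. I want the same number of data points in each interval, determine the range of each interval, then find the mean of the data within each interval and plot it.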
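The goal described above (equal-count bins, the range of each bin, and per-bin means) could be sketched with `pd.qcut` applied to the ranks of the values. This is a sketch under assumptions, not a confirmed solution: the DataFrame here is synthetic stand-in data using the question's column names `Volume` and `dMidP`, and `n_bins`, `ranks`, `bin_id` and `binned` are illustrative names. Ranking with `method="first"` is used so that tied volumes still get distinct ranks, which keeps the bin counts equal.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): 2000 rows with the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Volume": rng.lognormal(mean=-1.0, sigma=1.5, size=2000).round(3),
    "dMidP": rng.normal(size=2000),
})

n_bins = 20

# Rank the volumes; method="first" breaks ties by order of appearance,
# so every row gets a unique rank from 1 to len(df).
ranks = df["Volume"].rank(method="first")

# Cut the ranks into n_bins quantile bins; labels=False returns bin numbers 0..n_bins-1.
bin_id = pd.qcut(ranks, n_bins, labels=False).rename("bin")

# Per-bin range, means, and number of points.
binned = df.groupby(bin_id).agg(
    volume_min=("Volume", "min"),
    volume_max=("Volume", "max"),
    volume_mean=("Volume", "mean"),
    price_mean=("dMidP", "mean"),
    count=("Volume", "size"),
)

print(binned)
```

The `volume_mean` and `price_mean` columns are what would be passed to the plotting call, and `count` gives a direct check that every bin holds the same number of rows.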

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = df['Volume']

discrete_dat, cutoff = discretize(data, 20)  # discretize() is defined elsewhere (not shown)
myList = sorted(set(cutoff))
Cutoff = np.asarray(myList)

df_2 = pd.DataFrame({'X': df['Volume'], 'Y': df['dMidP']})  # build a DataFrame from the data
data_cut = pd.cut(data, Cutoff)       # cut the data at the bin edges
grp = df_2.groupby(by=data_cut)       # group the data by bin
ret = grp.aggregate(np.mean)          # aggregate (mean) of each bin

plt.loglog(df['Volume'],df['dMidP'],'o')

plt.loglog(ret.X,ret.Y,'r-')
plt.title('Price Impact (Sell)')
plt.xlabel('Volume')
plt.ylabel('dMidP')

plt.show()

My raw data and the output plot: [image]


However, when I use Counter, it returns the following, which shows that the number of data points differs from one interval to the next.

Counter({Interval(0.41299999999999998, 0.46400000000000002, closed='right'): 2029,
     Interval(0.877, 0.92800000000000005, closed='right'): 543,
     Interval(0.050999999999999997, 0.069599999999999995, closed='right'): 93,
     Interval(0.60299999999999998, 0.71399999999999997, closed='right'): 99,
     Interval(0.46400000000000002, 0.496, closed='right'): 93,
     Interval(0.092799999999999994, 0.125, closed='right'): 111,
     Interval(0.125, 0.14799999999999999, closed='right'): 86,
     Interval(0.0092800000000000001, 0.018599999999999998, closed='right'): 101,
     Interval(0.53800000000000003, 0.60299999999999998, closed='right'): 99,
     Interval(0.14799999999999999, 0.186, closed='right'): 108,
     Interval(0.018599999999999998, 0.023199999999999998, closed='right'): 102,
     Interval(0.186, 0.23200000000000001, closed='right'): 134,
     Interval(3.246, 4.2670000000000003, closed='right'): 85,
     Interval(0.496, 0.53800000000000003, closed='right'): 103,
     Interval(1.391, 1.716, closed='right'): 86,
     Interval(0.26400000000000001, 0.32500000000000001, closed='right'): 104,
     nan: 243,
     Interval(0.23200000000000001, 0.26400000000000001, closed='right'): 60,
     Interval(0.032500000000000001, 0.046399999999999997, closed='right'): 186,
     Interval(0.00464, 0.0092800000000000001, closed='right'): 87,
     Interval(0.023199999999999998, 0.032500000000000001, closed='right'): 74,
     Interval(0.71399999999999997, 0.877, closed='right'): 101,
     Interval(0.97399999999999998, 1.1359999999999999, closed='right'): 92,
     Interval(4.2670000000000003, 6.3120000000000003, closed='right'): 100,
     Interval(0.046399999999999997, 0.050999999999999997, closed='right'): 33,
     Interval(1.716, 1.855, closed='right'): 145,
     Interval(0.069599999999999995, 0.092799999999999994, closed='right'): 97,
     Interval(1.1359999999999999, 1.391, closed='right'): 319,
     Interval(2.319, 2.7829999999999999, closed='right'): 114,
     Interval(0.32500000000000001, 0.41299999999999998, closed='right'): 98,
     Interval(0.92800000000000005, 0.97399999999999998, closed='right'): 72,
     Interval(2.7829999999999999, 3.246, closed='right'): 75,
     Interval(2.1429999999999998, 2.319, closed='right'): 128,
     Interval(1.855, 2.1429999999999998, closed='right'): 56})
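As a side illustration of how such counts can arise, the sketch below runs `pd.cut` on the small example from earlier with a made-up set of edges (`edges` is hypothetical and is not the `Cutoff` array from the question). It does not diagnose the output above. It only shows two documented behaviours of `pd.cut`: with explicit edges the bins are right-closed, so a value equal to the lowest edge is excluded unless `include_lowest=True` is passed, and values outside the edges become NaN. Explicit edges also give no guarantee of equal counts per bin.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 6, 8, 2, 6, 9, 3, 6])
edges = np.array([1, 3, 6, 8])  # hypothetical cutoffs, for illustration only

# Bins are (1, 3], (3, 6], (6, 8]: the value 1 sits on the open left edge
# and 9 lies beyond the last edge, so both are assigned NaN.
cut = pd.cut(values, edges)

# Count the points per interval, including the NaN group.
counts = cut.value_counts(dropna=False, sort=False)
n_missing = int(cut.isna().sum())

print(counts)
print(n_missing)
```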

0 answers:

No answers