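I have a DataFrame with 2 columns (Volume and Price), and I want to create 20 bins based on the Volume column, with the same amount of data in each bin.
For example, if I have Volume = [1,6,8,2,6,9,3,6] and 4 bins, I want to cut the data into bin 1: 1 to 2, bin 2: 3 to 6, bin 3: 6 to 8, bin 4: 8 to 9.
Then I want to find the average volume and price within each bin, and plot volume (x-axis) against price (y-axis).
The intervals do not need to be equally spaced. I want the same number of data points in each interval, to determine the range of each interval, and then to find the mean of the data within each interval and plot it.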
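A minimal sketch of equal-count binning with pandas, applied to the small Volume example. It assumes pd.qcut (quantile binning) is acceptable in place of the discretize helper used below; binned_means is a hypothetical name, and the Price values are made up for illustration. Ranking with method="first" gives tied volumes distinct positions, so qcut on the ranks splits by row count rather than by value.

```python
import pandas as pd


def binned_means(df, x_col, y_col, n_bins):
    """Hypothetical helper: order rows by x_col, split them into n_bins groups
    of equal size (up to rounding), and return each group's x range, size,
    and the mean of x_col and y_col."""
    # rank(method="first") breaks ties by order of appearance, so every row
    # gets a distinct position 1..n and qcut can split purely by count.
    bin_id = pd.qcut(df[x_col].rank(method="first"), n_bins, labels=False)
    grouped = df.groupby(bin_id)
    return pd.DataFrame({
        "x_min": grouped[x_col].min(),
        "x_max": grouped[x_col].max(),
        "count": grouped.size(),
        x_col: grouped[x_col].mean(),
        y_col: grouped[y_col].mean(),
    })


# Volume list from the question; the Price values are invented for the demo.
df = pd.DataFrame({
    "Volume": [1, 6, 8, 2, 6, 9, 3, 6],
    "Price": [10.0, 12.0, 15.0, 11.0, 13.0, 16.0, 11.5, 12.5],
})
ret = binned_means(df, "Volume", "Price", n_bins=4)
print(ret)
```

Each of the 4 bins holds 2 rows, covering volumes 1 to 2, 3 to 6, 6 to 6 and 8 to 9. With the real data, binned_means(df, "Volume", "dMidP", 20) would return 20 rows whose Volume and dMidP columns can be passed to plt.loglog in the same way as ret.X and ret.Y in the plotting code.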
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# df holds the 'Volume' and 'dMidP' columns; discretize() is defined elsewhere (not shown)
data = df['Volume']
discrete_dat, cutoff = discretize(data, 20)
myList = sorted(set(cutoff))  # drop duplicate cut points and sort them
Cutoff = np.asarray(myList)
df_2 = pd.DataFrame({'X': df['Volume'], 'Y': df['dMidP']})  # we build a dataframe from the data
data_cut = pd.cut(data, Cutoff)  # we cut the data following the bins
grp = df_2.groupby(by=data_cut)  # we group the data by the cut
ret = grp.mean()  # we produce an aggregate representation (mean) of each bin
plt.loglog(df['Volume'], df['dMidP'], 'o')  # raw data points
plt.loglog(ret.X, ret.Y, 'r-')  # per-bin means
plt.title('Price Impact (Sell)')
plt.xlabel('Volume')
plt.ylabel('dMidP')
plt.show()
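One possible contributor to the nan entry in the counts, assuming the cut points returned by discretize include the minimum volume, is that pd.cut builds right-closed intervals (a, b] by default. Values equal to the lowest edge (or outside the edges altogether) belong to no bin and come back as NaN. A minimal sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

volume = pd.Series([0.5, 0.5, 0.5, 1.0, 2.0, 3.0])
edges = np.array([0.5, 1.0, 3.0])  # lowest edge equals the minimum volume

# Default intervals are (0.5, 1.0] and (1.0, 3.0], so the three 0.5 values
# fall outside every bin and are labelled NaN.
default = pd.cut(volume, edges)
print(int(default.isna().sum()))  # 3

# include_lowest=True makes the first interval include its left edge.
fixed = pd.cut(volume, edges, include_lowest=True)
print(int(fixed.isna().sum()))  # 0
print(fixed.value_counts(sort=False).tolist())  # [4, 2]
```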
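However, when I count the points in each bin with Counter, it returns the following, which shows that the number of data points differs from interval to interval: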
Counter({Interval(0.41299999999999998, 0.46400000000000002, closed='right'): 2029,
Interval(0.877, 0.92800000000000005, closed='right'): 543,
Interval(0.050999999999999997, 0.069599999999999995, closed='right'): 93,
Interval(0.60299999999999998, 0.71399999999999997, closed='right'): 99,
Interval(0.46400000000000002, 0.496, closed='right'): 93,
Interval(0.092799999999999994, 0.125, closed='right'): 111,
Interval(0.125, 0.14799999999999999, closed='right'): 86,
Interval(0.0092800000000000001, 0.018599999999999998, closed='right'): 101,
Interval(0.53800000000000003, 0.60299999999999998, closed='right'): 99,
Interval(0.14799999999999999, 0.186, closed='right'): 108,
Interval(0.018599999999999998, 0.023199999999999998, closed='right'): 102,
Interval(0.186, 0.23200000000000001, closed='right'): 134,
Interval(3.246, 4.2670000000000003, closed='right'): 85,
Interval(0.496, 0.53800000000000003, closed='right'): 103,
Interval(1.391, 1.716, closed='right'): 86,
Interval(0.26400000000000001, 0.32500000000000001, closed='right'): 104,
nan: 243,
Interval(0.23200000000000001, 0.26400000000000001, closed='right'): 60,
Interval(0.032500000000000001, 0.046399999999999997, closed='right'): 186,
Interval(0.00464, 0.0092800000000000001, closed='right'): 87,
Interval(0.023199999999999998, 0.032500000000000001, closed='right'): 74,
Interval(0.71399999999999997, 0.877, closed='right'): 101,
Interval(0.97399999999999998, 1.1359999999999999, closed='right'): 92,
Interval(4.2670000000000003, 6.3120000000000003, closed='right'): 100,
Interval(0.046399999999999997, 0.050999999999999997, closed='right'): 33,
Interval(1.716, 1.855, closed='right'): 145,
Interval(0.069599999999999995, 0.092799999999999994, closed='right'): 97,
Interval(1.1359999999999999, 1.391, closed='right'): 319,
Interval(2.319, 2.7829999999999999, closed='right'): 114,
Interval(0.32500000000000001, 0.41299999999999998, closed='right'): 98,
Interval(0.92800000000000005, 0.97399999999999998, closed='right'): 72,
Interval(2.7829999999999999, 3.246, closed='right'): 75,
Interval(2.1429999999999998, 2.319, closed='right'): 128,
Interval(1.855, 2.1429999999999998, closed='right'): 56})
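The 2029 points in a single interval suggest that many rows share the same volume. When values are tied, quantile edges computed on the values themselves can coincide, so equal counts per bin and keeping identical volumes together cannot both hold. A sketch of that trade-off with pd.qcut, using the small list from the question and a hypothetical heavily tied series:

```python
import pandas as pd

# The small example from the question, with three tied 6s.
small = pd.Series([1, 6, 8, 2, 6, 9, 3, 6])
# Quantile edges on the raw values are [1, 2.75, 6, 6.5, 9]: one bin ends up empty.
print(pd.qcut(small, 4).value_counts(sort=False).tolist())  # [2, 4, 0, 2]

# Hypothetical volumes where 0.45 is very common, like the 2029-point interval.
heavy = pd.Series([0.1, 0.2, 0.3] + [0.45] * 10 + [0.6, 0.7, 0.8])
# Three of the five quantile edges equal 0.45. pd.qcut raises ValueError on
# duplicate edges unless duplicates="drop", which merges them into fewer bins.
merged = pd.qcut(heavy, 4, duplicates="drop")
print(merged.value_counts(sort=False).tolist())  # [13, 3]

# Ranking first restores equal counts, but the ten identical 0.45 values
# are then spread across all four bins.
by_rank = pd.qcut(heavy.rank(method="first"), 4, labels=False)
print(by_rank.value_counts().sort_index().tolist())  # [4, 4, 4, 4]
print(sorted(by_rank[heavy == 0.45].unique().tolist()))  # [0, 1, 2, 3]
```

Which behaviour is preferable depends on whether identical volumes should always share a bin; the rank-based split is the one that gives exactly equal counts.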