如何将数据帧分为点数不等的特定长度的bin?

时间:2018-11-23 05:19:20

标签: python-3.x pandas binning

我有一个数据框,我想将该数据框划分为等宽的条带(每个条带中的数据点数可能不相同)。我尝试了以下方法

df = pc13.sort_values(by = ['A'],  ascending=True)
df_temp = np.array_split(df, 20)

但是这种方法是将数据帧分为具有相同数量数据点的bin。取而代之的是,我想将数据帧划分为特定宽度的bin,每个bin中数据点的数量也可能不相同。

数据帧列A中的最小值是-0.04843731030699292,最大值是0.05417013917000033。我尝试上传整个数据框,但这是一个很大的文件。

1 个答案:

答案 0 :(得分:1)

您可以执行以下操作:

# create a random df
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))

# sort valeus
df = df.sort_values(by = ['A'],  ascending=True)

# use your code but on a transposed dataframe
new = np.array_split(df.T, 5) # split columns into 5 bins

# list comprehension to transposed dataframes
dfs = [new[i].T for i in range(len(new))]

更新

# random df
df = pd.DataFrame(np.random.randn(1000, 5), columns=list('ABCDE'))

# sort on A
df.sort_values('A', inplace=True)

# create bins
df['bin'] = pd.cut(df['A'], 20, include_lowest = True)

# group on bin
group = df.groupby('bin')

# list comprehension to split groups into list of dataframes 
dfs = [group.get_group(x) for x in group.groups]


[            A         B         C         D         E               bin
 218 -2.716093  0.833726 -0.771400  0.691251  0.162448  (-2.723, -2.413]
 207 -2.581388 -2.318333 -0.001467  0.035277  1.219666  (-2.723, -2.413]
 380 -2.499710  1.946709 -0.519070  1.653383  0.309689  (-2.723, -2.413]
 866 -2.492050  0.246500 -0.596392  0.872888  2.371652  (-2.723, -2.413]
 876 -2.469238 -0.156470 -0.841065 -1.248793 -0.489665  (-2.723, -2.413]
 314 -2.456308  0.630691 -0.072146  1.139697  0.663674  (-2.723, -2.413]
 310 -2.455353  0.075842  0.589515 -0.427233  1.207979  (-2.723, -2.413]
 660 -2.427255  0.890125 -0.042716 -1.038401  0.651324  (-2.723, -2.413],
             A         B         C         D         E              bin
 571 -2.355430  0.383794 -1.266575 -1.214833 -0.862611  (-2.413, -2.11]
 977 -2.354416 -1.964189  0.440376  0.028032 -0.181360  (-2.413, -2.11]
 83  -2.276908  0.288462  0.370555 -0.546359 -2.033892  (-2.413, -2.11]
 196 -2.213729 -1.087783 -0.592884  1.233886  1.051164  (-2.413, -2.11]
 227 -2.146631  0.365183 -0.095293 -0.882414  0.385117  (-2.413, -2.11]
 39  -2.136800 -1.150065  0.180182 -0.424071  0.040370  (-2.413, -2.11],
             A         B         C         D         E              bin
 104 -2.108961 -0.396602 -1.014224 -1.277124  0.001030  (-2.11, -1.806]
 360 -2.098928  1.093483  1.438421 -0.980215  0.010359  (-2.11, -1.806]
 530 -2.088592  1.043201 -0.522468  0.482176 -0.680166  (-2.11, -1.806]
 158 -2.062759  2.070387  2.124621 -2.751532  0.674055  (-2.11, -1.806]
 971 -2.053039  0.347577 -0.498513  1.917305 -1.746493  (-2.11, -1.806]
 658 -2.002482 -1.222292 -0.398816  0.279228 -1.485782  (-2.11, -1.806]
 90  -1.985261  3.499251 -2.089028  1.238524 -1.781089  (-2.11, -1.806]
 466 -1.973640 -1.609920 -1.029454  0.809143 -0.228893  (-2.11, -1.806]
 40  -1.966016 -1.479240 -1.564966 -0.310133  1.338023  (-2.11, -1.806]
 279 -1.943666  0.762493  0.060038  0.449159  0.244411  (-2.11, -1.806]
 204 -1.940045  0.844901 -0.343691 -1.144836  1.385915  (-2.11, -1.806]
 780 -1.918548  0.212452  0.225789  0.216110  1.710532  (-2.11, -1.806]
 289 -1.897438  0.847664  0.689778 -0.454152 -0.747836  (-2.11, -1.806]
 159 -1.848425  0.477726  0.391384 -0.477804  0.168160  (-2.11, -1.806],
. . .