Split a DataFrame into multiple DataFrames based on groupby and binning

Time: 2018-09-20 11:03:30

Tags: python pandas dataframe pandas-groupby binning

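I have a DataFrame in pandas containing information that I want to group by its ID ('square'). I want to get the mean brightness of each group and, based on that mean brightness, split the DataFrame into 4 bins and get 4 output DataFrames.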

Example DataFrame:

squares = pd.DataFrame({'square': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0},
                    'time': {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 4.0, 7: 5.0 },
                    'x': {0: 243, 1: 293, 2: 189, 3: 189, 4: 176, 5: 374, 6: 111, 7: 239},
                    'y': {0: 233, 1: 436, 2: 230, 3: 233, 4: 203, 5: 394, 6: 171, 7: 284}, 
                    'brightness': {0: 1000, 1: 1200, 2: 4000, 3: 5000, 4: 2000, 5: 8000, 6: 1300, 7: 4300 }})

squares = squares.set_index('time')
squares


      brightness     square     x     y 
time
1.0     1000          1.0       243   233
2.0     1200          1.0       293   436
1.0     4000          2.0       189   230
2.0     5000          2.0       189   233
3.0     2000          5.0       176   203
3.0     6000          6.0       374   394 
4.0     1300          7.0       111   171
5.0     4300          8.0       239   284

Desired final result:

squares_1

      brightness     square     x     y 
time
1.0     1000          1.0       243   233
2.0     1200          1.0       293   436
3.0     2000          5.0       176   203
4.0     1300          7.0       111   171


squares_2

NaN


squares_3

      brightness     square     x     y 
time
1.0     4000          2.0       189   230
2.0     5000          2.0       189   233
5.0     4300          8.0       239   284


squares_4

      brightness     square     x     y 
time
3.0     6000          6.0       374   394 

I started with the following:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

avg = squares.groupby('square')['brightness'].mean()
n, bins, patches = plt.hist(avg, bins = 4)
inds = np.digitize(avg, bins)

I'm not quite sure how to continue. Any help is appreciated!

1 Answer:

Answer 0 (score: 1)

You can use GroupBy.transform with mean to get a new Series of the same size as the original DataFrame, then bin it with cut, and finally create a dictionary of DataFrames:

squares = squares.set_index('time')

labs = [f'squares_{x+1}' for x in range(4)]
g = pd.cut(squares.groupby('square')['brightness'].transform('mean'), bins=4, labels=labs)
print (g)
time
1.0    squares_1
2.0    squares_1
1.0    squares_2
2.0    squares_2
3.0    squares_1
3.0    squares_4
4.0    squares_1
5.0    squares_2
Name: brightness, dtype: category
Categories (4, object): [squares_1 < squares_2 < squares_3 < squares_4]

dfs = dict(tuple(squares.groupby(g)))

print (dfs)
{'squares_1':       square    x    y  brightness
time                              
1.0      1.0  243  233        1000
2.0      1.0  293  436        1200
3.0      5.0  176  203        2000
4.0      7.0  111  171        1300, 'squares_2':       square    x    y  brightness
time                              
1.0      2.0  189  230        4000
2.0      2.0  189  233        5000
5.0      8.0  239  284        4300, 'squares_3': Empty DataFrame
Columns: [square, x, y, brightness]
Index: [], 'squares_4':       square    x    y  brightness
time                              
3.0      6.0  374  394        8000}
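
To see where the bin boundaries fall, the same steps can be rerun with `retbins=True`, which makes `pd.cut` return the edges it chose. The sketch below is only an illustration: it builds the dictionary with a boolean mask per label rather than `dict(tuple(groupby))`, so an empty bin still appears as an empty DataFrame regardless of the `observed` default in the installed pandas version. The names `avg` and `edges` are introduced here for the example. The data is copied from the question's constructor, where square 6 has brightness 8000; the printed table in the question shows 6000 for that row.

```python
import pandas as pd

# Data copied from the question's constructor. Note: square 6 has
# brightness 8000 here, while the question's printed table shows 6000.
squares = pd.DataFrame({
    'square': [1.0, 1.0, 2.0, 2.0, 5.0, 6.0, 7.0, 8.0],
    'time': [1.0, 2.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    'x': [243, 293, 189, 189, 176, 374, 111, 239],
    'y': [233, 436, 230, 233, 203, 394, 171, 284],
    'brightness': [1000, 1200, 4000, 5000, 2000, 8000, 1300, 4300],
}).set_index('time')

labs = [f'squares_{i + 1}' for i in range(4)]

# Mean brightness of each row's square, broadcast back to every row.
avg = squares.groupby('square')['brightness'].transform('mean')

# Four equal-width bins over the range of avg; retbins=True also
# returns the bin edges.
g, edges = pd.cut(avg, bins=4, labels=labs, retbins=True)

# One DataFrame per label; a label with no rows gives an empty DataFrame.
# to_numpy() makes the mask positional, since the 'time' index has duplicates.
dfs = {label: squares[(g == label).to_numpy()] for label in labs}

print(edges)
for label, part in dfs.items():
    print(label, sorted(part['square'].unique()))
```

With 8000 for square 6, the interior edges are 2825, 4550 and 6275, so squares 2 and 8 (means 4500 and 4300) fall into `squares_2`, as in the output above. With 6000 instead, the equal-width edges would be 2325, 3550 and 4775, and those two squares would land in `squares_3`, which is the layout shown in the question's desired result.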