Irregular binning based on the sum of a column

Date: 2016-06-27 14:26:44

Tags: python pandas dataframe binning

I want to bin a pandas DataFrame based on the running sum of another column.

I have the following DataFrame:

time    variable    frequency
2           7         7
3           12        2
4           13        3
6           15        4
6           18        4
6           3         1
10          21        2
11          4         5
13          6         5
15          17        6
17          5         4

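I want to group consecutive rows into bins so that each bin has a total frequency of at least 10, and output the average time, the total variable, and the total frequency of each bin: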

avg time    total variable  total frequency
3                 32             12
7                 57             11
12                10             10
16                22             10
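Reading the expected output back against the input shows the rule being asked for: each bin is the shortest run of consecutive rows whose frequencies add up to at least 10, so the first bin holds 12 rather than exactly 10. The check below is a small sketch that recomputes the expected table from those row ranges; the ranges (`bounds`) are my reading of the two tables, not something stated in the question, and it assumes Python 3 division.

```python
# Input rows as (time, variable, frequency), copied from the table above
rows = [
    (2, 7, 7), (3, 12, 2), (4, 13, 3), (6, 15, 4), (6, 18, 4), (6, 3, 1),
    (10, 21, 2), (11, 4, 5), (13, 6, 5), (15, 17, 6), (17, 5, 4),
]

# Assumed bin boundaries read off the expected output (0-based, end-exclusive)
bounds = [(0, 3), (3, 7), (7, 9), (9, 11)]

summary = []
for start, end in bounds:
    group = rows[start:end]
    avg_time = sum(t for t, _, _ in group) / len(group)  # true division in Python 3
    total_variable = sum(v for _, v, _ in group)
    total_frequency = sum(f for _, _, f in group)
    summary.append((avg_time, total_variable, total_frequency))

for line in summary:
    print(line)
```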

Any help is greatly appreciated.

1 Answer:

Answer 0 (score: 0)

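A bit of brute force gets you a long way: walk the rows in order, keep running totals, and close the current bin as soon as its total frequency reaches 10.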

# Each row is (time, variable, frequency)
data = ((2, 7, 7),
        (3, 12, 2),
        (4, 13, 3),
        (6, 15, 4),
        (6, 18, 4),
        (6, 3, 1),
        (10, 21, 2),
        (11, 4, 5),
        (13, 6, 5),
        (15, 17, 6),
        (17, 5, 4))

min_freq = 10  # minimum total frequency per bin

freqlist = []      # total frequency of each bin
timelist = []      # average time of each bin
variablelist = []  # total variable of each bin

# Running totals for the bin currently being filled
freqcounter = 0
timecounter = 0
variablecounter = 0
counter = 0

for time, variable, freq in data:
    freqcounter += freq
    timecounter += time
    variablecounter += variable
    counter += 1
    # Close the bin as soon as it holds at least min_freq
    if freqcounter >= min_freq:
        freqlist.append(freqcounter)
        timelist.append(timecounter / float(counter))  # float() avoids integer division on Python 2
        variablelist.append(variablecounter)
        freqcounter = timecounter = variablecounter = counter = 0

# Note: trailing rows whose total frequency never reaches min_freq are not output.

print(timelist)      # [3.0, 7.0, 12.0, 16.0]
print(variablelist)  # [32, 57, 10, 22]
print(freqlist)      # [12, 11, 10, 10]
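Since the question is about pandas, the same greedy rule can also be expressed on a DataFrame. This is a sketch, not a built-in pandas feature: `greedy_bins` is a hypothetical helper that assigns a bin label to each row, and `groupby(...).agg(...)` with named aggregation (pandas 0.25 or later) builds the summary. It assumes the rows are already in time order and, like the loop above, drops an incomplete last bin (labelled -1).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": [2, 3, 4, 6, 6, 6, 10, 11, 13, 15, 17],
        "variable": [7, 12, 13, 15, 18, 3, 21, 4, 6, 17, 5],
        "frequency": [7, 2, 3, 4, 4, 1, 2, 5, 5, 6, 4],
    }
)


def greedy_bins(freq, min_total=10):
    """Hypothetical helper: label each row with a bin id. A bin closes as soon
    as its running frequency reaches min_total; rows in an unfinished last bin
    are labelled -1."""
    labels = []
    bin_id = 0
    running = 0
    for f in freq:
        labels.append(bin_id)
        running += f
        if running >= min_total:
            bin_id += 1
            running = 0
    # Closed bins have ids 0..bin_id-1; anything labelled bin_id never closed
    return [b if b < bin_id else -1 for b in labels]


df["bin"] = greedy_bins(df["frequency"])

out = (
    df[df["bin"] >= 0]
    .groupby("bin")
    .agg(
        avg_time=("time", "mean"),
        total_variable=("variable", "sum"),
        total_frequency=("frequency", "sum"),
    )
    .reset_index(drop=True)
)
print(out)
```

Separating label assignment from aggregation keeps the only sequential part (the running total that resets) in a plain loop, while the summary columns come from ordinary `groupby` aggregation and are easy to extend.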