How to use groupby() to split and combine data in a pandas DataFrame

Time: 2018-07-14 03:52:23

Tags: python dataframe reindex

I have data like this (a smaller version of the actual DataFrame structure):

week day val 
1   0   8      
1   1   9
1   2   6
1   3   3
1   4   4
1   5   2
1   6   6
1   7   9
2   0   3
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   6
2   7   3
3   0   4
3   1   2
3   2   6
3   3   7
3   4   4
3   5   2
3   6   5
3   7   7
1   0   1
1   1   2
1   2   6
1   3   8
1   4   9
1   5   1
1   6   7
1   7   4
2   0   2
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   1
2   7   7
3   0   4
3   1   2
3   2   8
3   3   9
3   4   7
3   5   9
3   6   3
3   7   7

I want to use "week" and "day" as the group keys, as I did below:

data.loc[:,wd_val] = data.groupby([data['week'],data['day']]).mean()

I got an error:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

So, (1) what does "long long" mean?

Second, I added the as_index parameter:

data.loc[:,'wd_val'] = data[['val']].groupby([data['week'],data['day']],as_index=False).mean()
data

However, the values of "wd_val" are all NaN:

week    day val wd_val
0   1   0   8   NaN
1   1   1   9   NaN
2   1   2   6   NaN
3   1   3   3   NaN

(2) What did I do wrong?

Third, I used the code below to get a DataFrame:

temp = data[['val']].groupby([data['week'],data['day']]).mean()
temp

            val
week  day   
 1     1    5.5
       2    6.0
       3    5.5
       4    6.5
       5    1.5
       6    6.5
       7    6.5
  2    0    2.5
       1    1.0
       2    2.0

(3) I want to turn the index levels ("week" and "day") into columns of the DataFrame. How can I do that?

2 answers:

Answer 0: (score: 1)

Input:

import pandas as pd

data = pd.DataFrame([
    [1,0,0],
    [1,0,1],
    [1,1,0],
    [1,1,1],
    [1,2,0],
    [2,2,1],
    [2,2,2],
    [2,2,2]], columns=['week','day','val'])

Try:

pd.merge(data, data.groupby(['week','day']).mean(),
    on=['week', 'day'], 
    suffixes=('_orig', '_wk_mean'))

Output:

   week  day  val_orig  val_wk_mean
0     1    0         0     0.500000
1     1    0         1     0.500000
2     1    1         0     0.500000
3     1    1         1     0.500000
4     1    2         0     0.000000
5     2    2         1     1.666667
6     2    2         2     1.666667
7     2    2         2     1.666667
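The merge above relies on pandas matching the `on` keys against the index levels of the grouped result. A more explicit variant is sketched below, assuming the same `data` frame as in the input. It turns the group keys back into columns with `reset_index()` and renames the mean column before merging. The variable names `means` and `merged` are only illustrative, and the original column keeps the name `val` here instead of getting a suffix.

```python
import pandas as pd

data = pd.DataFrame([
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [1, 2, 0], [2, 2, 1], [2, 2, 2], [2, 2, 2]],
    columns=['week', 'day', 'val'])

# Aggregate per (week, day), then move the group keys from the index
# back into ordinary columns and give the mean a distinct name.
means = (data.groupby(['week', 'day'])['val']
             .mean()
             .reset_index()
             .rename(columns={'val': 'val_wk_mean'}))

# Left-join the per-group means onto the original rows.
merged = data.merge(means, on=['week', 'day'], how='left')
print(merged)
```

Because the key columns are ordinary columns on both sides, this form should not depend on how a given pandas version resolves index level names in `on`.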

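"long long" is a data type: a C integer type that is at least 64 bits wide. In the error message it refers to the integer dtype of the data.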
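As a rough illustration of that point, the sketch below checks the dtype of a small integer frame and the size of the C `long long` type as NumPy sees it. The frame is an assumed stand-in for the question's data, and the exact C type name printed in such an error message may differ by platform and pandas build.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's integer columns.
data = pd.DataFrame({'week': [1, 1, 2], 'day': [0, 1, 0], 'val': [8, 9, 3]},
                    dtype='int64')

# Each column is stored as a 64-bit integer.
print(data.dtypes)

# Size in bytes of NumPy's int64 and of the platform's C `long long`.
int64_size = np.dtype('int64').itemsize
longlong_size = np.dtype(np.longlong).itemsize
print(int64_size, longlong_size)
```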

A similar SQL statement might look like this:

select A.week
    , A.day
    , A.val as val_orig
    , B.val_wk_mean from data as A
join (
    SELECT avg(val) as val_wk_mean
        , week
        , day
    from data
    group by week, day
) as B
on A.week=B.week
and A.day=B.day

Answer 1: (score: 1)

IIUC, I think you need to use transform:

df['wd_val'] = df.groupby(['week','day'])['val'].transform('mean')

Output:

    week  day  val  wd_val
0      1    0    8     4.5
1      1    1    9     5.5
2      1    2    6     6.0
3      1    3    3     5.5
4      1    4    4     6.5
5      1    5    2     1.5
6      1    6    6     6.5
7      1    7    9     6.5
8      2    0    3     2.5
9      2    1    1     1.0
10     2    2    2     2.0
11     2    3    6     6.0
12     2    4    8     8.0
13     2    5    9     9.0
14     2    6    6     3.5
15     2    7    3     5.0
16     3    0    4     4.0
17     3    1    2     2.0
18     3    2    6     7.0
19     3    3    7     8.0
20     3    4    4     5.5
21     3    5    2     5.5
22     3    6    5     4.0
23     3    7    7     7.0
24     1    0    1     4.5
25     1    1    2     5.5
26     1    2    6     6.0
27     1    3    8     5.5
28     1    4    9     6.5
29     1    5    1     1.5
30     1    6    7     6.5
31     1    7    4     6.5
32     2    0    2     2.5
33     2    1    1     1.0
34     2    2    2     2.0
35     2    3    6     6.0
36     2    4    8     8.0
37     2    5    9     9.0
38     2    6    1     3.5
39     2    7    7     5.0
40     3    0    4     4.0
41     3    1    2     2.0
42     3    2    8     7.0
43     3    3    9     8.0
44     3    4    7     5.5
45     3    5    9     5.5
46     3    6    3     4.0
47     3    7    7     7.0
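To tie this back to points (2) and (3) of the question, here is a minimal sketch on a small assumed frame (four rows, not the question's full data). An aggregation such as `mean()` returns one row per group, labelled by the group keys, so it no longer shares row labels with the original frame. Since pandas aligns on labels when assigning, that mismatch is one likely reason for the NaN values. `transform('mean')` keeps the original row index instead, and `reset_index()` is the usual way to turn the "week" and "day" index levels into columns. The names `agg` and `flat` are illustrative only.

```python
import pandas as pd

# Small illustrative frame (assumed, not the question's full data).
df = pd.DataFrame({'week': [1, 1, 1, 1],
                   'day':  [0, 1, 0, 1],
                   'val':  [8, 9, 1, 2]})

# mean() collapses each group to one row, indexed by (week, day).
agg = df.groupby(['week', 'day'])['val'].mean()

# reset_index() turns the (week, day) index levels into ordinary columns.
flat = agg.reset_index()

# transform() broadcasts each group's mean back onto the original row
# index, so the result can be assigned directly as a new column.
df['wd_val'] = df.groupby(['week', 'day'])['val'].transform('mean')

print(flat)
print(df)
```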