Pandas: groupby with MultiIndex columns

Date: 2016-03-14 14:48:54

Tags: python pandas

I have a data frame with MultiIndex columns. It is very large, so here is some information about it:

In [73]: test.shape
Out[73]: (83, 82573)

Here are the first rows and columns:

first                    senator      words                                    \
second                           000003198s 000s 000th 001st 002nd 00a 0157h7   
(property, partyCode)                                                           
200                     sessions          0    0     0     0     0   0      0   
200                       shelby          1    0     0     0     0   0      0   
200                    murkowski          0    1     0     0     0   0      0   
200                      stevens          0    1     0     0     0   0      0   
200                          kyl          0    0     0     0     0   0      0   

Now I want to group by the index and sum the counts for each individual word. I tried:

In [88]: test.groupby(test.index)['words'].sum()
Out[88]: 
(property, partyCode)
100    1016.583333
200    1476.333333
Name: words, dtype: float64

That sums along the wrong axis. Using agg() did not help. How do I get the output I want?

Desired output:

                         000003198s 000s 000th 001st 002nd 00a 0157h7 
(property, partyCode)
100    1016.583333                0    0     0     0     0   0      0
200    1476.333333                1    2     0     0     0   0      0

More details about the structure:

How I arrived at my data frame: I started with this

first     senator      words                                                 \
second            000003198s 000s 000th 001st 002nd 00a 0157h7 1000s 1000th   
0        sessions          0    0     0     0     0   0      0     0      0   
1          shelby          0    0     0     0     0   0      0     0      0   
2       murkowski          0    0     0     0     0   0      0     0      0   
3         stevens          0    0     0     0     0   0      0     0      0   
4             kyl          0    0     0     0     0   0      0     0      0   

It also has the following (MultiIndex) column:

In [132]: df['property', 'partyCode'].head()
Out[132]: 
0    200
1    200
2    200
3    200
4    200

Then I set

test = df.set_index(('property', 'partyCode'))

1 Answer:

Answer 0: (score: 2)

You can try concat:

df2 = df.groupby(df.index).sum()
#remove first level of multiindex in columns
df2.columns = df2.columns.droplevel(0)
print(df2)
second                 000003198s  000s  000th  001st  002nd  00a  0157h7
(property, partyCode)                                                    
100                             0     0      0      0      1    0       0
200                             1     0      0      1      0    0       1

#does not work for me
df1 =  df.groupby(df.index)['words'].sum()
print(df1)
     (property, partyCode)
100                      1
200                      3

print(pd.concat([df1['words'], df2], axis=1))
     (property, partyCode)  000003198s  000s  000th  001st  002nd  00a  0157h7
100                      1           0     0      0      0      1    0       0
200                      3           1     0      0      1      0    0       1

Edit: df1 = df.groupby(df.index)['words'].sum() does not work for me.

What works for me is a double sum:

df1 = df.groupby(df.index).sum().sum(axis=1)
df1.name = 'words'
print(df1)
(property, partyCode)
100    1
200    3
Name: words, dtype: int64
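
For reference, here is a minimal end-to-end sketch of the same idea on a small made-up frame. The data, the names `words`, `party`, `per_word`, `total` and `result`, and the choice to group on the party-code column directly instead of calling set_index first are all assumptions for illustration. They are not taken from the original data.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's layout: two-level columns
# with a 'senator' column, a few 'words' columns, and a party code column.
columns = pd.MultiIndex.from_tuples(
    [
        ("senator", ""),
        ("words", "000s"),
        ("words", "000th"),
        ("words", "001st"),
        ("property", "partyCode"),
    ],
    names=["first", "second"],
)
df = pd.DataFrame(
    [
        ["sessions", 0, 0, 1, 200],
        ["shelby", 1, 0, 0, 200],
        ["murkowski", 0, 1, 0, 100],
        ["stevens", 0, 1, 0, 100],
    ],
    columns=columns,
)

# Selecting the top-level key 'words' keeps only the word counts and
# drops the first column level, so no droplevel call is needed afterwards.
words = df["words"]

# Group the rows by party code and sum every word column separately.
party = df[("property", "partyCode")]
per_word = words.groupby(party.to_numpy()).sum()
per_word.index.name = "partyCode"

# Row totals across all words, named like the column in the desired output.
total = per_word.sum(axis=1).rename("words")

# Put the total next to the per-word sums.
result = pd.concat([total, per_word], axis=1)
print(result)
```

Selecting `df["words"]` before grouping keeps the non-numeric senator column out of the sum, which may matter on newer pandas versions that no longer silently drop such columns.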