I have a DataFrame with MultiIndex columns. It is very large, so here is some information about it:
In [73]: test.shape
Out[73]: (83, 82573)
These are the first rows/columns:
first senator words \
second 000003198s 000s 000th 001st 002nd 00a 0157h7
(property, partyCode)
200 sessions 0 0 0 0 0 0 0
200 shelby 1 0 0 0 0 0 0
200 murkowski 0 1 0 0 0 0 0
200 stevens 0 1 0 0 0 0 0
200 kyl 0 0 0 0 0 0 0
Now I want to group by the index and sum the counts for each individual word. I tried:
In [88]: test.groupby(test.index)['words'].sum()
Out[88]:
(property, partyCode)
100 1016.583333
200 1476.333333
Name: words, dtype: float64
This sums along the wrong axis. Using agg() did not help. How do I get my desired output?
000003198s 000s 000th 001st 002nd 00a 0157h7
(property, partyCode)
100 1016.583333 0 0 0 0 0 0 0
200 1476.333333 1 2 0 0 0 0 0
How I got to my DataFrame: I start with this
first senator words \
second 000003198s 000s 000th 001st 002nd 00a 0157h7 1000s 1000th
0 sessions 0 0 0 0 0 0 0 0 0
1 shelby 0 0 0 0 0 0 0 0 0
2 murkowski 0 0 0 0 0 0 0 0 0
3 stevens 0 0 0 0 0 0 0 0 0
4 kyl 0 0 0 0 0 0 0 0 0
It also has the following (MultiIndex) column:
In [132]: df['property', 'partyCode'].head()
Out[132]:
0 200
1 200
2 200
3 200
4 200
Then I set
test = df.set_index(('property', 'partyCode'))
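For reference, a minimal sketch of this setup on a toy frame. The rows, the three word columns, and the empty second-level label for senator are made up for illustration; only the column layout and the set_index call follow the description above.

```python
import pandas as pd

# Toy stand-in for the real frame (values are invented for illustration).
columns = pd.MultiIndex.from_tuples(
    [
        ("senator", ""),  # assumption: empty second-level label
        ("words", "000s"),
        ("words", "000th"),
        ("property", "partyCode"),
    ],
    names=["first", "second"],
)
df = pd.DataFrame(
    [
        ["sessions", 0, 0, 200],
        ["shelby", 1, 0, 200],
        ["kyl", 0, 1, 100],
    ],
    columns=columns,
)

# A tuple is treated as one column key, so this moves the single
# ('property', 'partyCode') column into the index.
test = df.set_index(("property", "partyCode"))
print(test.index.name)
```

The index is then named by the tuple itself, which is why the printed frame shows (property, partyCode) above the index values.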
Answer 0 (score: 2)
You can try concat:
df2 = df.groupby(df.index).sum()
#remove first level of multiindex in columns
df2.columns = df2.columns.droplevel(0)
print(df2)
second 000003198s 000s 000th 001st 002nd 00a 0157h7
(property, partyCode)
100 0 0 0 0 1 0 0
200 1 0 0 1 0 0 1
#does not work for me
df1 = df.groupby(df.index)['words'].sum()
print(df1)
(property, partyCode)
100 1
200 3
print(pd.concat([df1, df2], axis=1))
(property, partyCode) 000003198s 000s 000th 001st 002nd 00a 0157h7
100 1 0 0 0 0 1 0 0
200 3 1 0 0 1 0 0 1
Edit: df1 = df.groupby(df.index)['words'].sum() does not work for me.
What works for me instead is summing twice:
df1 = df.groupby(df.index).sum().sum(axis=1)
df1.name = 'words'
print(df1)
(property, partyCode)
100 1
200 3
Name: words, dtype: int64
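Putting the pieces together, here is a minimal end-to-end sketch on a toy frame. The data, the plain index name partyCode, and the variable names toy, per_word, total and result are assumptions for illustration, and the toy frame holds only the numeric word columns (no senator column).

```python
import pandas as pd

# Toy frame: numeric word counts only, already indexed by party code.
columns = pd.MultiIndex.from_tuples(
    [("words", "000s"), ("words", "000th"), ("words", "001st")],
    names=["first", "second"],
)
toy = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 1, 1], [2, 0, 0]],
    index=pd.Index([200, 200, 200, 100], name="partyCode"),
    columns=columns,
)

# Per-word sums for each index value.
per_word = toy.groupby(toy.index).sum()
# Drop the first column level ('words') so only the word labels remain.
per_word.columns = per_word.columns.droplevel(0)

# Second sum, across columns: total word count per index value.
total = per_word.sum(axis=1)
total.name = "words"

# Total first, then one column per word.
result = pd.concat([total, per_word], axis=1)
print(result)
```

Grouping with toy.groupby(level=0) should give the same groups here; groupby(toy.index) is used only to stay close to the code above.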