pandas: get the mean while keeping column names and an additional string column

Date: 2017-03-06 22:10:31

Tags: python-2.7 pandas mean

Using pandas 0.18.1 with Python 2.7.6:

Imagine we have the following table:

ID,FROM_YEAR,FROM_MONTH,AREA
1,2015,1,200
1,2015,2,200
1,2015,3,200
1,2015,4,200
1,2015,5,200
1,2015,6,200
1,2015,7,200
1,2015,8,200
1,2015,9,200
1,2015,10,200
1,2015,11,200
1,2015,12,200
1,2016,1,100
1,2016,2,100
1,2016,3,100
1,2016,4,100
1,2016,5,100
1,2016,6,100
1,2016,7,100
1,2016,8,100
1,2016,9,100
1,2016,10,100
1,2016,11,100
1,2016,12,100

We are trying to get calendar-year averages in the following format:

ID,FROM_YEAR,TYPE,AREA
1,2015,A,200
1,2016,A,100
1,2015,B,200
1,2016,B,100

Note: TYPE is a string column that carries additional information. Here there are only two values of 'TYPE': 'A' and 'B'.

If we try the following, the "AREA" column name is dropped, and ID = 1 is shown only on the first row:

AREA_CY=df.groupby(['ID','FROM_YEAR'])['AREA'].mean()

It returns:

ID,FROM_YEAR,
1,2015,200
,2016,100
,2015,200
,2016,100

Can any guru shed some light on this? Thanks!
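A minimal sketch of why the attempt above prints that way, assuming the sample table is loaded into `df` (the literal below rebuilds it by hand rather than reading a file). Selecting `['AREA']` after `groupby` yields a Series indexed by (ID, FROM_YEAR), not a DataFrame with an AREA column, and pandas prints repeated outer index values only once. Either `reset_index(name='AREA')` (what the answer uses) or `as_index=False` gives back flat columns:

```python
import pandas as pd

# Rebuild the sample table: ID 1, AREA 200 for every month of 2015
# and 100 for every month of 2016.
df = pd.DataFrame({
    'ID': [1] * 24,
    'FROM_YEAR': [2015] * 12 + [2016] * 12,
    'FROM_MONTH': list(range(1, 13)) * 2,
    'AREA': [200] * 12 + [100] * 12,
})

# The question's attempt: a Series with a (ID, FROM_YEAR) MultiIndex,
# which is why there is no AREA column header and ID is printed once.
series_result = df.groupby(['ID', 'FROM_YEAR'])['AREA'].mean()

# Option 1: turn the index levels back into columns, naming the values AREA.
via_reset = series_result.reset_index(name='AREA')

# Option 2: keep the group keys as ordinary columns from the start.
AREA_CY = df.groupby(['ID', 'FROM_YEAR'], as_index=False)['AREA'].mean()
print(AREA_CY)
```

Both options produce one row per (ID, FROM_YEAR) with ID, FROM_YEAR and AREA as regular columns; TYPE still has to be added separately, since it is not in the input table.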

1 answer:

Answer 0 (score: 1)

Try this:

In [102]: x = df.groupby(['ID','FROM_YEAR'])['AREA'].mean().reset_index(name='AREA')

In [103]: y = pd.DataFrame({'TYPE':['A','B']})

Explanation:

Let's make a cartesian product (a cross join) of the x and y DFs. Giving both a constant dummy key column and merging on it pairs every row of x with every row of y:

In [104]: x
Out[104]:
   ID  FROM_YEAR  AREA
0   1       2015   200
1   1       2016   100

In [105]: y
Out[105]:
  TYPE
0    A
1    B

In [106]: x.assign(key=0).merge(y.assign(key=0), on='key').drop('key', axis=1)
Out[106]:
   ID  FROM_YEAR  AREA TYPE
0   1       2015   200    A
1   1       2015   200    B
2   1       2016   100    A
3   1       2016   100    B
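A self-contained sketch of the same cross join, with `x` typed in from the yearly averages above instead of computed from `df`. It also shows `merge(how='cross')`, which only exists in pandas 1.2 and later (so not in the 0.18.1 used in the question), and a final sort that reproduces the row order requested in the question (all 'A' rows, then all 'B' rows). The sort step is an addition, not part of the original answer:

```python
import pandas as pd

# Yearly averages in the shape produced by the first step (ID, FROM_YEAR, AREA).
x = pd.DataFrame({'ID': [1, 1], 'FROM_YEAR': [2015, 2016], 'AREA': [200, 100]})
# One row per TYPE value mentioned in the question.
y = pd.DataFrame({'TYPE': ['A', 'B']})

# Dummy-key trick: a constant key on both sides makes merge pair every row
# of x with every row of y; the helper column is dropped afterwards.
keyed = (x.assign(key=0)
          .merge(y.assign(key=0), on='key')
          .drop('key', axis=1))

# pandas >= 1.2 only: how='cross' builds the same cartesian product directly.
crossed = x.merge(y, how='cross')

# Reorder rows and columns to match the requested layout.
result = (crossed.sort_values(['TYPE', 'FROM_YEAR'])
                 .reset_index(drop=True)[['ID', 'FROM_YEAR', 'TYPE', 'AREA']])
print(result)
```

The dummy-key version works on old pandas releases as well; `how='cross'` just avoids creating and dropping the helper column.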