Using pandas 0.18.1 on Python 2.7.6:
Suppose we have the following table:
We are trying to get calendar-year averages in the following format:
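Note: TYPE is a string column holding additional information. Here there are only two values of 'TYPE': 'A' and 'B'.
If we try the following, the "AREA" column name is dropped, and ID = 1 is shown only on the first row.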
ID,FROM_YEAR,FROM_MONTH,AREA
1,2015,1,200
1,2015,2,200
1,2015,3,200
1,2015,4,200
1,2015,5,200
1,2015,6,200
1,2015,7,200
1,2015,8,200
1,2015,9,200
1,2015,10,200
1,2015,11,200
1,2015,12,200
1,2016,1,100
1,2016,2,100
1,2016,3,100
1,2016,4,100
1,2016,5,100
1,2016,6,100
1,2016,7,100
1,2016,8,100
1,2016,9,100
1,2016,10,100
1,2016,11,100
1,2016,12,100
It returns:
ID,FROM_YEAR,TYPE,AREA
1,2015,A,200
1,2016,A,100
1,2015,B,200
1,2016,B,100
If we try the following approach:
AREA_CY=df.groupby(['ID','FROM_YEAR'])['AREA'].mean()
It returns:
ID,FROM_YEAR,
1,2015,200
,2016,100
,2015,200
,2016,100
Could any expert shed some light on this? Thanks!
Answer 0 (score: 1)
Try this:
P123
Explanation:
Let's first build the two DataFrames x and y:
In [102]: x = df.groupby(['ID','FROM_YEAR'])['AREA'].mean().reset_index(name='AREA')
In [103]: y = pd.DataFrame({'TYPE':['A','B']})
In [104]: x
Out[104]:
   ID  FROM_YEAR  AREA
0   1       2015   200
1   1       2016   100
In [105]: y
Out[105]:
  TYPE
0    A
1    B
In [106]: x.assign(key=0).merge(y.assign(key=0), on='key').drop('key', axis=1)
Out[106]:
   ID  FROM_YEAR  AREA TYPE
0   1       2015   200    A
1   1       2015   200    B
2   1       2016   100    A
3   1       2016   100    B
The last step builds the Cartesian product (AKA cross join) of the x and y DataFrames: every row of x is paired with every row of y.
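For anyone who wants to run the whole thing outside an interactive session, here is a minimal self-contained sketch. The input frame is rebuilt from the table in the question, so how it is constructed is an assumption, and `result` is just an illustrative name. It shows the two ideas used above: `reset_index(name='AREA')` turns the grouped Series back into a DataFrame with a named AREA column, and the constant `key` column produces the cross join.

```python
import pandas as pd

# Assumed input, mirroring the question's table: 12 monthly rows per year.
df = pd.DataFrame({
    'ID': 1,
    'FROM_YEAR': [2015] * 12 + [2016] * 12,
    'FROM_MONTH': list(range(1, 13)) * 2,
    'AREA': [200] * 12 + [100] * 12,
})

# groupby(...).mean() returns a Series indexed by (ID, FROM_YEAR);
# reset_index(name='AREA') converts it back to a DataFrame and names the values.
x = df.groupby(['ID', 'FROM_YEAR'])['AREA'].mean().reset_index(name='AREA')

# The two TYPE values mentioned in the question.
y = pd.DataFrame({'TYPE': ['A', 'B']})

# Cross join through a constant helper column, which is then dropped.
result = (
    x.assign(key=0)
     .merge(y.assign(key=0), on='key')
     .drop('key', axis=1)
)

print(result)
```

On pandas 1.2 or later, `x.merge(y, how='cross')` should give the same pairing without the helper column; the `key` trick is kept here because it also works on older versions such as the 0.18.1 mentioned in the question.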