Hi, I have a DataFrame in which I'm trying to add and subtract rows based on an index.
First, here is the data in an easy-to-copy format:
data = [['Name1','Obj1','Ind1',10,5,3,6],['Name1','Obj1','Ind2',10,5,2,1],['Name1','Obj1','Ind3',10,5,5,2],['Name1','Obj2','Ind1',15,7,33,15],['Name1','Obj2','Ind2',15,7,15,9],['Name1','Obj2','Ind3',15,7,32,9]]
Then the DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame(data,columns=['Name','Object','Index','Const1','Const2','Method1','Method2'])
>>> df
Name Object Index Const1 Const2 Method1 Method2
0 Name1 Obj1 Ind1 10 5 3 6
1 Name1 Obj1 Ind2 10 5 2 1
2 Name1 Obj1 Ind3 10 5 5 2
3 Name1 Obj2 Ind1 15 7 33 15
4 Name1 Obj2 Ind2 15 7 15 9
5 Name1 Obj2 Ind3 15 7 32 9
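This is a truncated df with only one 'Name'; the real df may have many. The 'Index' column, however, is limited to a few values. Given that limitation, I want to manipulate the 'Method' columns by grouping by 'Name' and 'Object' and then computing Ind1 - Ind2 - Ind3.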
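For reference, the target computation can also be written without flipping any signs: reshape so that each Ind value becomes its own column, then subtract. This is a minimal sketch that assumes every (Name, Object) pair has exactly one row for each of Ind1, Ind2 and Ind3 (a missing combination would produce NaN, and a duplicate would make `unstack` raise); `wide`, `ind1`, `ind2`, `ind3` and `result` are illustrative names, and only the Method columns are handled.

```python
import pandas as pd

data = [['Name1','Obj1','Ind1',10,5,3,6],['Name1','Obj1','Ind2',10,5,2,1],
        ['Name1','Obj1','Ind3',10,5,5,2],['Name1','Obj2','Ind1',15,7,33,15],
        ['Name1','Obj2','Ind2',15,7,15,9],['Name1','Obj2','Ind3',15,7,32,9]]
df = pd.DataFrame(data, columns=['Name','Object','Index','Const1','Const2','Method1','Method2'])

# One row per (Name, Object); the columns become a (Method, Index) MultiIndex.
wide = df.set_index(['Name', 'Object', 'Index'])[['Method1', 'Method2']].unstack('Index')

# Pull out each Ind value as a (Name, Object) x Method frame.
ind1 = wide.xs('Ind1', axis=1, level='Index')
ind2 = wide.xs('Ind2', axis=1, level='Index')
ind3 = wide.xs('Ind3', axis=1, level='Index')

# The frames share the same row and column labels, so this subtracts element-wise.
result = ind1 - ind2 - ind3
print(result)
```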
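My original approach was as follows:
>>> for ind in ['Ind2','Ind3']:
...     for meth in ['Method1','Method2']:
...         # .loc instead of chained indexing (df[meth][mask]), which may modify a copy and leave df unchanged
...         df.loc[df['Index'] == ind, meth] *= -1
...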
>>> df
Name Object Index Const1 Const2 Method1 Method2
0 Name1 Obj1 Ind1 10 5 3 6
1 Name1 Obj1 Ind2 10 5 -2 -1
2 Name1 Obj1 Ind3 10 5 -5 -2
3 Name1 Obj2 Ind1 15 7 33 15
4 Name1 Obj2 Ind2 15 7 -15 -9
5 Name1 Obj2 Ind3 15 7 -32 -9
>>> df['Const1'] /= 3
>>> df['Const2'] /= 3
>>> df.groupby(['Name','Object']).sum()
Const1 Const2 Method1 Method2
Name Object
Name1 Obj1 10 5 -4 3
Obj2 15 7 -14 -3
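The `/= 3` step above works only because every group in this sample has exactly three rows. A sketch that sidesteps it, assuming Const1 and Const2 really are constant within each (Name, Object) group, keeps the first constant per group and sums sign-adjusted Method values instead. It uses named aggregation in `groupby().agg`, which needs pandas 0.25 or later; `sign`, `signed` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

data = [['Name1','Obj1','Ind1',10,5,3,6],['Name1','Obj1','Ind2',10,5,2,1],
        ['Name1','Obj1','Ind3',10,5,5,2],['Name1','Obj2','Ind1',15,7,33,15],
        ['Name1','Obj2','Ind2',15,7,15,9],['Name1','Obj2','Ind3',15,7,32,9]]
df = pd.DataFrame(data, columns=['Name','Object','Index','Const1','Const2','Method1','Method2'])

# +1 for Ind1 rows, -1 for every other row (Ind2, Ind3).
sign = np.where(df['Index'] == 'Ind1', 1, -1)

# assign() returns a new frame, so the original df is left untouched.
signed = df.assign(Method1=df['Method1'] * sign, Method2=df['Method2'] * sign)

# Assumption: the Const columns do not vary within a group, so 'first' is enough.
result = signed.groupby(['Name', 'Object']).agg(
    Const1=('Const1', 'first'),
    Const2=('Const2', 'first'),
    Method1=('Method1', 'sum'),
    Method2=('Method2', 'sum'),
)
print(result)
```

Because the constants are never divided, they keep their integer dtype, and the result does not depend on how many rows each group has.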
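Is there a better way to do this with Python pandas?
Answer (score: 2)
Assuming you want to divide Const1 and Const2 by the number of non-null values in each group (so that their values are preserved when summed):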
In [20]: data = [['Name1','Obj1','Ind1',10,5,3,6],
....: ['Name1','Obj1','Ind2',10,5,2,1],
....: ['Name1','Obj1','Ind3',10,5,5,2],
....: ['Name1','Obj2','Ind1',10,5,33,15],
....: ['Name1','Obj2','Ind2',10,5,15,9],
....: ['Name1','Obj2','Ind3',10,5,32,9]]
In [21]: df = pd.DataFrame(data,columns=['Name','Object','Index','Const1','Const2','Method1','Method2'])
In [22]: df
Out[22]:
Name Object Index Const1 Const2 Method1 Method2
0 Name1 Obj1 Ind1 10 5 3 6
1 Name1 Obj1 Ind2 10 5 2 1
2 Name1 Obj1 Ind3 10 5 5 2
3 Name1 Obj2 Ind1 10 5 33 15
4 Name1 Obj2 Ind2 10 5 15 9
5 Name1 Obj2 Ind3 10 5 32 9
In [23]: df.loc[df.Index.isin(['Ind2', 'Ind3']), ['Method1', 'Method2']] *= -1
In [24]: def plyr(df):
....: df = df.copy()
....: df['Const1'] /= float(df.Const1.count())
....: df['Const2'] /= float(df.Const2.count())
....: return df
....:
In [25]: df.groupby(['Name', 'Object']).apply(lambda x: plyr(x).sum(numeric_only=True))
Out[25]:
Const1 Const2 Method1 Method2
Name Object
Name1 Obj1 10 5 -4 3
Obj2 10 5 -14 -3
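The same count-based division can also be done without a per-group helper function. This is a sketch, not the answer's own code: it uses `groupby().transform('count')`, which returns each row's group count aligned to the original index, so the division happens row by row. It reuses the answer's sample data; `keys`, `consts` and `result` are illustrative names.

```python
import pandas as pd

data = [['Name1','Obj1','Ind1',10,5,3,6],['Name1','Obj1','Ind2',10,5,2,1],
        ['Name1','Obj1','Ind3',10,5,5,2],['Name1','Obj2','Ind1',10,5,33,15],
        ['Name1','Obj2','Ind2',10,5,15,9],['Name1','Obj2','Ind3',10,5,32,9]]
df = pd.DataFrame(data, columns=['Name','Object','Index','Const1','Const2','Method1','Method2'])

keys = ['Name', 'Object']
consts = ['Const1', 'Const2']

# Non-null count of each group, broadcast back to every row of that group.
counts = df.groupby(keys)[consts].transform('count')
df[consts] = df[consts] / counts

# Negate the Ind2/Ind3 method values, then sum everything per group.
df.loc[df['Index'].isin(['Ind2', 'Ind3']), ['Method1', 'Method2']] *= -1
result = df.groupby(keys)[consts + ['Method1', 'Method2']].sum()
print(result)
```

Since the division is vectorized, Python code is not run once per group, which may help when the real data has many Name/Object groups. The constants come back as floats, and their sums are subject to ordinary floating-point rounding.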