Say I have two columns in Pandas. I want to compute an offset difference between them that respects group boundaries.
In other words, suppose I want diff = A - B, but offset by one row; in notation, I want:
df.loc[t, 'diff'] = df.loc[t+1, 'A'] - df.loc[t, 'B']
df can have any kind of index (including a MultiIndex).
How can I do this for all rows? The result for the last row, df.loc[-1, 'diff'] in the notation above, should be NaN.
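To make the requirement concrete, here is a minimal loop-based sketch of the definition, useful as a reference to check faster versions against. The group key column `g`, the sample data, and the helper `offset_diff_reference` are all hypothetical; only the columns A and B come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column 'g' marks which group each row belongs to.
df = pd.DataFrame({
    "g": ["x", "x", "x", "y", "y"],
    "A": [10, 20, 30, 40, 50],
    "B": [1, 2, 3, 4, 5],
})


def offset_diff_reference(frame, key):
    """Hypothetical helper: diff[t] = A[t+1] - B[t] within each group.

    The last row of every group has no successor, so it stays NaN.
    """
    out = pd.Series(np.nan, index=frame.index)
    for _, grp in frame.groupby(key):
        a = grp["A"].to_numpy()
        b = grp["B"].to_numpy()
        # Pair each row's B with the A of the next row in the same group.
        out.loc[grp.index[:-1]] = a[1:] - b[:-1]
    return out


df["diff"] = offset_diff_reference(df, "g")
print(df)
```

Group x yields 20 - 1 and 30 - 2 followed by NaN; group y yields 50 - 4 followed by NaN, so no value ever crosses from one group into the next.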
I tried:
grouped = df.groupby(level='some_level')
for key in grouped.groups.keys():
    this_group = grouped.get_group(key)
    this_group['diff'] = this_group['A'].shift() - this_group['B']
But I get:
/Users/josh/anaconda/envs/py27/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
#!/Users/josh/anaconda/envs/py27/python.app/Contents/MacOS/python
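The warning appears because get_group returns a separate DataFrame, so assigning a new column to it never reaches df. If the loop is kept, one way around this (a sketch, assuming a two-level index whose outer level is named `some_level`; the data is hypothetical) is to compute each group's result, collect the pieces, and assign them to df once, letting pandas align them on the index.

```python
import pandas as pd

# Hypothetical frame with a two-level index; 'some_level' is the level name from the question.
idx = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["some_level", "t"],
)
df = pd.DataFrame({"A": [1.0, 2.0, 4.0, 8.0, 16.0],
                   "B": [0.0, 1.0, 1.0, 2.0, 3.0]}, index=idx)

grouped = df.groupby(level="some_level")
parts = []
for key in grouped.groups.keys():
    this_group = grouped.get_group(key)  # a separate DataFrame, not a view into df
    # Compute on the per-group copy instead of assigning into it;
    # shift(-1) takes A from the next row, matching the t+1 in the question.
    parts.append(this_group["A"].shift(-1) - this_group["B"])

# Assigning a Series to a column aligns on the index, so each value lands on its own row.
df["diff"] = pd.concat(parts)
print(df)
```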
I also tried:
grouped = df.groupby(level='some_group')
diff = grouped['A'].shift() - grouped['B']
which raises:
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(left, right, name)
501 if hasattr(lvalues, 'values'):
502 lvalues = lvalues.values
--> 503 return left._constructor(wrap_results(na_op(lvalues, rvalues)),
504 index=left.index, name=left.name,
505 dtype=dtype)
NotImplementedError
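The second attempt fails because grouped['B'] is a SeriesGroupBy object rather than a Series, so there is nothing to subtract element by element. grouped['A'].shift(...), on the other hand, returns an ordinary Series indexed like df. A minimal sketch of the corrected expression, assuming the same hypothetical two-level index with a level named `some_level`:

```python
import pandas as pd

# Hypothetical frame with a two-level index; 'some_level' stands in for the real level name.
idx = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["some_level", "t"],
)
df = pd.DataFrame({"A": [1.0, 2.0, 4.0, 8.0, 16.0],
                   "B": [0.0, 1.0, 1.0, 2.0, 3.0]}, index=idx)

grouped = df.groupby(level="some_level")
# grouped["A"].shift(-1) is a plain Series aligned with df.index;
# subtract the plain column df["B"] instead of the SeriesGroupBy grouped["B"].
df["diff"] = grouped["A"].shift(-1) - df["B"]
print(df)
```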
Answer (score: 2)
Just use shift:
df['diff']= df.A.shift() - df.B
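The default period is 1; see the online docs. Note that shift() with periods=1 brings each value down from the previous row; to take A from the next row, as the t+1 in the question requires, use shift(-1).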
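A small sketch of the two directions on hypothetical data; the last line is the ungrouped version of the question's formula.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
print(s.shift().tolist())    # [nan, 10.0, 20.0]  default periods=1: previous row
print(s.shift(-1).tolist())  # [20.0, 30.0, nan]  periods=-1: next row

df = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]})
df["diff"] = df["A"].shift(-1) - df["B"]  # diff[t] = A[t+1] - B[t], ignoring groups
print(df["diff"].tolist())   # [19.0, 28.0, nan]
```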
To apply it within groups, select the column after the groupby so the shift returns a Series:
df['diff'] = df.groupby('A')['A'].shift(1) - df['B']
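Selecting the column matters because shifting the whole GroupBy object returns a DataFrame of the non-key columns, and subtracting a Series from a DataFrame aligns the Series against the columns rather than the rows. This sketch uses the answer's example data to show the difference in return types:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 3, 4, 4, 5, 7], "B": np.arange(10)})

whole = df.groupby("A").shift(1)     # DataFrame of the non-key columns (here only B)
one = df.groupby("A")["A"].shift(1)  # Series: column A shifted within each group
print(type(whole).__name__, list(whole.columns))  # DataFrame ['B']
print(type(one).__name__)                         # Series

df["diff"] = one - df["B"]  # Series minus Series aligns row by row on the index
```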
Example:
In [48]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,1,1,2,2,3,4,4,5,7], 'B':np.arange(10)})
print(df)
gp = df.groupby('A')
A B
0 1 0
1 1 1
2 1 2
3 2 3
4 2 4
5 3 5
6 4 6
7 4 7
8 5 8
9 7 9
[10 rows x 2 columns]
In [49]:
gp.head(10)
Out[49]:
A B
A
1 0 1 0
1 1 1
2 1 2
2 3 2 3
4 2 4
3 5 3 5
4 6 4 6
7 4 7
5 8 5 8
7 9 7 9
[10 rows x 2 columns]
In [52]:
gp['A'].shift(1) - df['B']
Out[52]:
0 NaN
1 0
2 -1
3 NaN
4 -2
5 NaN
6 NaN
7 -3
8 NaN
9 NaN
dtype: float64
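The session above shifts by +1, so the NaN lands on the first row of each group. Running the same example with shift(-1) gives the orientation the question asks for, diff[t] = A[t+1] - B[t], with NaN on the last row of each group (and on single-row groups). A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 3, 4, 4, 5, 7], "B": np.arange(10)})
gp = df.groupby("A")

# shift(-1) pulls A from the next row of the same group;
# the last row of every group (and every single-row group) becomes NaN.
df["diff"] = gp["A"].shift(-1) - df["B"]
print(df["diff"].tolist())
```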