I have a pandas DataFrame with two columns:
ddf.head()
a b
0 3136 13280
1 3072 13312
2 3152 13296
3 3120 13248
4 3120 13200
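I want to compute the difference between consecutive elements within each column. If I do this one column at a time (ddf['a'].diff()), it works as expected, but when I try ddf.diff() on the whole DataFrame I get: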
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-68-6ff864856571> in <module>()
----> 1 ddf.diff()
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in diff(self, periods)
4285 diffed : DataFrame
4286 """
-> 4287 new_data = self._data.diff(periods)
4288 return self._constructor(new_data)
4289
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, *args, **kwargs)
1287
1288 def diff(self, *args, **kwargs):
-> 1289 return self.apply('diff', *args, **kwargs)
1290
1291 def interpolate(self, *args, **kwargs):
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
1267 applied = f(blk, *args, **kwargs)
1268 else:
-> 1269 applied = getattr(blk,f)(*args, **kwargs)
1270
1271 if isinstance(applied,list):
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, n)
423 def diff(self, n):
424 """ return block for the diff of the values """
--> 425 new_values = com.diff(self.values, n, axis=1)
426 return make_block(new_values, self.items, self.ref_items, fastpath=True)
427
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in diff(arr, n, axis)
643 if arr.ndim == 2 and arr.dtype.name in _diff_special:
644 f = _diff_special[arr.dtype.name]
--> 645 f(arr, out_arr, n, axis)
646 else:
647 res_indexer = [slice(None)] * arr.ndim
/home/app/anaconda/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.diff_2d_int16 (pandas/algos.c:91446)()
ValueError: Buffer dtype mismatch, expected 'float32_t' but got 'double'
Answer 0 (score: 7)
You can use:
>>> df - df.shift(1)
a b
0 NaN NaN
1 -64 32
2 80 -16
3 -32 -48
4 0 -48
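For anyone who wants to reproduce this, here is a minimal sketch of the shift-based difference. The DataFrame is rebuilt from the five rows shown in the question; the variable name shifted_diff is just illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question (only the five rows shown by head()).
df = pd.DataFrame({
    "a": [3136, 3072, 3152, 3120, 3120],
    "b": [13280, 13312, 13296, 13248, 13200],
})

# shift(1) moves every row down by one, so subtracting it from the
# original gives each value minus the value in the previous row.
# The first row has no predecessor and therefore becomes NaN.
shifted_diff = df - df.shift(1)
print(shifted_diff)
```

Because the first row contains NaN, the result columns are floating point even though the inputs are integers.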
That said, on my machine df.diff() works fine:
>>> df.diff()
a b
0 NaN NaN
1 -64 32
2 80 -16
3 -32 -48
4 0 -48
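If df.diff() keeps failing on your installation, two other things may be worth trying. This is only a sketch under assumptions: the traceback ends in diff_2d_int16, which suggests (but does not prove) that your columns are stored as int16, so the example builds the frame with that dtype. I have not verified either approach against the affected pandas version.

```python
import numpy as np
import pandas as pd

# Assumption: the columns are int16, as the diff_2d_int16 frame in the
# traceback hints. The real dtype should be checked with ddf.dtypes.
ddf = pd.DataFrame(
    {
        "a": [3136, 3072, 3152, 3120, 3120],
        "b": [13280, 13312, 13296, 13248, 13200],
    },
    dtype=np.int16,
)
print(ddf.dtypes)

# Option 1: apply Series.diff column by column, which the question
# reports as working for a single column.
per_column_diff = ddf.apply(lambda col: col.diff())

# Option 2: cast to float64 first so the int16 code path is not used.
widened_diff = ddf.astype("float64").diff()

print(per_column_diff)
print(widened_diff)
```

Both options should give the same numbers as df - df.shift(1) above; upgrading pandas is probably the cleaner fix if that is possible for you.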