How to identify changes in a column's values compared with the previous month

Time: 2018-04-20 04:49:58

Tags: python pandas

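I have a dataset with thousands of records. What is the best way to show how the values in column (X) changed compared with the previous month, based on the key column (A)?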

Here is a sample table.

+----------+---+-----+
|   Date   | A |  X  |
+----------+---+-----+
| Jan 2017 | z | 123 |
| Jan 2017 | y | 234 |
| Feb 2017 | w | 123 |
| Feb 2017 | z | 456 |
+----------+---+-----+

Desired output:

+----------+-----+-----------+
|   Date   |  X  | Changes   |
+----------+-----+-----------+
| Feb 2017 | 234 | Deleted   |
| Feb 2017 | 456 | Added     |
+----------+-----+-----------+

Thanks!

1 answer:

Answer 0 (score: 0)

There may be a simpler way, but here is one solution:

In [1]: import pandas as pd
   ...: 
   ...: df = pd.DataFrame({'Date': ['Jan 2017', 'Jan 2017', 'Feb 2017', 'Feb 2017'],
   ...:                   'A': list('zywz'), 'X': [123, 234, 123, 456]})
   ...: df = df[['Date', 'A', 'X']]
   ...: df['Date'] = pd.to_datetime(df['Date'])
   ...: df.set_index('Date', inplace=True)
   ...: df  # input dataframe
   ...: 
Out[1]: 
            A    X
Date              
2017-01-01  z  123
2017-01-01  y  234
2017-02-01  w  123
2017-02-01  z  456

In [2]: # count X values per month
   ...: wdf = df.reset_index().groupby(['Date', 'X']).X.count().unstack(level='X')
   ...: wdf
   ...: 
Out[2]: 
X           123  234  456
Date                     
2017-01-01  1.0  1.0  NaN
2017-02-01  1.0  NaN  1.0

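In [3]: # detect the changes
   ...: import numpy as np
   ...: def get_status(col):
   ...:     # col holds the counts for one X value: position 0 is the first month, 1 the second
   ...:     if np.isnan(col.iloc[0]) and col.iloc[1]:
   ...:         return 'Added'
   ...:     if col.iloc[0] and np.isnan(col.iloc[1]):
   ...:         return 'Deleted'
   ...:     return 'no change'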
   ...:     
   ...: status = wdf.apply(get_status)
   ...: status.name = 'Changes'
   ...: 

In [4]: # back to df
   ...: # build a separate working dataframe so the original `df` stays untouched
   ...: wdf = df.join(status, on='X').reset_index()[['Date', 'X', 'Changes']]
   ...: wdf[wdf['Changes']!='no change'].set_index('Date')
   ...: 
Out[4]: 
              X  Changes
Date                    
2017-01-01  234  Deleted
2017-02-01  456    Added
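
The session above compares exactly two months, and it reports the deleted value on the January row, whereas the desired output in the question lists both changes under Feb 2017. As a rough sketch of one possible alternative (not the approach used in this answer), the same idea can be written with plain set differences between consecutive months. The function name `month_over_month_changes` and its parameters are hypothetical, and the sketch assumes that "previous month" means the previous month that appears in the data and that changes are tracked on the X values alone, as in the answer.

```python
import pandas as pd


def month_over_month_changes(df, date_col="Date", value_col="X"):
    """Label values as 'Added' or 'Deleted' relative to the previous month in the data.

    Hypothetical helper: assumes date_col already holds datetimes and that
    "previous month" means the previous month present in df.
    """
    # Distinct values seen in each month.
    values_by_month = {
        month: set(group[value_col]) for month, group in df.groupby(date_col)
    }
    months = sorted(values_by_month)

    rows = []
    # Walk over consecutive (previous, current) month pairs.
    for prev_month, month in zip(months, months[1:]):
        prev_vals = values_by_month[prev_month]
        cur_vals = values_by_month[month]
        # Values that were present last month but are gone now.
        for value in sorted(prev_vals - cur_vals):
            rows.append((month, value, "Deleted"))
        # Values that are new this month.
        for value in sorted(cur_vals - prev_vals):
            rows.append((month, value, "Added"))

    return pd.DataFrame(rows, columns=[date_col, value_col, "Changes"])


# Sample data from the question.
sample = pd.DataFrame({
    "Date": ["Jan 2017", "Jan 2017", "Feb 2017", "Feb 2017"],
    "A": list("zywz"),
    "X": [123, 234, 123, 456],
})
sample["Date"] = pd.to_datetime(sample["Date"], format="%b %Y")

changes = month_over_month_changes(sample)
print(changes)
```

On the sample data this yields two rows, both dated February 2017: 234 as Deleted and 456 as Added. Because each change is attached to the later month of the pair, the same function also handles datasets spanning more than two months.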