Pandas time event study

Time: 2019-09-25 09:01:12

Tags: python pandas dataframe pivot-table reshape

I have the following dataset, which gives the dates on which consumers bought and resold a product:

import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]

df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

     Date_buy   Date_sell
0  01/01/2000  06/03/2000
1  12/03/2000  15/08/2000
2  12/04/2000         NaN

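I need to convert this into a timed-event format that describes the buy/sell dynamics.

  • More precisely, I need to create columns m_1, m_2, ... that indicate after how many months the product was sold (0 up to the month of the sale, 1 afterwards)

The final dataframe I want to create should look like this: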

           Date_buy   Date_sell  m_1  m_2  m_3  m_4  m_5  m_6  m_7 ...
0        01/01/2000  06/03/2000    0    0    1    1    1    1    1
1        12/03/2000  15/08/2000    0    0    0    0    0    1    1
2        12/04/2000         NaN    0    0    0    0    0    0    0

There must be a quick way to do this, but I haven't found it yet!

2 answers:

Answer 0: (score: 1)

This is not the most elegant solution, but you can improve on it:

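# The month arithmetic below needs real datetimes, not strings
df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

diff_func = lambda row: row['Date_sell'].month - row['Date_buy'].month + 12 * (row['Date_sell'].year - row['Date_buy'].year)
df['months_diff'] = df.apply(diff_func, axis=1)  # calendar months between buy and sell; NaN if not sold

output_columns = ['m_' + str(i + 1) for i in range(12)]
df = df.join(pd.DataFrame(index=df.index, columns=output_columns, data=0))

for i in df.index:
    months = df.loc[i, 'months_diff']
    if pd.notna(months):
        # m_k becomes 1 once the product has been sold, i.e. for k > months_diff
        df.loc[i, output_columns[int(months):]] = 1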
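
The per-row loop can also be replaced by a single broadcast comparison. Here is a minimal sketch, assuming the flags should be 0 up to the month of the sale and 1 afterwards (as in the expected output in the question). The 12-month horizon `n_months` and the names `flags` and `indicators` are my own choices for illustration, not something the question specifies.

```python
import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]
df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])
df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

n_months = 12  # assumed horizon; the question only shows m_1 to m_7 followed by "..."

# Calendar months between purchase and sale; NaN where the product is unsold
months_diff = ((df['Date_sell'].dt.year - df['Date_buy'].dt.year) * 12
               + (df['Date_sell'].dt.month - df['Date_buy'].dt.month))

month_numbers = np.arange(1, n_months + 1)

# Compare every month number against every row's gap in one step.
# Any comparison with NaN is False, so unsold rows stay all zero.
flags = (month_numbers[None, :] > months_diff.to_numpy()[:, None]).astype(int)

indicators = pd.DataFrame(flags, index=df.index,
                          columns=[f'm_{k}' for k in month_numbers])
result = df.join(indicators)
print(result)
```

Building the whole 0/1 matrix at once avoids writing to the frame cell by cell, which tends to matter once the dataset has more than a handful of rows.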

Answer 1: (score: 1)

import numpy as np
import pandas as pd

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]

df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])

df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')

# Calendar months between purchase and sale (NaN if not sold yet).
# The year term keeps the gap correct when the sale falls in a later year.
df['date_diff'] = ((df.Date_sell.dt.year - df.Date_buy.dt.year) * 12
                   + df.Date_sell.dt.month - df.Date_buy.dt.month)

# One column per month, up to one month past the latest sale
cols = [f'm_{x}' for x in range(1, int(df['date_diff'].max()) + 2)]

# Start with every indicator at 0; rows without a sale date stay that way
res = df.join(pd.DataFrame(0, index=df.index, columns=cols))

for idx, val in res.date_diff.items():
    if pd.notna(val):
        for month in range(1, len(cols) + 1):
            # 0 up to and including the month of the sale, 1 afterwards
            res.at[idx, f'm_{month}'] = int(month > val)

print(res)

Result:

    Date_buy  Date_sell  date_diff  m_1  m_2  m_3  m_4  m_5  m_6
0 2000-01-01 2000-03-06        2.0    0    0    1    1    1    1
1 2000-03-12 2000-08-15        5.0    0    0    0    0    0    1
2 2000-04-12        NaT        NaN    0    0    0    0    0    0
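
One thing to watch when computing the month gap: subtracting the month numbers alone goes negative as soon as the sale falls in a later calendar year than the purchase. The sample data all sits in 2000, so this would not show up there. A small sketch with hypothetical dates (not taken from the question) shows the difference:

```python
import pandas as pd

# Hypothetical purchase and sale dates that span a year boundary
buy = pd.Series(pd.to_datetime(['15/11/2000'], format='%d/%m/%Y'))
sell = pd.Series(pd.to_datetime(['10/02/2001'], format='%d/%m/%Y'))

# Month numbers only: February (2) minus November (11)
month_only = sell.dt.month - buy.dt.month

# Adding 12 months per year of difference gives the calendar-month gap
with_years = (sell.dt.year - buy.dt.year) * 12 + month_only

print(month_only.iloc[0])   # -9
print(with_years.iloc[0])   # 3
```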