I have the following dataset, which gives the dates on which consumers bought and then resold a product:
data = [['01/01/2000', '06/03/2000'],
['12/03/2000', '15/08/2000'],
['12/04/2000',np.nan]]
df = pd.DataFrame(data, columns = ['Date_buy', 'Date_sell'])
Date_buy Date_sell
0 01/01/2000 06/03/2000
1 12/03/2000 15/08/2000
2 12/04/2000 NaN
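I need to convert it into a time-to-event format that describes the buy/sell dynamics month by month.
The final data frame I want to create should look like this: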
Date_buy Date_sell m_1 m_2 m_3 m_4 m_5 m_6 m_7 ...
0 01/01/2000 06/03/2000 0 0 1 1 1 1 1
1 12/03/2000 15/08/2000 0 0 0 0 0 1 1
2 12/04/2000 NaN 0 0 0 0 0 0 0
There must be a quick way to do this, but I haven't found one yet!
Answer 0 (score: 1)
This is not the most elegant solution, but you can improve on it:
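# Parse the day/month/year strings so that .month and .year are available
df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')
diff_func = lambda row: row['Date_sell'].month - row['Date_buy'].month + 12*(row['Date_sell'].year - row['Date_buy'].year)
df['months_diff'] = df.apply(diff_func, axis=1)  # calendar months between buy and sell; NaN if never sold
output_columns = ['m_'+str(i+1) for i in range(12)]
df = df.join(pd.DataFrame(index=df.index, columns=output_columns, data=0))
for i in df.index[df['months_diff'].notna()]:
    # flag every month from the sale month onward; unsold rows stay all zeros
    df.loc[i, output_columns[int(df.loc[i, 'months_diff']):]] = 1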
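The row-wise `apply` above can also be replaced by plain column arithmetic on the `.dt` accessor. This is only a minimal sketch, assuming both columns have already been parsed with `pd.to_datetime`; the helper name `months_between` is made up for illustration and is not part of pandas:

```python
import numpy as np
import pandas as pd

def months_between(start, end):
    """Calendar-month difference between two datetime Series.

    The result is NaN wherever either date is missing (NaT).
    """
    return (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)

# Second pair crosses a year boundary: November 1999 to February 2000
buy = pd.to_datetime(pd.Series(['01/01/2000', '12/11/1999', '12/04/2000']),
                     format='%d/%m/%Y')
sell = pd.to_datetime(pd.Series(['06/03/2000', '15/02/2000', np.nan]),
                      format='%d/%m/%Y')

diff = months_between(buy, sell)
print(diff.tolist())  # [2.0, 3.0, nan]
```

Including the year term matters as soon as a purchase and its sale fall in different years; a difference of the month numbers alone would give -9 for the second pair.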
Answer 1 (score: 1)
import numpy as np
import pandas as pd
data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]
df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])
df['Date_buy'] = pd.to_datetime(df['Date_buy'], format='%d/%m/%Y')
df['Date_sell'] = pd.to_datetime(df['Date_sell'], format='%d/%m/%Y')
# Calendar months between purchase and sale, including year changes; NaN if unsold
df['date_diff'] = ((df.Date_sell.dt.year - df.Date_buy.dt.year) * 12
                   + df.Date_sell.dt.month - df.Date_buy.dt.month)
# Columns m_1 .. m_(max - 1); raise the upper bound to track more months
cols = [f'm_{x}' for x in range(1, int(df['date_diff'].max()))]
# Start every indicator at 0 so rows with no sale date stay all zeros
res = df.join(pd.DataFrame(0, index=df.index, columns=cols))
for idx, val in res.date_diff.items():
    if not np.isnan(val):
        for col_num in range(1, len(cols) + 1):
            # 0 while the product is still held, 1 once it has been sold
            res.at[idx, f'm_{col_num}'] = 1 if col_num > val else 0
print(res)
Result:
Date_buy Date_sell date_diff m_1 m_2 m_3 m_4
0 2000-01-01 2000-03-06 2.0 0 0 1 1
1 2000-03-12 2000-08-15 5.0 0 0 0 0
2 2000-04-12 NaT NaN 0 0 0 0
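For the "quick way" the question asks about, the indicator columns can also be built in one step with NumPy broadcasting instead of nested loops. The following is a sketch, not a definitive implementation. It assumes that `m_k` should be 1 once `k` exceeds the month difference (so `m_1` is the purchase month) and that unsold products stay all zeros, which is how the desired table in the question reads. The function name `add_month_flags` and its `n_months` parameter are invented for this example:

```python
import numpy as np
import pandas as pd

def add_month_flags(df, n_months=7):
    """Return a copy of df with indicator columns m_1 .. m_n_months appended."""
    out = df.copy()
    buy = pd.to_datetime(out['Date_buy'], format='%d/%m/%Y')
    sell = pd.to_datetime(out['Date_sell'], format='%d/%m/%Y')
    # Calendar months between purchase and sale; NaN where there is no sale date
    diff = ((sell.dt.year - buy.dt.year) * 12
            + (sell.dt.month - buy.dt.month)).to_numpy(dtype=float)
    k = np.arange(1, n_months + 1)
    # Compare every month index with every row's difference in one shot.
    # Comparisons against NaN are False, so unsold rows come out as zeros.
    flags = (k[None, :] > diff[:, None]).astype(int)
    cols = [f'm_{i}' for i in k]
    return out.join(pd.DataFrame(flags, index=out.index, columns=cols))

data = [['01/01/2000', '06/03/2000'],
        ['12/03/2000', '15/08/2000'],
        ['12/04/2000', np.nan]]
df = pd.DataFrame(data, columns=['Date_buy', 'Date_sell'])
result = add_month_flags(df, n_months=7)
print(result)
```

With `n_months=7` this reproduces the `m_1` to `m_7` values from the table in the question. The original string date columns are left unchanged.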