Suppose I have the following dataset:
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 27), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 3), 1, 2],
[datetime.datetime(2015, 2, 15), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 17), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
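Now I want to find the last value of each month, shift it forward by one month so that it applies to the following month, and finally take the cumulative product of those values. Applying this to the data above should give the following (showing each step):
Finding the last entry of each month and shifting it forward by one month gives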
Date Id Value Temp
0 2015-01-01 1 0.5 NaN
1 2015-01-27 1 0.5 NaN
2 2015-01-31 1 0.5 NaN
3 2015-02-01 1 2.0 0.5
4 2015-02-03 1 2.0 0.5
5 2015-02-15 1 2.0 0.5
6 2015-02-28 1 2.0 0.5
7 2015-03-01 1 3.0 2.0
8 2015-03-17 1 3.0 2.0
9 2015-03-31 1 3.0 2.0
Filling the NaN values with 1, taking the cumulative product, and then dropping Temp gives
Date Id Value Result
0 2015-01-01 1 0.5 1
1 2015-01-27 1 0.5 1
2 2015-01-31 1 0.5 1
3 2015-02-01 1 2.0 0.5
4 2015-02-03 1 2.0 0.5
5 2015-02-15 1 2.0 0.5
6 2015-02-28 1 2.0 0.5
7 2015-03-01 1 3.0 1.0
8 2015-03-17 1 3.0 1.0
9 2015-03-31 1 3.0 1.0
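I hope this is clear. If anyone is wondering why on earth I want to do this: I have MTD data that needs to be resampled. Thanks, Tingis.
Edit: the number of entries per month is "random", in that they can cover a full month or less (business data...)
Answer 0 (score: 1)
The following code does not assume that you have only two rows per month. The idea is to do the groupby calculation first, then use .reindex() to bring the result back to the original index, which introduces NaN on every row that is not a month's last entry, and finally backward-fill those NaN values, since we already have the value at the last entry of each month.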
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-31 1 3.0
# processing
# =================================================
# get last entry from each month
df_temp = df.groupby(lambda idx: idx.month).tail(1)
# do the cumprod, reindex to have the same index as original df, backward fill
# .bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas releases
df['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(df.index).bfill()
print(df)
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-31 1 3.0 1.0
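One caveat with the code above: grouping on idx.month alone would merge, for example, January 2015 with January 2016 if the data spanned more than one year. The following is a minimal sketch of the same idea keyed on year-month periods instead. The names period, last_per_month and factor are introduced here for illustration only. It assumes the rows are sorted by date and that a month with no rows should simply be skipped, because shift(1) moves by position rather than by calendar month.

```python
import datetime
import pandas as pd

table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 27), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 1), 1, 2],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 1), 1, 3],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value']).set_index('Date')

# one year-month label per row, e.g. Period('2015-01', 'M')
period = df.index.to_period('M')

# last Value seen in each year-month (assumes rows are sorted by date)
last_per_month = df['Value'].groupby(period).last()

# shift to the next available month, start at 1, then take the running product
factor = last_per_month.shift(1).fillna(1).cumprod()

# look up each row's month in the per-month factor
df['Result'] = factor.reindex(period).to_numpy()
print(df)
```

Because every row looks up its own month directly, no backward fill is needed in this variant.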
Regarding the follow-up question:
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 27), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 3), 1, 2],
[datetime.datetime(2015, 2, 15), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 17), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df1 = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
df2 = df1.copy()
df2.Id = 2
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df1, df2])
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-27 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-03 1 2.0
2015-02-15 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-17 1 3.0
2015-03-31 1 3.0
2015-01-01 2 0.5
2015-01-27 2 0.5
2015-01-31 2 0.5
2015-02-01 2 2.0
2015-02-03 2 2.0
2015-02-15 2 2.0
2015-02-28 2 2.0
2015-03-01 2 3.0
2015-03-17 2 3.0
2015-03-31 2 3.0
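def my_func(group):
    # work on a copy so the group handed in by groupby is not modified in place
    group = group.copy()
    # get last entry from each month
    df_temp = group.groupby(lambda idx: idx.month).tail(1)
    # do the cumprod, reindex to have the same index as the group, backward fill
    group['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(group.index).bfill()
    return group
# group_keys=False keeps the plain Date index instead of adding an Id level
df.groupby('Id', group_keys=False).apply(my_func)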
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-27 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-03 1 2.0 0.5
2015-02-15 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-17 1 3.0 1.0
2015-03-31 1 3.0 1.0
2015-01-01 2 0.5 1.0
2015-01-27 2 0.5 1.0
2015-01-31 2 0.5 1.0
2015-02-01 2 2.0 0.5
2015-02-03 2 2.0 0.5
2015-02-15 2 2.0 0.5
2015-02-28 2 2.0 0.5
2015-03-01 2 3.0 1.0
2015-03-17 2 3.0 1.0
2015-03-31 2 3.0 1.0
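If groupby().apply() turns out to be slow on a larger frame, the same result can probably be obtained without a custom function. The sketch below is one possible way, not a tested drop-in replacement: it keeps Date as a regular column, builds a hypothetical helper column Month (a 'YYYY-MM' string, chosen here only because it sorts chronologically and joins without surprises), and assumes rows are sorted by date within each Id.

```python
import datetime
import pandas as pd

dates = [datetime.datetime(2015, 1, 1), datetime.datetime(2015, 1, 31),
         datetime.datetime(2015, 2, 1), datetime.datetime(2015, 2, 28),
         datetime.datetime(2015, 3, 1), datetime.datetime(2015, 3, 31)]
values = [0.5, 0.5, 2, 2, 3, 3]
df1 = pd.DataFrame({'Date': dates, 'Id': 1, 'Value': values})
df2 = df1.assign(Id=2)
df = pd.concat([df1, df2], ignore_index=True)

# helper key (illustrative): year-month as a sortable string
df['Month'] = df['Date'].dt.strftime('%Y-%m')

# last Value per (Id, Month); assumes rows are sorted by date within each Id
last = df.groupby(['Id', 'Month'])['Value'].last()

# within each Id: shift to the next available month and start at 1
shifted = last.groupby(level='Id').shift(1).fillna(1)

# within each Id: running product of the shifted values
factor = shifted.groupby(level='Id').cumprod().rename('Result')

# attach the per-(Id, Month) factor to every row, then drop the helper column
out = df.join(factor, on=['Id', 'Month']).drop(columns='Month')
print(out)
```

The per-Id grouping on both shift and cumprod is what keeps the value from one Id from leaking into the next.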