I have a DataFrame that looks like this:
Item_name | Start_date | Due_date | Value
Item 1    | 1/1/20     | 15/1/20  | 10
Item 1    | 7/1/20     | 29/2/20  | 15
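I want to compute a new DataFrame from it whose columns are the individual dates between Start_date and Due_date, where the value in each date column is the sum of Value over every row whose range covers that date, like this: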
Item_name | 1/1/20 | 2/1/20 | ... | 7/1/20 | ... | 15/1/20 | 16/1/20 | ... | 28/2/20 | 29/2/20
Item 1    | 10     | 10     | ... | 25     | ... | 25      | 15      | ... | 15      | 15
(From 1/1/20 to 6/1/20 the value for Item 1 is just 10; from 7/1/20 to 15/1/20 the daily total is 10 + 15 = 25.)
How can I create this efficiently?
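Answer 0: (score: 1)
My approach is to build the date ranges by hand and explode them; after that we can group by item and date. This assumes Start_date and Due_date have already been parsed as datetimes: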
(df.set_index(['Item_name', 'Value'])
   .assign(date_range=lambda x: [pd.date_range(s, d, freq='D')
                                 for s, d in zip(x.Start_date, x.Due_date)])
   ['date_range'].explode()
   .reset_index()
   .groupby(['Item_name', 'date_range'])['Value']
   .sum()
   .unstack()
)
Output:
date_range 2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05 \
Item_name
Item 1 10 10 10 10 10
date_range 2020-01-06 2020-01-07 2020-01-08 2020-01-09 2020-01-10 ... \
Item_name ...
Item 1 10 25 25 25 25 ...
date_range 2020-02-20 2020-02-21 2020-02-22 2020-02-23 2020-02-24 \
Item_name
Item 1 15 15 15 15 15
date_range 2020-02-25 2020-02-26 2020-02-27 2020-02-28 2020-02-29
Item_name
Item 1 15 15 15 15 15
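For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode-and-group idea. The sample frame is copied from the question; the names `long` and `wide` and the explicit `%d/%m/%y` parsing step are my own assumptions, added because the date strings in the question are day-first.

```python
import pandas as pd

# Sample data from the question; dates are day/month/two-digit-year strings.
df = pd.DataFrame(
    [["Item 1", "1/1/20", "15/1/20", 10],
     ["Item 1", "7/1/20", "29/2/20", 15]],
    columns=["Item_name", "Start_date", "Due_date", "Value"],
)

# Parse explicitly as day-first so "7/1/20" means 7 January, not 1 July.
for col in ["Start_date", "Due_date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%y")

# One daily DatetimeIndex per row, then one row per (item, day) after explode.
df["date_range"] = [
    pd.date_range(start, due, freq="D")
    for start, due in zip(df["Start_date"], df["Due_date"])
]
long = df.explode("date_range")
# Make sure the exploded column is datetime64 before grouping on it.
long["date_range"] = pd.to_datetime(long["date_range"])

# Sum overlapping rows per item and day, then pivot the days into columns.
wide = (
    long.groupby(["Item_name", "date_range"])["Value"]
        .sum()
        .unstack(fill_value=0)
)
```

With the question's data, `wide` should have one row and 60 date columns, and `wide.loc["Item 1", pd.Timestamp("2020-01-07")]` is 25 because both rows cover that day.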
Answer 1: (score: 1)
You can get the desired output with the following self-explanatory code:
import pandas as pd
from datetime import timedelta
# Create the DataFrame and convert Start_date and Due_date to datetime
df = pd.DataFrame(
    [["Item 1", "1/1/20", "15/1/20", 10],
     ["Item 1", "7/1/20", "29/2/20", 15]],
    columns=["Item_name", "Start_date", "Due_date", "Value"])
df["Start_date"] = pd.to_datetime(df["Start_date"], format="%d/%m/%y")
df["Due_date"] = pd.to_datetime(df["Due_date"], format="%d/%m/%y")
# Function to create the daily date range for one row
def createDateRange(row):
    return pd.date_range(row["Start_date"], row["Due_date"], freq="D")
# Apply function to create Date ranges for all Items
df["dates"] = df.apply(createDateRange, axis=1)
# Explode dates and groupby dates and sum the values
df = df.explode("dates").groupby(["dates"])["Value"].sum()
# Finally, arrange the data in a DataFrame and transpose it
# (the row label "Item 1" is hard-coded here)
df = pd.DataFrame(df)
df.columns = ["Item 1"]
df = df.transpose()
The result is:
In [16]: df
Out[16]:
dates 2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05 2020-01-06 2020-01-07 ... 2020-02-23 2020-02-24 2020-02-25 2020-02-26 2020-02-27 2020-02-28 2020-02-29
Item 1 10 10 10 10 10 10 25 ... 15 15 15 15 15 15 15
[1 rows x 60 columns]
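Note that the code above groups by dates only and labels the single row "Item 1" by hand, so rows for different items would be summed together. A possible variant that keeps one output row per item is sketched below. The "Item 2" row is hypothetical data added purely for illustration, and building the long table with `pd.concat` instead of `apply` is my own substitution, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["Item 1", "1/1/20", "15/1/20", 10],
     ["Item 1", "7/1/20", "29/2/20", 15],
     ["Item 2", "10/1/20", "12/1/20", 7]],  # hypothetical extra item
    columns=["Item_name", "Start_date", "Due_date", "Value"],
)
df["Start_date"] = pd.to_datetime(df["Start_date"], format="%d/%m/%y")
df["Due_date"] = pd.to_datetime(df["Due_date"], format="%d/%m/%y")

# Build a long table with one row per (item, day) covered by each source row.
long = pd.concat(
    [
        pd.DataFrame({
            "Item_name": row.Item_name,
            "dates": pd.date_range(row.Start_date, row.Due_date, freq="D"),
            "Value": row.Value,
        })
        for row in df.itertuples(index=False)
    ],
    ignore_index=True,
)

# Group by item AND date so items stay separate; days an item does not
# cover are filled with 0.
result = (
    long.groupby(["Item_name", "dates"])["Value"]
        .sum()
        .unstack(fill_value=0)
)
```

Here `result` has one row per item. Days that "Item 2" does not cover show 0 rather than NaN because of `fill_value=0`; drop that argument if missing values are preferable.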