Pandas: create a dataframe with one column per day

Time: 2020-06-06 05:30:51

Tags: python pandas

I have a dataframe that looks like this:

Item_name | Start_date | Due_date | Value
Item 1      1/1/20       15/1/20    10
Item 1      7/1/20       29/2/20    15

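From it I want to compute a new dataframe with one column for each date between Start_date and Due_date, where the value in each date column is the sum of Value over all rows whose range covers that date, like this: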

Item_name | 1/1/20 | 2/1/20 | ... | 7/1/20 | ... | 15/1/20 | 16/1/20 | ... | 28/2/20 | 29/2/20
Item 1      10       10       ...   25       ...   25        15        ...   15        15

(From 1/1/20 to 6/1/20 the value for Item 1 is just 10; from 7/1/20 to 15/1/20 the daily total is 10 + 15.)

How can I create this efficiently?

2 answers:

Answer 0: (score: 1)

My approach is to build the date range for each row manually and explode it; then we can group by item and date:

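# Assumes Start_date and Due_date are already datetime columns
(df.set_index(['Item_name', 'Value'])
   # build one daily DatetimeIndex per row
   .assign(date_range=lambda x: [pd.date_range(s, d, freq='D')
                                 for s, d in zip(x.Start_date, x.Due_date)])
   ['date_range'].explode()      # one row per (item, value, day)
   .reset_index()
   .groupby(['Item_name', 'date_range'])['Value']
   .sum()
   .unstack()                    # dates become columns
)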

Output:

date_range  2020-01-01  2020-01-02  2020-01-03  2020-01-04  2020-01-05  \
Item_name                                                                
Item 1              10          10          10          10          10   

date_range  2020-01-06  2020-01-07  2020-01-08  2020-01-09  2020-01-10  ...  \
Item_name                                                               ...   
Item 1              10          25          25          25          25  ...   

date_range  2020-02-20  2020-02-21  2020-02-22  2020-02-23  2020-02-24  \
Item_name                                                                
Item 1              15          15          15          15          15   

date_range  2020-02-25  2020-02-26  2020-02-27  2020-02-28  2020-02-29  
Item_name                                                               
Item 1              15          15          15          15          15  
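
With the question's day-first strings, the dates need parsing before this chain is run, since a string such as `7/1/20` could otherwise be read as 1 July. Below is a minimal end-to-end sketch of the same explode-and-group idea. The sample frame is rebuilt from the question, and both the `format` string and the name `wide` are assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; dates are day/month/year strings.
df = pd.DataFrame(
    {
        "Item_name": ["Item 1", "Item 1"],
        "Start_date": ["1/1/20", "7/1/20"],
        "Due_date": ["15/1/20", "29/2/20"],
        "Value": [10, 15],
    }
)

# Parse explicitly so that "7/1/20" is read as 7 January, not 1 July.
for col in ["Start_date", "Due_date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%y")

# One daily DatetimeIndex per row ...
df["date_range"] = [
    pd.date_range(s, d, freq="D") for s, d in zip(df["Start_date"], df["Due_date"])
]

# ... then one row per (item, day), summed and pivoted so dates become columns.
wide = (
    df.explode("date_range")
      .groupby(["Item_name", "date_range"])["Value"]
      .sum()
      .unstack(fill_value=0)
)

print(wide.shape)
```

Passing `fill_value=0` to `unstack` is a design choice for frames with several items: days that a given item does not cover then show 0 rather than NaN.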

Answer 1: (score: 1)

You can get the desired output with the following self-explanatory code:

import pandas as pd

# Build the DataFrame and parse Start_date and Due_date as day/month/year
df = pd.DataFrame(
    [["Item 1", "1/1/20", "15/1/20", 10],
     ["Item 1", "7/1/20", "29/2/20", 15]],
    columns=["Item_name", "Start_date", "Due_date", "Value"])
df["Start_date"] = pd.to_datetime(df["Start_date"], format="%d/%m/%y")
df["Due_date"] = pd.to_datetime(df["Due_date"], format="%d/%m/%y")

# Function to create the daily date range for one row
def createDateRange(row):
    return pd.date_range(row["Start_date"], row["Due_date"], freq="D")

# Apply the function to create date ranges for all rows
df["dates"] = df.apply(createDateRange, axis=1)
# Explode the dates, then group by date and sum the values
# (all rows are summed together, so this assumes a single item)
df = df.explode("dates").groupby(["dates"])["Value"].sum()
# Finally, put the result in a DataFrame, label the column, and transpose it
df = pd.DataFrame(df)
df.columns = ["Item 1"]
df = df.transpose()

The result is:

In [16]: df
Out[16]:
dates   2020-01-01  2020-01-02  2020-01-03  2020-01-04  2020-01-05  2020-01-06  2020-01-07  ...  2020-02-23  2020-02-24  2020-02-25  2020-02-26  2020-02-27  2020-02-28  2020-02-29
Item 1          10          10          10          10          10          10          25  ...          15          15          15          15          15          15          15

[1 rows x 60 columns]
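
One way to spot-check either answer's output is to compute a single day's total directly from the rule in the question: sum Value over the rows whose Start_date to Due_date interval contains that day. The helper `daily_total` below is a hypothetical name introduced only for this check; it is a sketch for verifying a few columns, not a replacement for the approaches above.

```python
import pandas as pd

# Sample data rebuilt from the question, parsed as day/month/year.
df = pd.DataFrame(
    [["Item 1", "1/1/20", "15/1/20", 10],
     ["Item 1", "7/1/20", "29/2/20", 15]],
    columns=["Item_name", "Start_date", "Due_date", "Value"],
)
df["Start_date"] = pd.to_datetime(df["Start_date"], format="%d/%m/%y")
df["Due_date"] = pd.to_datetime(df["Due_date"], format="%d/%m/%y")


def daily_total(frame, item, day):
    """Sum Value over the rows of `item` whose interval contains `day` (hypothetical helper)."""
    day = pd.Timestamp(day)
    active = (
        (frame["Item_name"] == item)
        & (frame["Start_date"] <= day)
        & (frame["Due_date"] >= day)
    )
    return int(frame.loc[active, "Value"].sum())


print(daily_total(df, "Item 1", "2020-01-06"))  # 10: only the first row is active
print(daily_total(df, "Item 1", "2020-01-07"))  # 25: both rows overlap
print(daily_total(df, "Item 1", "2020-01-16"))  # 15: only the second row is active
```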