Calculating a rolling quarter in Python with pandas (alternative?)

Posted: 2018-05-30 09:40:37

Tags: python pandas datetime

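Date, Brand, Indication, Geo, Type and Value are the column names. I'm currently calculating a rolling quarter with a function and the date timestamps, using the code below. It takes a long time to execute, so I'd like to change or modify it. The RQ column is the rolling quarter column that gets added.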

import datetime

import pandas as pd

# Import the data from the CSV file as a DataFrame, with the first column (Date) as the index
df = pd.read_csv('Demo for MAt.csv', index_col=0)
# Parse the m/d/Y date strings with an explicit format
df.index = pd.to_datetime(df.index, format='%m/%d/%Y')

# Function to calculate the rolling sum for each record by date and the other levels
def RQ(x):
    # Checks whether each date falls within the previous 62 days (about 3 months)
    # and sums Value over the rows that are in range and share this row's levels
    RQS = df['Value'][
          (df.index >= x.name - datetime.timedelta(days=62))
        & (df.index <= x.name)
        & (df['Brand'] == x['Brand'])
        & (df['Indication'] == x['Indication'])
        & (df['Geo'] == x['Geo'])
        & (df['Type'] == x['Type'])
    ]
    return RQS.sum()

# The calculation is done for each row using apply
df['RQ'] = df.apply(RQ, axis=1)
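A self-contained version of the row-wise approach, run on the sample data given in this question instead of the CSV file, is handy for checking results. Each call builds boolean masks over the whole frame, so the total work grows roughly with the square of the row count, which is why apply with this function gets slow on larger files. The lowercase name rq is only used here to keep it apart from the original RQ.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
df = pd.DataFrame(
    [['04/01/2016', 1, 'A', 'National', 'Value', 10],
     ['05/01/2016', 1, 'A', 'National', 'Value', 20],
     ['06/01/2016', 1, 'A', 'National', 'Value', 30]],
    columns=['Date', 'Brand', 'Indication', 'Geo', 'Type', 'Value'],
)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df = df.set_index('Date')

def rq(row):
    # One full-length mask per row: O(n) per call, O(n^2) for the whole apply.
    mask = (
        (df.index >= row.name - pd.Timedelta(days=62))
        & (df.index <= row.name)
        & (df['Brand'] == row['Brand'])
        & (df['Indication'] == row['Indication'])
        & (df['Geo'] == row['Geo'])
        & (df['Type'] == row['Type'])
    )
    return df['Value'][mask].sum()

df['RQ'] = df.apply(rq, axis=1)
print(df['RQ'].tolist())  # [10, 30, 60]
```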


# The data frames below hold the input and expected output for a sample
import pandas as pd

inputdf = pd.DataFrame([['04/01/2016', 1, 'A', 'National', 'Value', 10],
                        ['05/01/2016', 1, 'A', 'National', 'Value', 20],
                        ['06/01/2016', 1, 'A', 'National', 'Value', 30]],
                       columns=['Date', 'Brand', 'Indication', 'Geo', 'Type', 'Value'])
print(inputdf)
outputdf = pd.DataFrame([['04/01/2016', 1, 'A', 'National', 'Value', 10, 10],
                         ['05/01/2016', 1, 'A', 'National', 'Value', 20, 30],
                         ['06/01/2016', 1, 'A', 'National', 'Value', 30, 60]],
                        columns=['Date', 'Brand', 'Indication', 'Geo', 'Type', 'Value', 'RQ'])
print(outputdf)
## Input
       Date  Brand Indication       Geo   Type  Value
0  04/01/2016      1          A  National  Value     10
1  05/01/2016      1          A  National  Value     20
2  06/01/2016      1          A  National  Value     30
## Expected output
       Date  Brand Indication       Geo   Type  Value  RQ
0  04/01/2016      1          A  National  Value     10  10
1  05/01/2016      1          A  National  Value     20  30
2  06/01/2016      1          A  National  Value     30  60
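With first-of-month dates, the 62-day look-back in RQ keeps the current month and the two before it. A quick check of the lower bound, using a hypothetical helper window_start (not part of the question's code): counting back from 2016-06-01 lands on 2016-03-31, so the 04/01 row is inside and June's RQ is 10 + 20 + 30 = 60. From 2016-09-01 it lands exactly on 2016-07-01, so it is the inclusive >= in RQ that keeps July in that window.

```python
import datetime

def window_start(end, days=62):
    # Hypothetical helper: the earliest date the question's RQ() keeps,
    # i.e. the bound in df.index >= x.name - timedelta(days=62).
    return end - datetime.timedelta(days=days)

print(window_start(datetime.date(2016, 6, 1)))  # 2016-03-31
print(window_start(datetime.date(2016, 7, 1)))  # 2016-04-30
print(window_start(datetime.date(2016, 9, 1)))  # 2016-07-01
```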

1 answer:

Answer 0 (score: 0)

Convert the Date column to a timestamp type (if that hasn't been done already) and set it as the index:

df.Date = pd.to_datetime(df.Date, format='%m/%d/%Y')
df = df.set_index('Date').sort_index()  # time-based rolling windows need a sorted index
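Strings such as 04/01/2016 are ambiguous between April 1 and January 4. pandas reads them month-first by default, which is what the sample expects, but passing format='%m/%d/%Y' makes that explicit instead of relying on inference. A small sketch on the sample dates (the two-column frame is only for illustration):

```python
import pandas as pd

dates = pd.Series(['04/01/2016', '05/01/2016', '06/01/2016'])

# Explicit month/day/year format: no guessing about which field is the month.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed.dt.month.tolist())  # [4, 5, 6]

# The same strings read day-first land in January instead.
dayfirst = pd.to_datetime(dates, dayfirst=True)
print(dayfirst.dt.month.tolist())  # [1, 1, 1]

# Then make the parsed dates a sorted DatetimeIndex, as time-based rolling expects.
df = pd.DataFrame({'Date': dates, 'Value': [10, 20, 30]})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df = df.set_index('Date').sort_index()
print(type(df.index).__name__)  # DatetimeIndex
```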

Group the data by the other dimensions and apply a rolling sum of Value within each group.

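DataFrame.rolling can create time-based windows and, by default, windows over the index. As in your attempt, specify 62D as the window size; closed='both' makes the window include both ends, matching the >= and <= comparisons in RQ. Use transform so the result lines up row for row with df:

df['RQ'] = df.groupby(list(df.columns[:-1])).Value.transform(lambda x: x.rolling('62D', closed='both').sum())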

This outputs (with the sample data):

            Brand Indication       Geo   Type  Value    RQ
Date
2016-04-01      1          A  National  Value     10  10.0
2016-05-01      1          A  National  Value     20  30.0
2016-06-01      1          A  National  Value     30  60.0
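A sketch of the grouped rolling sum on a larger, partly invented dataset: the question's National series is extended to September with made-up values, and a hypothetical Regional series is added so the grouping has something to separate. It also shows why the window's closed side matters. With the default closed='right', an offset window is (t - 62 days, t], and because 2016-09-01 minus 62 days is exactly 2016-07-01, July drops out of September's sum; closed='both' keeps it, which matches the >= in the question's RQ(). The column names RQ_right and RQ_both are illustrative only.

```python
import pandas as pd

# Hypothetical data: National extended to September, plus an invented Regional series.
dates = ['04/01/2016', '05/01/2016', '06/01/2016',
         '07/01/2016', '08/01/2016', '09/01/2016']
rows = [[d, 1, 'A', 'National', 'Value', v] for d, v in zip(dates, [10, 20, 30, 40, 50, 60])]
rows += [[d, 1, 'A', 'Regional', 'Value', v] for d, v in zip(dates, [1, 2, 3, 4, 5, 6])]
df = pd.DataFrame(rows, columns=['Date', 'Brand', 'Indication', 'Geo', 'Type', 'Value'])

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df = df.set_index('Date').sort_index()  # offset windows need a sorted DatetimeIndex

keys = ['Brand', 'Indication', 'Geo', 'Type']
grouped = df.groupby(keys)['Value']

# Default closed='right': window is (t - 62 days, t], so 2016-07-01 is outside September's window.
df['RQ_right'] = grouped.transform(lambda s: s.rolling('62D').sum())
# closed='both': window is [t - 62 days, t], like the >= / <= test in the question's RQ().
df['RQ_both'] = grouped.transform(lambda s: s.rolling('62D', closed='both').sum())

# National RQ_right: 10, 30, 60, 90, 120, 110; RQ_both: 10, 30, 60, 90, 120, 150
print(df[df['Geo'] == 'National'][['Value', 'RQ_right', 'RQ_both']])
```

transform is used rather than apply because it returns a Series aligned row for row with df even though Date repeats across groups; with apply, pandas 2.0 and later prepend the group keys to the result index, so the result no longer lines up with df.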