I have two dataframes with dates. The first dataframe repeats the dates for each Type and each State, because it is a cumulative sum frame, as shown below:
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 10
2010-03-01 AK NUC 10
.
.
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
.
.
2010-01-01 AK WND 20
2010-02-01 AK WND 21
.
.
2018-08-01 .......
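What I need to do is take the second dataframe and, for each Type and State, add its Value starting on the "Operating Date" and subtract it again on the "Retirement Date" (both matched against the original "Date"). The second dataframe looks like this: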
Operating Date Retirement Date Type State Value
2010-02-01 2010-04-01 NUC AK 1
2011-02-01 2014-02-01 NUC AK 2
2011-03-01 2016-03-01 NUC AK 10
.
.
.
2018-08-01 .......
For example, for AK the output would add and subtract like this:
if AK(Date) == AK(Operating Date):
AK(Value, Date) = AK(Value, Date) + AK(Value, Operating Date)
elif AK(Date) == AK(Retirement Date):
AK(Value, Date) = AK(Value, Date) - AK(Value, Retirement Date)
else:
continue
The actual output dataframe (for AK 'NUC' only) would be:
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 11
2010-03-01 AK NUC 11
2010-04-01 AK NUC 10
.
.
2011-01-01 AK NUC 10
2011-02-01 AK NUC 12
2011-03-01 AK NUC 22
2011-04-01 AK NUC 22
.
.
2016-01-01 AK NUC 22
2016-02-01 AK NUC 22
2016-03-01 AK NUC 12
2016-04-01 AK NUC 12
.
.
How can I perform this kind of operation?
Answer 0 (score: 1)
The main DataFrame used in the code below
df
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 10
2010-03-01 AK NUC 10
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
2010-01-01 AK WND 20
2010-02-01 AK WND 21
The changes you want to add to the main frame. Note that I replaced the spaces in the column names with _
delta
Operating_Date Retirement_Date Type State Value
2010-02-01 2010-04-01 NUC AK 1
2011-02-01 2014-02-01 NUC AK 2
2011-03-01 2016-03-01 NUC AK 10
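To make the steps that follow reproducible, the two sample frames can be built like this. It is only a minimal sketch: the values are copied from the tables above, and parsing the dates with pd.to_datetime is an assumption, since the original does not say what dtype the date columns have.

```python
import pandas as pd

# Main frame, values copied from the df table above
df = pd.DataFrame({
    'Date': pd.to_datetime(['2010-01-01', '2010-02-01', '2010-03-01',
                            '2010-01-01', '2010-02-01',
                            '2010-01-01', '2010-02-01']),
    'State': ['AK', 'AK', 'AK', 'CO', 'CO', 'AK', 'AK'],
    'Type': ['NUC', 'NUC', 'NUC', 'NUC', 'NUC', 'WND', 'WND'],
    'Value': [10, 10, 10, 2, 2, 20, 21],
})

# Changes frame, values copied from the delta table above
# (assumption: both date columns are parsed as datetimes)
delta = pd.DataFrame({
    'Operating_Date': pd.to_datetime(['2010-02-01', '2011-02-01', '2011-03-01']),
    'Retirement_Date': pd.to_datetime(['2010-04-01', '2014-02-01', '2016-03-01']),
    'Type': ['NUC', 'NUC', 'NUC'],
    'State': ['AK', 'AK', 'AK'],
    'Value': [1, 2, 10],
})
```

Keeping the dates as real datetimes (rather than strings) matters later, because the merge on Date only matches when both frames use the same dtype.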
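The plan of attack is to work with a single date column. To get there, we combine the retirement dates and the operating dates into one column, giving the value a negative sign when it comes from the retirement date and keeping it positive when it comes from the operating date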
import pandas as pd

# First make a copy of delta; these rows are the cancellations, dated by
# Retirement_Date and carrying the value as a negative
cx = delta.copy()
cx['Date'] = cx['Retirement_Date']
cx = cx.drop(['Operating_Date', 'Retirement_Date'], axis=1)
cx['Value'] *= -1
# In the original delta, use the operating date as the date value
delta['Date'] = delta['Operating_Date']
delta = delta.drop(['Operating_Date', 'Retirement_Date'], axis=1)
# Append the cancellations to the main delta frame and rename the value
# column to Delta (DataFrame.append was removed in pandas 2.0, so use pd.concat)
delta = pd.concat([delta, cx], ignore_index=True)
delta.rename(columns={'Value': 'Delta'}, inplace=True)
We now have a dataframe with a single date column that holds every positive and negative change for each date we want to track
delta
Type State Delta Date
NUC AK 1 2010-02-01
NUC AK 2 2011-02-01
NUC AK 10 2011-03-01
NUC AK -1 2010-04-01
NUC AK -2 2014-02-01
NUC AK -10 2016-03-01
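The same long-format frame can also be produced with DataFrame.melt instead of copying and concatenating. This is a different technique from the one used above, shown only as a sketch; the names events, long_delta and the helper column Event are hypothetical and chosen for illustration.

```python
import pandas as pd

# Sample changes frame, values copied from the delta table in this answer
events = pd.DataFrame({
    'Operating_Date': pd.to_datetime(['2010-02-01', '2011-02-01', '2011-03-01']),
    'Retirement_Date': pd.to_datetime(['2010-04-01', '2014-02-01', '2016-03-01']),
    'Type': ['NUC', 'NUC', 'NUC'],
    'State': ['AK', 'AK', 'AK'],
    'Value': [1, 2, 10],
})

# Stack the two date columns into one 'Date' column;
# 'Event' records which source column each row came from
long_delta = events.melt(
    id_vars=['Type', 'State', 'Value'],
    value_vars=['Operating_Date', 'Retirement_Date'],
    var_name='Event',
    value_name='Date',
)

# Retirements subtract, so flip the sign of those rows
is_retirement = long_delta['Event'] == 'Retirement_Date'
long_delta.loc[is_retirement, 'Value'] *= -1

# Drop the helper column and use the same column name as above
long_delta = long_delta.drop(columns='Event').rename(columns={'Value': 'Delta'})
```

With the sample data this yields the same six rows as the delta table above, operating dates first and retirement dates second.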
Now all we have to do is add the cumulative value of the changes to the main dataframe
# Merge the frames. The shared columns (Date, State, Type) are used as keys by
# default, so we only need to specify that it's an outer join
df = df.merge(delta, how='outer')
# Dates that appear only in delta have no Value yet, so carry the last known
# value forward; sort by date first so the forward fill and the cumulative sum
# run in chronological order
df.sort_values(['Type', 'State', 'Date'], inplace=True)
df['Value'] = df.groupby(['State', 'Type'])['Value'].ffill()
# Dates with no change get a delta of zero (assign the result back, since
# fillna(inplace=True) on a selected column may not update df)
df['Delta'] = df['Delta'].fillna(0)
# Add the cumulative sum of the deltas to the Value column
df['Value'] += df.groupby(['State', 'Type'])['Delta'].cumsum().astype(int)
# Finally, drop the Delta column again and we're done
del df['Delta']
The final dataframe, which is hopefully what you are after
df
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 11
2010-03-01 AK NUC 11
2010-04-01 AK NUC 10
2011-02-01 AK NUC 12
2011-03-01 AK NUC 22
2014-02-01 AK NUC 20
2016-03-01 AK NUC 10
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
2010-01-01 AK WND 20
2010-02-01 AK WND 21
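For reuse, the steps above can be wrapped in one function. The sketch below is not part of the original answer: the function name apply_changes is hypothetical, and the groupby/sum step is an addition of mine so that several events falling on the same Date, State and Type cannot duplicate rows in the merge. It also assumes every event date falls on or after the first Date of its group; otherwise the forward fill leaves NaN and the integer cast fails.

```python
import pandas as pd

def apply_changes(df, events):
    """Add each event's Value from its Operating_Date on and
    remove it again from its Retirement_Date on (hypothetical helper)."""
    keys = ['State', 'Type', 'Date']
    cols = keys + ['Value']
    adds = events.rename(columns={'Operating_Date': 'Date'})[cols]
    cuts = events.rename(columns={'Retirement_Date': 'Date'})[cols]
    cuts = cuts.assign(Value=-cuts['Value'])
    # Sum events that share a key so the merge cannot duplicate rows
    delta = (pd.concat([adds, cuts], ignore_index=True)
               .groupby(keys, as_index=False)['Value'].sum()
               .rename(columns={'Value': 'Delta'}))
    out = df.merge(delta, on=keys, how='outer').sort_values(keys)
    # Carry the last known value forward onto dates that exist only in delta
    out['Value'] = out.groupby(['State', 'Type'])['Value'].ffill()
    out['Delta'] = out['Delta'].fillna(0)
    running = out.groupby(['State', 'Type'])['Delta'].cumsum()
    # Assumption: no NaN remains in Value here, see the note above
    out['Value'] = (out['Value'] + running).astype(int)
    return out.drop(columns='Delta').reset_index(drop=True)

# Sample data copied from the tables in this answer
df = pd.DataFrame({
    'Date': pd.to_datetime(['2010-01-01', '2010-02-01', '2010-03-01',
                            '2010-01-01', '2010-02-01',
                            '2010-01-01', '2010-02-01']),
    'State': ['AK', 'AK', 'AK', 'CO', 'CO', 'AK', 'AK'],
    'Type': ['NUC', 'NUC', 'NUC', 'NUC', 'NUC', 'WND', 'WND'],
    'Value': [10, 10, 10, 2, 2, 20, 21],
})
events = pd.DataFrame({
    'Operating_Date': pd.to_datetime(['2010-02-01', '2011-02-01', '2011-03-01']),
    'Retirement_Date': pd.to_datetime(['2010-04-01', '2014-02-01', '2016-03-01']),
    'Type': ['NUC', 'NUC', 'NUC'],
    'State': ['AK', 'AK', 'AK'],
    'Value': [1, 2, 10],
})

result = apply_changes(df, events)
print(result)
```

Unlike the step-by-step version, this function leaves its inputs unmodified and sorts by State before Type, so the row order differs from the table above even though the values per group are the same.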