pandas DataFrame: Get last year's value for the same month when there are missing years and months and shift() cannot be used

Time: 2017-12-03 20:55:48

Tags: python pandas pandas-groupby

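I would like to get last year's SALES_AMOUNT for the same month. Some months have no transactions, so a plain shift() does not work.

Here are my transactions.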

STORE TXN_YM    SALES_AMOUNT
A   201303  16793.14
A   201305  42901.61
A   201306  63059.72
A   201310  168471.43
A   201311  58570.72
A   201312  67526.71
A   201402  50649.07
A   201406  48819.97
A   201407  97100.77
A   201409  67778.40
A   201410  90327.52
A   201411  75703.12
A   201412  26098.50
A   201501  81429.36
A   201502  19539.85
A   201503  71727.66
A   201504  20117.79
A   201506  44252.19
A   201507  68578.82
A   201508  91483.39
A   201510  39220.87
A   201511  12224.11
A   201601  55425.74
A   201604  82550.66
A   201605  95772.93
A   201606  43794.49
A   201607  158287.16
A   201608  92568.03
A   201609  43136.43

Expected output

STORE   TXN_YM  SALES_AMOUNT    LY  
A   201303  16793.14    NaN 
A   201305  42901.61    NaN 
A   201306  63059.72    NaN 
A   201310  168471.43   NaN 
A   201311  58570.72    NaN 
A   201312  67526.71    NaN 
A   201402  50649.07    NaN 
A   201406  48819.97    63059.72    
A   201407  97100.77    NaN 
A   201409  67778.40    NaN 
A   201410  90327.52    168471.43   
A   201411  75703.12    58570.72    
A   201412  26098.50    67526.71    
A   201501  81429.36    NaN 
A   201502  19539.85    50649.07    
A   201503  71727.66    NaN <-- If shift() it will get 16793.14 of 201303  which is wrong
A   201504  20117.79    NaN 
A   201506  44252.19    48819.97    
A   201507  68578.82    97100.77    
A   201508  91483.39    NaN 
A   201510  39220.87    90327.52    
A   201511  12224.11    75703.12    
A   201601  55425.74    19539.85    
A   201604  82550.66    20117.79    
A   201605  95772.93    NaN <-- If shift() it will get 42901.61 of 201305 which is wrong
A   201606  43794.49    44252.19    
A   201607  158287.16   68578.82    
A   201608  92568.03    91483.39    
A   201609  43136.43    NaN 

I tried something similar to what I asked previously in "pandas DataFrame: Get previous month value where there are missing transaction and cannot shift()", but it does not work :(

I tried splitting TXN_YM into TXN_YEAR and TXN_MONTH, like this

STORE TXN_YM    TXN_YEAR    TXN_MONTH   SALES_AMOUNT
A   201303  2013    3   16793.14
A   201305  2013    5   42901.61
A   201306  2013    6   63059.72
A   201310  2013    10  168471.43
A   201311  2013    11  58570.72
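
For reference, the split described above could be sketched as follows. This assumes TXN_YM is stored as an integer in YYYYMM form, which the question does not state explicitly; if it is a string, it would first need converting with `astype(int)`.

```python
import pandas as pd

# First five rows of the sample data from the question.
df = pd.DataFrame({
    "STORE": ["A"] * 5,
    "TXN_YM": [201303, 201305, 201306, 201310, 201311],
    "SALES_AMOUNT": [16793.14, 42901.61, 63059.72, 168471.43, 58570.72],
})

# Assumption: TXN_YM is an integer YYYYMM, so integer division and
# modulo by 100 separate the year from the month.
df["TXN_YEAR"] = df["TXN_YM"] // 100
df["TXN_MONTH"] = df["TXN_YM"] % 100

# Reorder the columns to match the layout shown in the question.
df = df[["STORE", "TXN_YM", "TXN_YEAR", "TXN_MONTH", "SALES_AMOUNT"]]
print(df)
```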

This is the best I have come up with so far.

It is wrong, though: 201503 picks up the value from 201303 (two years earlier) instead of NaN.

df["LY1"] = df.groupby(["STORE", "TXN_MONTH"])["SALES_AMOUNT"].shift()
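
A possible way to keep the per-month shift() but discard matches that are more than one year apart is sketched below. It is only an illustration on a subset of the sample rows, and it assumes the frame is sorted by TXN_YM within each store and that TXN_YEAR and TXN_MONTH are integers derived from an integer TXN_YM.

```python
import pandas as pd

# Subset of the sample data: months 3 and 6 across several years.
df = pd.DataFrame({
    "STORE": ["A"] * 5,
    "TXN_YM": [201303, 201306, 201406, 201503, 201506],
    "SALES_AMOUNT": [16793.14, 63059.72, 48819.97, 71727.66, 44252.19],
})
df["TXN_YEAR"] = df["TXN_YM"] // 100
df["TXN_MONTH"] = df["TXN_YM"] % 100

# shift() looks at the previous row of each group, so the rows must be
# in chronological order within each store.
df = df.sort_values(["STORE", "TXN_YM"])

grouped = df.groupby(["STORE", "TXN_MONTH"])
prev_amount = grouped["SALES_AMOUNT"].shift()  # previous row, same month
prev_year = grouped["TXN_YEAR"].shift()        # year of that previous row

# Keep the shifted amount only when it comes from exactly one year earlier;
# where() leaves NaN everywhere else (including the first row of each group).
df["LY"] = prev_amount.where(df["TXN_YEAR"] - prev_year == 1)
print(df[["STORE", "TXN_YM", "SALES_AMOUNT", "LY"]])
```

Using `where` rather than multiplying by a boolean mask matters here: multiplying by `False` yields 0.0, whereas `where` yields NaN for the rejected rows.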

I believed the following would work, but it does not; it produces seemingly arbitrary numbers that I cannot make sense of.

def get_value_ly(x):
    y = x["SALES_AMOUNT"].shift() * x["TXN_YEAR"].diff().eq(1)
    return y

df["LY"]  = df.groupby(["STORE", "TXN_MONTH"]).apply(lambda x: get_value_ly(x))

Result

STORE   TXN_YM SALES_AMOUNT LY
A   201303   16793.14        NaN
A   201305   42901.61   81429.36
A   201306   63059.72        NaN
A   201310  168471.43   50649.07
A   201311   58570.72        NaN
A   201312   67526.71        NaN
A   201402   50649.07        NaN
A   201406   48819.97   20117.79
A   201407   97100.77        NaN
A   201409   67778.40        NaN
A   201410   90327.52        NaN
A   201411   75703.12   63059.72
A   201412   26098.50   48819.97

I don't know why it is not working :( Please help me solve this.
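
For comparison, an approach that avoids shift() entirely is a self-merge: copy the frame, move every period forward by one year, and left-join it back on STORE and TXN_YM. The sketch below is not a confirmed answer, just one way the expected output could be produced. It assumes TXN_YM is an integer in YYYYMM form (so adding 100 advances it by exactly one year) and that each (STORE, TXN_YM) pair occurs only once; the names `last_year` and `result` are illustrative.

```python
import pandas as pd

# Subset of the sample data from the question.
df = pd.DataFrame({
    "STORE": ["A"] * 6,
    "TXN_YM": [201303, 201305, 201306, 201406, 201503, 201506],
    "SALES_AMOUNT": [16793.14, 42901.61, 63059.72, 48819.97, 71727.66, 44252.19],
})

# Build a lookup table in which every row is relabelled with the period
# one year later, e.g. 201306 becomes 201406.
last_year = df[["STORE", "TXN_YM", "SALES_AMOUNT"]].copy()
last_year["TXN_YM"] = last_year["TXN_YM"] + 100
last_year = last_year.rename(columns={"SALES_AMOUNT": "LY"})

# The left join keeps every original row; periods with no transaction
# exactly one year earlier get NaN in LY.
result = df.merge(last_year, on=["STORE", "TXN_YM"], how="left")
print(result)
```

Because the match is made on the exact period rather than on row position, gaps of two or more years (such as 201303 to 201503) never produce a value.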

0 answers:

No answers