Pandas pivot: replace None with the last row value

Date: 2017-11-14 09:53:33

Tags: python pandas pivot

I have a DataFrame with a datetime column, different ids, and different values:

data = [
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21",  "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:17:45.550", "id" : "usr_15",  "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:20:47.437", "id" : "usr_13",  "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:43:47.262", "id" : "usr_12",  "value": "-1.250512"},
{ 'datetime_start' : "2017-03-19 15:18:47.941", "id" : "usr_21",  "value": "-0.681998"},
{ 'datetime_start' : "2017-03-19 20:03:52.905", "id" : "usr_15",   "value": "-1.018452"},
{ 'datetime_start' : "2017-03-22 13:40:48.178", "id" : "usr_21",  "value": "-1.531373"},
{ 'datetime_start' : "2017-03-22 19:54:48.320", "id" : "usr_18",  "value": "-3.789466"},
{ 'datetime_start' : "2017-03-23 13:53:48.789", "id" : "usr_21",  "value": "-1.288360"},
{ 'datetime_start' : "2017-03-24 15:54:48.649", "id" : "usr_21",  "value": "0.213171"},
{ 'datetime_start' : "2017-03-25 17:53:48.422", "id" : "usr_13",  "value": "-2.020710"},
{ 'datetime_start' : "2017-03-26 06:10:48.197", "id" : "usr_12",  "value": "-1.484709"},
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21",  "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:18:45.550", "id" : "usr_18",  "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:18:47.437", "id" : "usr_11",  "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:48:47.262", "id" : "usr_15",  "value": "-1.250512"},
{ 'datetime_start' : "2017-03-19 15:18:47.941", "id" : "usr_21",  "value": "-0.681998"},
{ 'datetime_start' : "2017-03-19 20:03:52.905", "id" : "usr_13",  "value": "-1.018452"},
{ 'datetime_start' : "2017-03-22 13:53:48.178", "id" : "usr_21",  "value": "-1.531373"},
{ 'datetime_start' : "2017-03-22 19:53:48.320", "id" : "usr_18",  "value": "-3.789466"},
{ 'datetime_start' : "2017-03-23 13:53:48.789", "id" : "usr_21",  "value": "-1.288360"},
{ 'datetime_start' : "2017-03-24 15:53:48.649", "id" : "usr_11",  "value": "0.213171"},
{ 'datetime_start' : "2017-03-25 16:53:48.422", "id" : "usr_13",  "value": "-2.020710"},
{ 'datetime_start' : "2017-03-26 06:08:48.197", "id" : "usr_15",  "value": "-1.484709"}
]

import numpy as np
import pandas as pd

df = pd.DataFrame(data)
df['datetime_start'] = pd.to_datetime(df['datetime_start'])

I want to represent this data as a pivot table:

table = pd.pivot_table(df, values='value', index=['id'],
                     columns=['index'], aggfunc=np.sum)

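[image: im1]

So for each (id, datetime) pair there is a value; where there is no value, the cell is None.

Is there an elegant way to replace the None values using this rule: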

if value(id_i, datetime_i) == None :
    if value(id_i, datetime_i-1)  != 0 :
        value(id_i, datetime_i) = value(id_i, datetime_i-1)
    else:
        value(id_i, datetime_i) = 0

So that the values propagate like this:

[image]

1 answer:

Answer 0 (score: 1)

I think you first need to change columns=['index'] to columns='datetime_start', and then use ffill (that is, fillna with method='ffill'):

table = (pd.pivot_table(df, 
                        values='value', 
                        index='id', 
                        columns='datetime_start', 
                        aggfunc=np.sum)
           .ffill(axis=1, limit=1))
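
One thing that may be worth checking before pivoting: in the sample data the value column holds strings, and some (id, datetime_start) pairs occur twice, so a sum over strings may not give a numeric total. A minimal sketch, assuming the values are meant to be numbers and that converting them with pd.to_numeric is acceptable (the two-row frame `dup` is made up for illustration):

```python
import pandas as pd

# Hypothetical two-row frame: the same (id, datetime_start) pair appears twice,
# as it does for usr_21 in the question's data.
dup = pd.DataFrame({
    "datetime_start": pd.to_datetime(["2017-03-15 14:31:20.507"] * 2),
    "id": ["usr_21", "usr_21"],
    "value": ["-1.286452", "-1.286452"],
})

# Convert the strings to floats so that the aggregation adds numbers.
dup["value"] = pd.to_numeric(dup["value"])

summed = pd.pivot_table(dup, values="value", index="id",
                        columns="datetime_start", aggfunc="sum")
print(summed)
```

With numeric values the missing cells of a pivot table show up as NaN rather than None, and ffill and fillna treat them the same way.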

EDIT, with less data:

data = [
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21",  "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:17:45.550", "id" : "usr_15",  "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:20:47.437", "id" : "usr_13",  "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:43:47.262", "id" : "usr_12",  "value": "-1.250512"},
]

df = pd.DataFrame(data)
df['datetime_start'] = pd.to_datetime(df['datetime_start'])

table = (pd.pivot_table(df, 
                        values='value', 
                        index='id', 
                        columns='datetime_start', 
                        aggfunc=np.sum)
         )


print (table)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id                                                               
usr_12                            None                    None   
usr_13                            None                    None   
usr_15                            None               -2.349203   
usr_21                       -1.286452                    None   

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262  
id                                                              
usr_12                            None               -1.250512  
usr_13                       -2.397038                    None  
usr_15                            None                    None  
usr_21                            None                    None  

To replace only the first None after each non-None value, add the limit parameter:

table1 = (pd.pivot_table(df, 
                        values='value', 
                        index='id', 
                        columns='datetime_start', 
                        aggfunc=np.sum)
           .ffill(axis=1, limit=1)
           )

print (table1)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id                                                               
usr_12                            None                    None   
usr_13                            None                    None   
usr_15                            None               -2.349203   
usr_21                       -1.286452               -1.286452   

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262  
id                                                              
usr_12                            None               -1.250512  
usr_13                       -2.397038               -2.397038  
usr_15                       -2.349203                    None  
usr_21                            None                    None  
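
To make the effect of limit easier to see, here is a small sketch on a made-up single-row frame (the name `row` and the column labels are assumptions for illustration, not part of the question's data). It compares limit=1 with no limit on a row that has a two-cell gap:

```python
import numpy as np
import pandas as pd

# Hypothetical single-row frame with a two-cell gap and a trailing gap.
row = pd.DataFrame([[1.0, np.nan, np.nan, 4.0, np.nan]],
                   index=["a"], columns=list("pqrst"))

one_step = row.ffill(axis=1, limit=1)  # fills at most one cell after each value
no_limit = row.ffill(axis=1)           # fills every gap that has a value before it

print(one_step)
print(no_limit)
```

With limit=1 the second cell of the two-cell gap stays missing; without limit both cells take the value 1.0.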

To replace every NaN with the previous non-NaN value, and finally replace all remaining NaNs with 0, remove limit and add fillna(0):

table2 = (pd.pivot_table(df, 
                        values='value', 
                        index='id', 
                        columns='datetime_start', 
                        aggfunc=np.sum)
            .ffill(axis=1)
            .fillna(0)
           )

print (table2)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id                                                               
usr_12                               0                       0   
usr_13                               0                       0   
usr_15                               0               -2.349203   
usr_21                       -1.286452               -1.286452   

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262  
id                                                              
usr_12                               0               -1.250512  
usr_13                       -2.397038               -2.397038  
usr_15                       -2.349203               -2.349203  
usr_21                       -1.286452               -1.286452
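
As a rough cross-check, the rule from the question can also be written as a plain loop and compared with the ffill plus fillna(0) chain. This is only a sketch under one assumption: I read the rule as "take the previous cell in the same row if there is one, otherwise 0". The helper `fill_by_rule` and the frame `toy` are hypothetical names introduced for the comparison.

```python
import numpy as np
import pandas as pd

def fill_by_rule(frame):
    """Apply the question's rule cell by cell, left to right within each row."""
    out = frame.copy()
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            if pd.isna(out.iat[r, c]):
                # The previous cell has already been filled at this point;
                # the first column has no previous cell, so it falls back to 0.
                out.iat[r, c] = out.iat[r, c - 1] if c > 0 else 0.0
    return out

# Hypothetical numeric frame with leading, inner and trailing gaps.
toy = pd.DataFrame([[np.nan, 2.0, np.nan, np.nan],
                    [1.0, np.nan, 3.0, np.nan]],
                   index=["a", "b"], columns=["t0", "t1", "t2", "t3"])

looped = fill_by_rule(toy)
vectorised = toy.ffill(axis=1).fillna(0)
print(looped.equals(vectorised))
```

The loop is only there to spell the rule out; on real data the chained ffill(axis=1).fillna(0) is shorter and avoids iterating over cells in Python.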