Using pandas groupby to fill a time series by weekday and weekend

Time: 2015-01-14 22:37:11

Tags: datetime pandas group-by

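Thanks in advance to anyone who can help:

I have a year-long time series at half-hourly resolution, with NaN values scattered throughout. What I want to do first is fill each NaN value with the mean of the values for the same hour on the same day of the week within the same month.

Here is what I have done so far: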

def fill_mean(VAH_data):                    
    # function which replaces NaN values with the mean of a particular grouping
    return VAH_data.fillna(VAH_data.mean())

VAH_data_filled =\
    VAH_data_rs.groupby([lambda x: x.month, lambda x: x.weekday(), lambda x: x.hour], 
                        group_keys=False).apply(fill_mean)

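This fills most of the NaN values, but gaps of varying size still remain. I would like to use the weekday() function in the grouping so that I can generalize between weekdays and weekends, rather than grouping by each individual day of the week.

I found the following post: in pandas how can I groupby weekday() for a datetime column?

But I am not sure how to implement it inside the groupby.

I have found one way to do it, posted here for people to look at: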

#Create copy of index column to be used to group day types
VAH_data_fill1['date_temp'] = VAH_data_fill1.index

#Create separate column that indicates specific day type
VAH_data_fill1['weekday'] =\
    VAH_data_fill1['date_temp'].apply(lambda x: x.weekday())

#Create a function to differentiate between weekdays and weekends
#Days are defined as: Monday = 0 to Sunday = 6
dayLog = []
def day_differentiate(VAH_data2):
    if VAH_data2 < 5:
        dayLog.append(1)
    else:
        dayLog.append(0)

#Apply differentiate function to sort weekdays and weekends
VAH_data_fill1['weekday'].apply(day_differentiate)

#Add column of logged day types
dayType = {'dayType': pd.Series(dayLog)}
dayType = pd.DataFrame(dayType)
dayType = dayType.set_index(VAH_data_fill1.index)
VAH_data_fill1 = pd.concat([VAH_data_fill1, dayType],axis=1)

VAH_data_fill2 =\
    VAH_data_fill1.groupby([lambda x: x.month, 'dayType', 
                            lambda x: x.hour],            
                           group_keys=False).apply(fill_mean)
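
The same idea can probably be written more compactly by building the weekday/weekend flag directly from the DatetimeIndex and using `transform("mean")` instead of a helper column and a global list. This is only a sketch under assumptions: the series `values`, its name `load`, and the sample data are made up for illustration and are not my real `VAH_data_rs` data.

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly series covering 28 days (made-up data).
idx = pd.date_range("2014-01-01", periods=48 * 28, freq="30min")
values = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="load")
values.iloc[[5, 100, 700]] = np.nan  # scatter a few missing values

# Grouping keys derived from the DatetimeIndex.
month = values.index.month
is_weekday = values.index.dayofweek < 5  # Monday = 0 ... Sunday = 6
hour = values.index.hour

# transform("mean") returns each group's mean aligned to the original index
# (NaN values are skipped when the mean is computed).
group_mean = values.groupby([month, is_weekday, hour]).transform("mean")

# fillna only changes the positions that were NaN.
filled = values.fillna(group_mean)
```

A group that contains nothing but NaN values would still be left unfilled, so a coarser fallback grouping may be needed for long gaps.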

Cheers, Chris

0 answers:

No answers yet