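Thanks to anyone who can help:
I have a year-long time series at half-hourly intervals, with NaN values scattered throughout. What I want to do first is fill the NaN values with the mean of the values from the same day of the week within the same month.
This is what I have so far: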
def fill_mean(VAH_data):
    # Function which replaces NaN values with the mean of a particular grouping
    return VAH_data.fillna(VAH_data.mean())

VAH_data_filled = \
    VAH_data_rs.groupby([lambda x: x.month, lambda x: x.weekday(), lambda x: x.hour],
                        group_keys=False).apply(fill_mean)
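This fills most of the NaN values, but gaps of varying sizes remain. I would like to use the weekday() function in the grouping so that it generalizes to weekdays versus weekends.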
I found the following post: in pandas how can I groupby weekday() for a datetime column?
but I'm not sure how to implement it within the grouping.
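On why gaps remain after the month/weekday/hour fill: one likely cause is a group in which every value is NaN. Its mean is then NaN as well, so `fillna` has nothing to fill with. The following is a minimal sketch on made-up data; the `demo` series, its dates, and its values are hypothetical stand-ins, not the real `VAH_data_rs`.

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly series covering four full weeks of January 2021
idx = pd.date_range("2021-01-04", periods=4 * 7 * 48, freq="30min")
demo = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Knock out one isolated value, plus every Monday reading in the 00:00 hour
demo.iloc[100] = np.nan
demo[(idx.weekday == 0) & (idx.hour == 0)] = np.nan

def fill_mean(group):
    # Replace NaN values with the mean of the group they belong to
    return group.fillna(group.mean())

filled = demo.groupby(
    [lambda x: x.month, lambda x: x.weekday(), lambda x: x.hour],
    group_keys=False,
).apply(fill_mean)

# The isolated NaN gets filled; the all-NaN (month, Monday, hour 0) group does not
remaining = filled[filled.isna()]
```

Inspecting `remaining.index` shows which timestamps are still empty, which helps decide how much coarser the grouping needs to be.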
Solution
I found one way to do it, for anyone who wants to take a look:
import pandas as pd

# Create a copy of the index column to be used to group day types
VAH_data_fill1['date_temp'] = VAH_data_fill1.index
# Create a separate column that indicates the specific day of the week
VAH_data_fill1['weekday'] = \
    VAH_data_fill1['date_temp'].apply(lambda x: x.weekday())

# Create a function to differentiate between weekdays and weekends
# Days are defined as: Monday = 0 to Sunday = 6
dayLog = []
def day_differentiate(VAH_data2):
    # Log 1 for a weekday (Mon-Fri) and 0 for a weekend day (Sat-Sun)
    if VAH_data2 < 5:
        dayLog.append(1)
    else:
        dayLog.append(0)

# Apply the differentiate function to sort weekdays and weekends
VAH_data_fill1['weekday'].apply(day_differentiate)
# Add a column of logged day types
dayType = {'dayType': pd.Series(dayLog)}
dayType = pd.DataFrame(dayType)
dayType = dayType.set_index(VAH_data_fill1.index)
VAH_data_fill1 = pd.concat([VAH_data_fill1, dayType], axis=1)

# Fill NaN values with the mean for the same month, day type and hour
VAH_data_fill2 = \
    VAH_data_fill1.groupby([lambda x: x.month, 'dayType',
                            lambda x: x.hour],
                           group_keys=False).apply(fill_mean)
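The same weekday/weekend grouping can probably be written more compactly by building the group keys directly from the `DatetimeIndex` and using `transform('mean')` instead of helper columns and `apply`. This is a sketch of an alternative technique, not the code above; the frame `df`, its `value` column, and the dates are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: four weeks of half-hourly values
idx = pd.date_range("2021-01-04", periods=4 * 7 * 48, freq="30min")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Blank out every Monday reading in the 00:00 hour; a per-weekday grouping
# would leave these unfilled because the whole group is NaN
df.loc[(idx.weekday == 0) & (idx.hour == 0), "value"] = np.nan

# Group keys taken straight from the index (Monday = 0 ... Sunday = 6);
# `idx.weekday < 5` is True for Mon-Fri and False for Sat-Sun
keys = [idx.month, idx.weekday < 5, idx.hour]

# transform("mean") returns each row's group mean, aligned to the original rows
group_mean = df.groupby(keys)["value"].transform("mean")

# Only the NaN entries are replaced; existing values are left as they are
df_filled = df.assign(value=df["value"].fillna(group_mean))
```

Because the Monday gaps now share a group with Tuesday to Friday readings for the same hour, they are filled from those values. No temporary `date_temp`, `weekday` or `dayType` columns are left behind in the result.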
Cheers, Chris