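Hello, my data frame is below.
In the df below, how can I find the NAs in the `value` column and replace each one with the average of the past 7 days at the same time of day, giving the `output` column? For example, if the value at 2014-02-08 00:45 is NA, it should be replaced by the average of the 00:45 values from Feb 1 to Feb 7.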
dates = c('21-01-2014 00:15', '21-01-2014 00:30','21-01-2014 00:45','22-01-2014 00:00','22-01-2014 00:30','22-01-2014 00:45','23-01-2014 00:00','23-01-2014 00:15','23-01-2014 00:45','25-01-2014 00:45','26-01-2014 00:45','26-01-2014 00:46','26-01-2014 00:30','27-02-2014 00:45','28-02-2014 00:45','29-03-2014 00:45','30-03-2014 00:00','30-03-2014 00:45','30-03-2014 00:45','31-03-2014 00:45','01-04-2014 00:45','02-04-2014 00:45','03-04-2014 00:45')
value = c(20, 5, 10, 23, NA, 22, 12, 10, NA, 12, NA, 4, 19, 12,
NA, NA, 2, 2, NA, 14, NA, 21, NA)
output =c(20, 5, 10, 23, 5, 22, 12, 10, 10, 12, 11, 4, 19, 12,
14, 14, 2, 2, 11.6, 14, 12, 21, 13.28)
df = data.frame(dates, value, output)
# Parse the day-month-year strings as GMT timestamps
df$dates = as.POSIXct(strptime(df$dates, format = "%d-%m-%Y %H:%M", tz = "GMT"))
Thanks in advance.
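A minimal base R sketch of the operation described above. It assumes "the past 7 days" means the window from exactly 7 days before the timestamp up to (but not including) the timestamp itself, and that only the original, unfilled values are averaged. The function name `fill_na_same_time` is hypothetical and does not come from any package.

```r
# Hypothetical helper: replace each NA in `value` with the mean of the
# non-NA values recorded at the same time of day within the `days` days
# before that timestamp (window is [t - days, t)).
fill_na_same_time <- function(dates, value, days = 7) {
  # Time of day as "HH:MM", taken in GMT to match the parsed dates
  tod <- format(dates, "%H:%M", tz = "GMT")
  window_secs <- days * 24 * 60 * 60
  out <- value
  for (i in which(is.na(value))) {
    same_slot <- tod == tod[i] &
      dates < dates[i] &
      dates >= dates[i] - window_secs
    # Only original non-NA values are used; if none exist, keep NA
    prev <- value[same_slot & !is.na(value)]
    if (length(prev) > 0) out[i] <- mean(prev)
  }
  out
}

# Small example: the 00:45 reading on 8 Feb is missing
dates <- as.POSIXct(c("2014-02-01 00:45", "2014-02-03 00:45",
                      "2014-02-07 00:45", "2014-02-07 00:30",
                      "2014-02-08 00:45"), tz = "GMT")
value <- c(10, 20, 30, 99, NA)
fill_na_same_time(dates, value)
# → 10 20 30 99 20
```

The 00:30 reading (99) is ignored because it belongs to a different time slot, so the NA becomes the mean of 10, 20 and 30.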
Answer 0 (score: 0)
You can loop over the rows.
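I used data.table because I'm more familiar with it:
library(data.table)
library(dplyr)

df <- df %>% as.data.table()
# Fill each NA in `value` with the mean of the (up to) 7 rows directly above it.
# Rows are processed in order, so earlier NAs are already filled when later rows
# use them. Note: this averages neighbouring rows, not the same time of day.
for (index in 1:nrow(df)) {
  if (df[index, value] %>% is.na()) {
    if (index > 7) {
      df[index, value := df[(index - 7):(index - 1), value] %>% mean(na.rm = TRUE)]
    } else if (index > 1) {
      # 1:(index - 1), not 1:index - 1, which evaluates to 0:(index - 1)
      df[index, value := df[1:(index - 1), value] %>% mean(na.rm = TRUE)]
    }
  }
}
This assumes you are happy to keep working with a data.table after processing.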
Let me know whether this is what you want.
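For comparison, the same row-based rule can be sketched in base R on a plain numeric vector, which avoids updating a data.table one row at a time. Each NA becomes the mean of up to 7 preceding entries, filled in order so that earlier replacements feed later ones. Like the loop in this answer, it looks at neighbouring rows rather than the same time of day; the name `fill_na_prev_rows` is hypothetical.

```r
# Hypothetical helper: sequentially replace each NA with the mean of the
# (up to) `k` entries directly before it; earlier fills are reused.
# If all of those entries are NA, mean(..., na.rm = TRUE) gives NaN.
fill_na_prev_rows <- function(value, k = 7) {
  for (i in seq_along(value)) {
    if (is.na(value[i]) && i > 1) {
      prev <- value[max(1, i - k):(i - 1)]
      value[i] <- mean(prev, na.rm = TRUE)
    }
  }
  value
}

fill_na_prev_rows(c(20, 5, 10, 23, NA, 22))
# → 20 5 10 23 14.5 22
```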
Answer 1 (score: 0)
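I would try a self-join of the data frame, in which two rows match when one of them belongs to the group of rows whose mean you want for the other (same time of day, within the previous 7 days).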
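library(data.table)
dt <- data.table(df)
# id, two copies of the date for the join, the date 7 days earlier, and the
# time of day (formatted in GMT to match the parsed dates)
dt[ , c("id", "dates_tmp1", "dates_tmp2", "dates_7", "time") :=
      list(1:nrow(dt), dates, dates, dates - as.difftime(7, units = "days"),
           strftime(dates, format = "%H:%M:%S", tz = "GMT"))]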
I created some temporary columns for the join so that the original data is not overwritten.
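# Non-equi self-join: for each row, collect every row at the same time of day
# whose date lies within the 7 days up to and including its own date
joined <- dt[dt, on = .(dates_tmp1 >= dates_tmp1, dates_7 <= dates_tmp2, time == time),
             allow.cartesian = TRUE]
# Average the matched values per row; NAs (including the row's own) are dropped
mean_values <- joined[ , list(mean_value = mean(i.value, na.rm = TRUE)), by = "id"]
mean_values <- mean_values[order(id)]
mean_values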
id mean_value
1: 1 20.00000
2: 2 5.00000
3: 3 10.00000
4: 4 23.00000
5: 5 5.00000
6: 6 16.00000
Use these values to replace the NAs.
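If data.table is not available, the same self-join can be sketched in base R: merge the table with itself on the time of day, keep the pairs whose partner date falls within the 7 days up to and including the row's own date, and average the partner values per `id`. This mirrors the join conditions in this answer on a few made-up rows; the table name `tab` and the column `filled` are illustrative only.

```r
# Base R emulation of the non-equi self-join: pair every row with all
# rows at the same time of day, keep partners dated within
# [date - 7 days, date], then average the partners' values per id.
dates <- as.POSIXct(c("2014-01-21 00:30", "2014-01-21 00:45",
                      "2014-01-22 00:30", "2014-01-22 00:45"), tz = "GMT")
tab <- data.frame(id = seq_along(dates), dates = dates,
                  value = c(5, 10, NA, 22))
tab$time <- format(tab$dates, "%H:%M:%S", tz = "GMT")

# Cartesian product within each time of day (suffix ".i" = partner row)
pairs <- merge(tab, tab, by = "time", suffixes = c("", ".i"))
in_window <- pairs$dates.i <= pairs$dates &
  pairs$dates.i >= pairs$dates - as.difftime(7, units = "days")
pairs <- pairs[in_window, ]

# Mean of partner values per id; rows with NA partner values are omitted
mean_values <- aggregate(value.i ~ id, data = pairs, FUN = mean,
                         na.action = na.omit)

# Replace only the NAs with the window mean
tab$filled <- tab$value
na_rows <- is.na(tab$value)
tab$filled[na_rows] <- mean_values$value.i[match(tab$id[na_rows], mean_values$id)]
tab$filled
# → 5 10 5 22
```

As in the data.table version, the window includes the row itself; that is harmless for NA rows because their own value is dropped before averaging.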
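If you instead want the previous 7 days on which that time of day actually occurs (rather than 7 calendar days), you can create a new column that numbers those days and then do the same join.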
dt[ , c("id", "time") := list(1:nrow(dt), strftime(dates, format = "%H:%M:%S", tz = "GMT"))]
# Dense rank of the calendar day within each time of day: 1, 2, 3, ...
dt[ , days := as.numeric(frank(as.Date(dates), ties.method = "dense")), by = time]
dt[ , days_7 := days - 7]
joined <- dt[dt, on = .(days >= days, days_7 <= days, time == time), allow.cartesian = TRUE]
mean_values <- joined[ , list(mean_value = mean(i.value, na.rm = TRUE)), by = "id"]
mean_values <- mean_values[order(id)]
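In base R, the per-time day counter that `frank(..., ties.method = "dense")` produces can be sketched with `ave()` and `match()` against the sorted unique dates. The timestamps below are made up for illustration; the partner check at the end shows that "7 days" now counts days with a reading, so a reading a month earlier can still fall inside the window.

```r
# Number the distinct calendar days separately for each time of day,
# like a dense rank: the first day with a reading at that time is 1,
# the next day with a reading at that time is 2, and so on.
dates <- as.POSIXct(c("2014-01-21 00:45", "2014-01-25 00:45",
                      "2014-01-25 00:30", "2014-02-27 00:45",
                      "2014-02-27 00:45"), tz = "GMT")
time <- format(dates, "%H:%M:%S", tz = "GMT")
day_num <- as.numeric(as.Date(dates, tz = "GMT"))

days <- ave(day_num, time,
            FUN = function(d) match(d, sort(unique(d))))
days
# → 1 2 1 3 3

# Rows at the same time whose day number lies in [days - 7, days] for row 4
partners <- which(time == time[4] & days <= days[4] & days >= days[4] - 7)
partners
# → 1 2 4 5
```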