Daily mean of all data frame variables, including NA values, with the aggregate function

Time: 2015-12-01 17:28:11

Tags: r dataframe timestamp aggregate na

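I want to compute the daily mean of every variable in my data frame, which contains NA values. All of my data has one value every 30 minutes, so I am very interested in using the timestamp with the aggregate function to get daily, weekly, monthly... aggregated data. My data frame is 37795 rows x 54 variables. I tried two ways to do this. The first option does not give me daily means, because the values I get are too high (they make no sense). The second option gives me NA for almost every value. I don't know what to do.

I have included the head of my data frame and my code below.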

system("cp -r olddir newdir");

How can I do this?

Thanks!

1 answer:

Answer 0: (score: 0)

I did not use the aggregate function; I used tapply instead.

This is the code I came up with to handle the NAs:

# create a sequence of DateTime with half-hourly data
DateTime <- seq.POSIXt(from = as.POSIXct("2015-05-01 00:00:00", tz = "Etc/GMT+12"), 
                       to = as.POSIXct("2015-05-30 23:59:00", tz = "Etc/GMT+12"), by = 1800)

# create some dummy data of the same length as DateTime vector
aa <- runif(1440, 5.0, 7.5) 
bb <- NA
df <- data.frame(DateTime, aa, bb)

# replace a cell with NA in the "aa" column
df[19,2] <- NA # dataframe = df, row = 19, column = 2

# create DateHour column to use later
df$DateHour <- paste(format(df$DateTime, "%Y/%m/%d"), format(df$DateTime, "%H"), sep = " ")
View(df)

# Hourly means
# Calculate hourly mean values
aa.HourlyMean <- tapply(df$aa, df$DateHour, mean, na.rm = TRUE)
# convert the vector to dataframe
aa.HourlyMean <- data.frame(aa.HourlyMean) 

# Move the row names (the DateHour groups) into a DateHour column
aa.HourlyMean$DateHour <- row.names(aa.HourlyMean)
# Delete rownames of "aa" dataframe
row.names(aa.HourlyMean) <- NULL

# Create a tidy DateTime column
aa.HourlyMean$DateTime <- as.POSIXct(aa.HourlyMean$DateHour, format = "%Y/%m/%d %H", tz = "Etc/GMT+12")

# change to a tidy dataframe
aa.HourlyMean <- aa.HourlyMean[,c(3,2,1)]

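# You can delete any column (for example, DateHour) by
# aa.HourlyMean$DateHour <- NULL

# You can rename a column in base R by
# names(aa.HourlyMean)[3] <- "NewColumnName"
# or with the "plyr" package by
# aa.HourlyMean <- plyr::rename(aa.HourlyMean, c("aa.HourlyMean" = "NewColumnName"))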

# View the hourly mean of the "aa" dataframe
View(aa.HourlyMean)

# You can do the same with the "bb" column
# (bb is all NA in this dummy data, so every mean comes out as NaN)
bb.HourlyMean <- tapply(df$bb, df$DateHour, mean, na.rm = TRUE)
bb.HourlyMean <- data.frame(bb.HourlyMean)

# View the hourly mean of the "bb" column
View(bb.HourlyMean) 

# /Hourly means

You can then merge the aa.HourlyMean and bb.HourlyMean results into a single data frame.
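As a minimal sketch of that merge, assuming both results carry a DateHour key column (bb.HourlyMean would first need the same row.names step that was applied to aa.HourlyMean), base R's merge can join them. The values below are made-up stand-ins and HourlyMeans is a hypothetical name.

```r
# Made-up stand-ins for the two hourly-mean data frames (illustrative values only)
aa.HourlyMean <- data.frame(DateHour = c("2015/05/01 00", "2015/05/01 01"),
                            aa.HourlyMean = c(6.1, 5.8))
bb.HourlyMean <- data.frame(DateHour = c("2015/05/01 00", "2015/05/01 01"),
                            bb.HourlyMean = c(NaN, NaN))

# Join on the shared key; all = TRUE keeps hours that appear in only one input
HourlyMeans <- merge(aa.HourlyMean, bb.HourlyMean, by = "DateHour", all = TRUE)
print(HourlyMeans)
```

Joining on the key rather than using cbind avoids misaligned rows if one variable is missing some hours.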

# Daily means
df$Date <- format(df$DateTime, "%Y/%m/%d")
aa.DailyMean <- tapply(df$aa, df$Date, mean, na.rm = TRUE)
aa.DailyMean <- data.frame(aa.DailyMean)
aa.DailyMean$Date <- row.names(aa.DailyMean); row.names(aa.DailyMean) <- NULL
aa.DailyMean <- aa.DailyMean[,c(2,1)]

View(aa.DailyMean)
# /Daily means

# Weekly means
df$YearWeek <- paste(format(df$DateTime, "%Y"), strftime(df$DateTime, format = "%W"), sep = " ")
aa.WeeklyMean <- tapply(df$aa, df$YearWeek, mean, na.rm = TRUE)
aa.WeeklyMean <- data.frame(aa.WeeklyMean)
aa.WeeklyMean$YearWeek <- row.names(aa.WeeklyMean); row.names(aa.WeeklyMean) <- NULL
aa.WeeklyMean <- aa.WeeklyMean[,c(2,1)]

View(aa.WeeklyMean)
# /Weekly means

I computed hourly, daily and weekly means here, but you can see how to compute monthly, yearly, ... means in the same way.
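The same grouping idea extends to monthly means, and it can cover several columns in one call. The sketch below is not the answer's method: it swaps in aggregate's data-frame interface, which passes na.rm = TRUE through to mean. (The formula interface defaults to na.action = na.omit, which drops every row containing any NA before averaging.) The dummy data, the UTC time zone, and the names YearMonth and MonthlyMean are assumptions for illustration.

```r
# Dummy half-hourly data spanning two months (illustrative only)
DateTime <- seq(from = as.POSIXct("2015-05-01 00:00:00", tz = "UTC"),
                to = as.POSIXct("2015-06-30 23:30:00", tz = "UTC"), by = 1800)
set.seed(1)
df <- data.frame(DateTime,
                 aa = runif(length(DateTime), 5.0, 7.5),
                 bb = runif(length(DateTime), 0, 1))
df[19, "aa"] <- NA  # one missing half-hourly value

# Grouping key: year and month of each timestamp
df$YearMonth <- format(df$DateTime, "%Y/%m")

# Mean of every listed column per month; na.rm = TRUE is forwarded to mean()
MonthlyMean <- aggregate(df[, c("aa", "bb")],
                         by = list(YearMonth = df$YearMonth),
                         FUN = mean, na.rm = TRUE)
print(MonthlyMean)
```

Changing the format string to "%Y/%m/%d" or "%Y" would give daily or yearly groups with the same call.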