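Extracting monthly data from tables where the mean is above a threshold, using R

Time: 2014-11-26 01:22:36

Tags: r list

I have 2 tables (a and b), each with 365 records (one year of data). I want to compute the mean for each month in table a and, if it is below 0.01, delete all daily values belonging to that month and output a new table. I also want to delete the corresponding daily values from table b, so that a new table is produced for it as well.

For example: if the monthly means for January and April are below 0.01, the output tables a and b should each have 304 values. The outputs of dput(head(a)) and dput(head(b)) are, respectively: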

structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

structure(list(V1 = c(0.042022234, 0.014848409, 0.275174289, 0.485364883, 0.177960815, 0.006799459)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

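I don't know how to use list comprehension in R. Any suggestion would be much appreciated.

3 answers:

Answer 0: (score: 1)

If by table you mean a data.frame, and your data is structured like @eclark's sample data, you could try something like this with dplyr.

Data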

set.seed(123)
a <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
                length.out = 365), value=rnorm(n = 365,mean = .01,sd = .1))

b <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
                length.out = 365), value=rnorm(n = 365,mean = .01,sd = .15))

Code

library(dplyr)

# Create a column with month
mutate(a, month = as.character(format(Date, "%m"))) -> a
mutate(b, month = as.character(format(Date, "%m"))) -> b

# Get mean for each month and get months with average lower than 0.01 in the data frame, a
summarise(group_by(a, month), average = mean(value)) %>%
filter(average < 0.01) -> wutever

#wutever
#Source: local data frame [5 x 2]
#
#  month       average
#1    01  0.0068172630
#2    04  0.0006111069
#3    05 -0.0052247522
#4    08  0.0008155293
#5    12  0.0054872409

# Remove data points including months in wutever from a and b
filter(a, !month %in% wutever$month) -> newA
filter(b, !month %in% wutever$month) -> newB   
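
The dput output in the question shows a single column V1 and no date column, so the month has to be derived from the row position before an approach like the one above can be applied. Below is a minimal base-R sketch under stated assumptions: row i is taken to be the i-th day of a non-leap year (2013 is an arbitrary choice), and the tables are made-up stand-ins built so that only January and April average below 0.01 in a, mirroring the example in the question.

```r
# Assumption: row i is the i-th day of a non-leap year (2013 chosen arbitrarily)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)
month <- format(dates, "%m")

# Hypothetical stand-ins for the asker's tables: a single column V1, no dates.
# a is constructed so that only January and April average below 0.01.
a <- data.frame(V1 = ifelse(month %in% c("01", "04"), 0, 0.02))
b <- data.frame(V1 = seq_len(365) / 1000)

# Monthly mean of a, repeated for every day of that month
# (ave() applies mean within each group by default)
monthly_mean <- ave(a$V1, month)

# Keep only the days whose month averages at least 0.01 in a,
# and apply the same row selection to b
keep <- monthly_mean >= 0.01
new_a <- a[keep, , drop = FALSE]
new_b <- b[keep, , drop = FALSE]

nrow(new_a)  # 304, i.e. 365 - 31 (January) - 30 (April)
nrow(new_b)  # 304
```

Because the same logical vector keep indexes both tables, the rows of new_a and new_b stay aligned day for day without needing a join.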

Answer 1: (score: 0)

Not the most elegant or the fastest way, but here is an idea:

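library(dplyr)

a <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),length.out = 365),a=rnorm(n = 365,mean = .01,sd = .1))
b <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),length.out = 365),b=rnorm(n = 365,mean = .01,sd = .15))
# Join the two tables on the Date column
c <- merge(a, b, by = "Date")
c <- as_tibble(c)
# Month as a two-digit string taken from the ISO-formatted date
c <- mutate(c, month = substr(Date, 6, 7))
# Monthly means of both value columns
d <- summarise(group_by(c, month), am = mean(a), bm = mean(b))
c <- left_join(c, d, by = "month")
# Keeps months where both means are >= 0.01; drop the bm condition
# to filter on table a only, as the question describes
c <- filter(c, am >= .01 & bm >= .01)
a <- c[, c("Date", "a")]
b <- c[, c("Date", "b")]
remove(c, d)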

Answer 2: (score: 0)

Using base functions alone, and assuming both of your data frames contain the variables day, month, and value:

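new_a <- do.call(rbind, by(a, a$month, function(df) {
  # Return NULL to drop the whole month when its mean is below 0.01;
  # if/else is used because ifelse() cannot return NULL or a data frame
  if (mean(df$value) < 0.01) NULL else df
}))
new_b <- subset(b, day %in% new_a$day)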

Alternatively, you could use the plyr package and try:

library(plyr)
new_a <- ddply(a, .(month), function(df) if (mean(df$value) < 0.01) NULL else df)
new_b <- subset(b, day %in% new_a$day)
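
Both snippets in this answer rely on day, month, and value columns that the question's tables do not show. As a rough sketch of how such tables might be set up and filtered with base R only, the code below builds simulated data (the values, the seed, and the year 2013 are arbitrary assumptions, and low_months is a name introduced here for illustration), finds the months of a with a mean below 0.01 using tapply, and drops them from both tables.

```r
# Simulated stand-ins for the real tables (values are arbitrary)
set.seed(42)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)
a <- data.frame(day   = seq_along(dates),        # day of year, 1..365
                month = format(dates, "%m"),     # "01".."12"
                value = rnorm(365, mean = 0.01, sd = 0.1))
b <- transform(a, value = rnorm(365, mean = 0.01, sd = 0.15))

# Months of a whose mean is below the threshold
monthly <- tapply(a$value, a$month, mean)
low_months <- names(monthly)[monthly < 0.01]

# Drop those months from both tables
new_a <- a[!a$month %in% low_months, ]
new_b <- b[!b$month %in% low_months, ]
```

Computing the list of low months once and reusing it for both tables keeps the rule tied to table a alone, which is what the question asks for.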