Question

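I have 2 tables (a and b), each with 365 records (one year of data). For table a, I want to compute the mean of each month; if a month's mean is below 0.01, I want to delete all the daily values belonging to that month and output a new table. I also want to delete the corresponding daily values from table b, producing a new table for it as well.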

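For example, if the monthly means for January and April are below 0.01, the output tables a and b should each have 304 values (365 minus 31 days for January and 30 for April). The outputs of dput(head(a)) and dput(head(b)) are, respectively: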

structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

structure(list(V1 = c(0.042022234, 0.014848409, 0.275174289, 0.485364883, 0.177960815, 0.006799459)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

I don't know how to do a list comprehension in R. Any suggestions would be appreciated.
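R has no list comprehensions; the usual equivalent is to build one logical index vector and use it to subset both tables. The sketch below assumes, since the dput() output has only a V1 column and no dates, that row i of both a and b is day i of 2013 and that the two tables are in the same row order. The data are made up so that January and April average 0 and every other month averages 0.02, which reproduces the 304-row example.

```r
# Assumption: a and b only have a V1 column (as in the dput() output) and
# row i of each table is day i of 2013, in the same order in both tables.
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
month <- format(dates, "%m")

# Made-up data: 0.02 every day, except January and April, which are 0
a <- data.frame(V1 = ifelse(month %in% c("01", "04"), 0, 0.02))
b <- data.frame(V1 = seq_along(dates) / 365)

# ave() returns, for every row, the mean of a$V1 over that row's month
monthly_mean <- ave(a$V1, month, FUN = mean)

# A single logical vector selects the same days in both tables
keep <- monthly_mean >= 0.01
new_a <- a[keep, , drop = FALSE]
new_b <- b[keep, , drop = FALSE]

nrow(new_a)  # 304
nrow(new_b)  # 304
```

Because the index is computed once from table a and then applied to both tables, b is trimmed to exactly the same days without a join.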

Answer 1

If by "table" you mean a data.frame, and your data are structured like @eclark's sample data, you could try something like this with dplyr.

Data

set.seed(123)
a <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365),
                value = rnorm(n = 365, mean = .01, sd = .1))
b <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365),
                value = rnorm(n = 365, mean = .01, sd = .15))

Code

library(dplyr)

# Create a column with the month ("01" to "12")
mutate(a, month = as.character(format(Date, "%m"))) -> a
mutate(b, month = as.character(format(Date, "%m"))) -> b

# Compute the mean of each month in a and keep the months whose average is below 0.01
summarise(group_by(a, month), average = mean(value)) %>% filter(average < 0.01) -> wutever

# wutever
# Source: local data frame [5 x 2]
#
#   month       average
# 1    01  0.0068172630
# 2    04  0.0006111069
# 3    05 -0.0052247522
# 4    08  0.0008155293
# 5    12  0.0054872409

# Drop the rows of a and b that fall in the months listed in wutever
filter(a, !month %in% wutever$month) -> newA
filter(b, !month %in% wutever$month) -> newB
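The same result can also be sketched with a grouped filter(), where the mean is evaluated per month inside filter() itself, followed by semi_join() to keep the matching dates in b. This assumes the Date/value layout from the data above and an installed dplyr; it is an alternative formulation, not part of the original answer.

```r
library(dplyr)

# Same made-up data layout as above: one row per day of 2013
set.seed(123)
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
a <- data.frame(Date = dates, value = rnorm(365, mean = .01, sd = .1))
b <- data.frame(Date = dates, value = rnorm(365, mean = .01, sd = .15))

# Within each month group, keep all rows only if that month's mean is >= 0.01
newA <- a %>%
  group_by(month = format(Date, "%m")) %>%
  filter(mean(value) >= 0.01) %>%
  ungroup() %>%
  select(-month)

# Keep the rows of b whose Date survived in newA
newB <- semi_join(b, newA, by = "Date")
```

Joining on Date rather than relying on row order means b does not need to be sorted the same way as a.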

Answer 2

Not the most elegant or fastest way, but here is one idea:

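library(dplyr)
a <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365), a = rnorm(n = 365, mean = .01, sd = .1))
b <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365), b = rnorm(n = 365, mean = .01, sd = .15))
# Merge the two tables on the date column (column 1)
c <- merge(a, b, by = 1)
c <- as_tibble(c)  # tbl_df() is deprecated in current dplyr
# Extract the month ("01" to "12") from the ISO date string
c <- mutate(c, month = substr(Date, 6, 7))
# Only table a's monthly mean decides which months are dropped
d <- summarise(group_by(c, month), am = mean(a))
c <- left_join(c, d, by = "month")
c <- filter(c, am >= .01)
# Split back into the two tables, keeping the date with each value
a <- c[, c("Date", "a")]
b <- c[, c("Date", "b")]
remove(c, d)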
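To confirm that a result like the one above is consistent, a small check can be written. The helper check_filtered() below is hypothetical (not part of the answer): it returns TRUE when every month left in the filtered table a has a mean of at least the threshold and the two filtered tables cover exactly the same dates. It assumes a Date column plus one value column, as in this answer's layout, and is shown on a tiny made-up example.

```r
# Hypothetical helper: TRUE when every month remaining in new_a has a mean of
# at least `threshold` and new_a and new_b contain exactly the same dates.
check_filtered <- function(new_a, new_b, value_col, threshold = 0.01) {
  monthly_means <- tapply(new_a[[value_col]], format(new_a$Date, "%m"), mean)
  all(monthly_means >= threshold) && identical(new_a$Date, new_b$Date)
}

# Made-up example: January averages 0.005, February averages 0.02
dates <- seq(as.Date("2013-01-01"), as.Date("2013-02-28"), by = "day")
a <- data.frame(Date = dates, a = ifelse(format(dates, "%m") == "01", 0.005, 0.02))
b <- data.frame(Date = dates, b = 1)
keep <- format(dates, "%m") != "01"

check_filtered(a[keep, ], b[keep, ], value_col = "a")  # TRUE: January removed
check_filtered(a, b, value_col = "a")                  # FALSE: January still present
```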

Answer 3

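Using base functions only, and assuming both data frames contain the variables day (a unique day index, such as 1 to 365), month and value: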

new_a <- do.call(rbind, by(a, a$month, function(df) {
    # ifelse() cannot return NULL or a data frame, so use if/else instead
    if (mean(df$value) < 0.01) NULL else df
}))
new_b <- subset(b, day %in% new_a$day)
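The by() approach works because rbind() skips NULL arguments, so a month for which the function returns NULL simply disappears from the combined result. The snippet below illustrates that behaviour with made-up one-row data frames and shows why the test has to be written with if/else: ifelse() is vectorised and tries to assign its yes/no values element by element, which fails for NULL.

```r
# rbind() ignores NULL arguments, so a NULL piece is dropped from the result
jan <- data.frame(month = "01", value = 0.02)
feb <- NULL                                  # a month that was filtered out
mar <- data.frame(month = "03", value = 0.05)
combined <- do.call(rbind, list(jan, feb, mar))
nrow(combined)  # 2

# ifelse(TRUE, NULL, jan) raises "replacement has length zero"
ifelse_result <- tryCatch(ifelse(TRUE, NULL, jan), error = function(e) "error")

# The scalar if/else form returns NULL or the whole data frame as intended
keep_or_drop <- function(df) if (mean(df$value) < 0.01) NULL else df
```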

Alternatively, you can use the plyr package and try:

library(plyr)
new_a <- ddply(a, .(month), function(df) if (mean(df$value) < 0.01) NULL else df)
new_b <- subset(b, day %in% new_a$day)
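Both by() and ddply() follow a split-apply-combine pattern. In base R the same idea can be spelled out with split() and Filter(), which makes the "keep only months with a high enough mean" step explicit. The sketch assumes the day/month/value layout from this answer, with made-up data where January and April average 0.

```r
# Made-up data in the assumed layout: day (1 to 365), month, value
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
month <- format(dates, "%m")
a <- data.frame(day = seq_along(dates), month = month,
                value = ifelse(month %in% c("01", "04"), 0, 0.02))
b <- data.frame(day = seq_along(dates), month = month,
                value = seq_along(dates) / 365)

# Split a by month, keep the pieces whose mean is >= 0.01, and recombine
pieces <- split(a, a$month)
kept <- Filter(function(df) mean(df$value) >= 0.01, pieces)
new_a <- do.call(rbind, unname(kept))

# Keep the same days in b
new_b <- subset(b, day %in% new_a$day)

nrow(new_a)  # 304
```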

Extracting monthly data from tables whose mean is greater than a threshold, using R
