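I have two tables (a and b), each with 365 records (one year of daily data). I want to compute the monthly mean for table a and, wherever it is below 0.01, delete all daily values belonging to that month and output a new table. I also want to delete the corresponding daily values from table b, so that a new table is produced for it as well.
For example: if the monthly means for January and April are below 0.01, the output tables a and b should each have 304 values. The outputs of dput(head(a)) and dput(head(b)) are, respectively: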
structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")
structure(list(V1 = c(0.042022234, 0.014848409, 0.275174289, 0.485364883, 0.177960815, 0.006799459)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")
I don't know how to use list comprehensions in R. Any suggestions would be appreciated.
Answer 0 (score: 1)
If "table" means data.frame and your data are structured like @eclark's sample data, you can try something like this with dplyr.
DATA
set.seed(123)
a <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
length.out = 365), value=rnorm(n = 365,mean = .01,sd = .1))
b <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
length.out = 365), value=rnorm(n = 365,mean = .01,sd = .15))
CODE
library(dplyr)
# Create a column with month
mutate(a, month = as.character(format(Date, "%m"))) -> a
mutate(b, month = as.character(format(Date, "%m"))) -> b
# Get mean for each month and get months with average lower than 0.01 in the data frame, a
summarise(group_by(a, month), average = mean(value)) %>%
filter(average < 0.01) -> wutever
#wutever
#Source: local data frame [5 x 2]
#
# month average
#1 01 0.0068172630
#2 04 0.0006111069
#3 05 -0.0052247522
#4 08 0.0008155293
#5 12 0.0054872409
# Drop the rows of a and b that fall in the months listed in wutever
filter(a, !month %in% wutever$month) -> newA
filter(b, !month %in% wutever$month) -> newB
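
The same idea can also be written as a grouped filter, which avoids the intermediate table of flagged months. The following is only a sketch under stated assumptions: the data are hypothetical and deterministic (January and April are set to 0 so that exactly those two months fall below the threshold), and the second table is matched on Date with semi_join rather than on the month column.

```r
library(dplyr)

# Hypothetical deterministic data: January and April are all zeros, so their
# monthly means fall below 0.01; every other day is 0.5.
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
mon <- format(dates, "%m")
a <- data.frame(Date = dates, value = ifelse(mon %in% c("01", "04"), 0, 0.5))
b <- data.frame(Date = dates, value = seq_along(dates) / 1000)

# Keep only the rows of a whose month has a mean of at least 0.01
newA <- a %>%
  group_by(month = format(Date, "%m")) %>%
  filter(mean(value) >= 0.01) %>%
  ungroup()

# Keep only the rows of b whose Date is still present in newA
newB <- semi_join(b, newA, by = "Date")

nrow(newA)  # 304
nrow(newB)  # 304
```

With January (31 days) and April (30 days) removed from a 365-day year, both tables end up with 304 rows, which matches the example in the question.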
Answer 1 (score: 0)
Not the most elegant or the fastest way, but here is an idea:
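library(dplyr)
a <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365), a = rnorm(n = 365, mean = .01, sd = .1))
b <- data.frame(Date = seq.Date(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), length.out = 365), b = rnorm(n = 365, mean = .01, sd = .15))
# Merge the two tables on their first column (Date)
c <- merge(a, b, by = 1)
c <- as_tibble(c)
# Extract the two-digit month from the ISO date string
c <- mutate(c, month = substr(Date, 6, 7))
# Monthly means of a and b
d <- summarise(group_by(c, month), am = mean(a), bm = mean(b))
c <- left_join(c, d, by = "month")
# Keeps only months where both means are >= 0.01;
# drop the bm condition to filter on a alone, as the question asks
c <- filter(c, am >= .01 & bm >= .01)
# Split back into two tables, selecting columns by name
a <- c[, c("Date", "a")]
b <- c[, c("Date", "b")]
remove(c, d)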
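
The code above keeps a month only when the monthly means of both a and b reach 0.01, whereas the question filters on a alone. A minimal base R sketch of the same merge-based idea, filtering on a only, might look like this. The data are hypothetical and deterministic, and the names both, keep, new_a and new_b are illustrative rather than taken from the answer.

```r
# Hypothetical deterministic data: January and April of a are all zeros
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
mon <- format(dates, "%m")
a <- data.frame(Date = dates, a = ifelse(mon %in% c("01", "04"), 0, 0.5))
b <- data.frame(Date = dates, b = 0.2)

# Merge on Date so that both series share the same rows
both <- merge(a, b, by = "Date")
both$month <- format(both$Date, "%m")

# ave() returns the group mean (its default FUN) repeated for every row
both$am <- ave(both$a, both$month)

# Keep rows whose month has a mean of a of at least 0.01
keep <- both$am >= 0.01
new_a <- both[keep, c("Date", "a")]
new_b <- both[keep, c("Date", "b")]
```

Because one logical vector indexes both tables, new_a and new_b are guaranteed to cover exactly the same dates.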
Answer 2 (score: 0)
Using base functions only, and assuming that both of your data frames contain the variables day, month and value:
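new_a <- do.call(rbind, by(a, a$month, function(df) {
  # Drop the whole month when its mean is below the threshold
  if (mean(df$value) < 0.01) NULL else df
}))
# Assumes day uniquely identifies each row (e.g. day of year)
new_b <- subset(b, day %in% new_a$day)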
Alternatively, you can use the plyr package and try:
library(plyr)
# Return NULL for months whose mean is below the threshold so ddply drops them
new_a <- ddply(a, .(month), function(df) if (mean(df$value) < 0.01) NULL else df)
new_b <- subset(b, day %in% new_a$day)
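
The tables in the question hold a single column V1 and no date, so the day, month and value columns assumed here have to be built first. One possible way to do that in base R is sketched below. It assumes that row 1 corresponds to 1 January of a non-leap year (2013 is an arbitrary choice), uses hypothetical stand-in data, and add_cols is an illustrative helper rather than an existing function.

```r
# Hypothetical stand-ins for the question's tables: 365 rows, one column V1.
# The first 31 values of a (January) are zeros, the rest are 0.5.
a <- data.frame(V1 = rep(c(0, 0.5), times = c(31, 334)))
b <- data.frame(V1 = seq_len(365) / 1000)

# Assumption: row 1 is 1 January of a non-leap year
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = nrow(a))

# Illustrative helper: attach day-of-year and month to a single-column table
add_cols <- function(df) {
  data.frame(day = seq_len(nrow(df)),
             month = format(dates, "%m"),
             value = df$V1)
}
a <- add_cols(a)
b <- add_cols(b)

# Monthly means of a, then the months that fall below the threshold
monthly <- tapply(a$value, a$month, mean)
low <- names(monthly)[monthly < 0.01]

# Remove those months from both tables
new_a <- a[!a$month %in% low, ]
new_b <- b[!b$month %in% low, ]
```

Here only January is flagged, so new_a and new_b each keep 334 rows, starting at day 32.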