R: Optimizing how to find the maximum of a function over a data frame and then trim the rest

Time: 2016-04-22 21:54:48

Tags: r

First, my data comes from Temperature.xls, which can be downloaded from the following link: RBook

My code is:

library(plyr)  # provides ddply() and summarise()
temp = read.table("Temperature.txt", header = TRUE)
length(unique(temp$Year)) # number of unique values in the Year vector.
res = ddply(temp, c("Year","Month"), summarise, Mean = mean(Temperature, na.rm = TRUE))
res1 = ddply(temp, .(Year,Month), summarise,
    SD = sd(Temperature, na.rm = TRUE),
    N = sum(!is.na(Temperature))
         )
# ordering res1 by sd and year:
res1 = res1[order(res1$Year, res1$SD),]
# finding maximum of SD in res1 by year and displaying just them in a separate data frame
res1_maxsd = ddply(res1, .(Year), summarise, MaxSD = max(SD, na.rm = TRUE)) # find the maxSD in each Year
res1_max = merge(res1_maxsd,res1, all = FALSE) # merge it with the original to see other variables at the max's rows
res1_m = res1_max[res1_max$MaxSD==res1_max$SD,] # find which rows are the ones corresponding to the max value
res1_mm = res1_m[complete.cases(res1_m),] # trim all others (which are NA's)

I know I can cut the last 4 lines down to fewer. Can I somehow do the last two lines in a single command? I stumbled upon:

res1_m = res1_max[complete.cases(res1_max$MaxSD==res1_max$SD),]

But this doesn't give me what I want, which is simply a smaller data frame containing only the rows with the maximum SD (with all variables).
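One possible way to fold the last two lines into one, sketched here under the assumption that res1_max has the columns produced by the merge above, is to wrap the comparison in which(). Logical indexing with an NA comparison returns an all-NA row, which is why complete.cases() was needed; which() keeps only the TRUE positions, so those rows never appear. The data frame below is a small hypothetical stand-in with invented values, not the real Temperature data.

```r
# Hypothetical stand-in for res1_max (output of the merge step); values invented for illustration
res1_max <- data.frame(
  Year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  MaxSD = c(3.4, 3.4, 3.4, 2.5, 2.5, 2.5),
  Month = c(1, 2, 3, 1, 2, 3),
  SD    = c(1.2, 3.4, NA, 2.5, 0.7, 2.5)
)

# which() turns the logical vector into row indices and drops NA comparisons,
# so this single line replaces both the == filter and the complete.cases() step
res1_mm <- res1_max[which(res1_max$MaxSD == res1_max$SD), ]
res1_mm
```

subset(res1_max, MaxSD == SD) should behave the same way here, since subset() treats a missing condition value as FALSE.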

2 answers:

Answer 0 (score: 0)

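Instead of fixing the last 2 lines, why not start from res1? Reversing the sort order of SD and taking the first row of each year gives you an equivalent final data set...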

res1 <- res1[order(res1$Year,-res1$SD),]
res_final <- res1[!duplicated(res1$Year),]
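To see what this does, here is a minimal run on a small hypothetical res1 (values invented for illustration, not taken from Temperature.txt). Sorting by -SD puts each year's largest SD first (NA sorts last), and !duplicated() keeps the first row per Year.

```r
# Hypothetical stand-in for res1; values invented for illustration
res1 <- data.frame(
  Year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  Month = c(1, 2, 3, 1, 2, 3),
  SD    = c(1.2, 3.4, NA, 2.5, 0.7, 2.5)
)

# Within each Year, largest SD first; NA values go to the end
res1 <- res1[order(res1$Year, -res1$SD), ]

# Keep only the first (i.e. maximum-SD) row of each Year
res_final <- res1[!duplicated(res1$Year), ]
res_final
```

One difference from the filtering approach in the question: when two months share the maximum SD (2001 above), this keeps only one of them, whereas comparing SD to MaxSD keeps every tied row.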

Answer 1 (score: 0)

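If you use the dplyr package, you can shorten the last four lines. Since you want to keep information from the original data set, you probably don't want summarise: it returns only the summary values, which you then have to merge back into the original data. mutate and filter are a better choice: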

library(dplyr)
# keep, within each Year, the rows whose SD equals that year's maximum
res1_mm1 <- res1 %>% group_by(Year) %>% filter(SD == max(SD, na.rm = TRUE))

You can also use mutate to create a new column MaxSD, as in your result data frame, and then filter on it:

res1_mm1 <- res1 %>% group_by(Year) %>% mutate(MaxSD = max(SD, na.rm = TRUE)) %>% 
            filter(SD == MaxSD)
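If dplyr is not available, base R's ave() can play the role of the grouped mutate(), since it returns a vector as long as its input with one value per row. The sketch below is a base-R analogue, not the answer's dplyr code, and runs on a small hypothetical res1 with invented values.

```r
# Hypothetical stand-in for res1; values invented for illustration
res1 <- data.frame(
  Year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  Month = c(1, 2, 3, 1, 2, 3),
  SD    = c(1.2, 3.4, NA, 2.5, 0.7, 2.5)
)

# Like a grouped mutate(): every row gets the maximum SD of its own Year
res1$MaxSD <- ave(res1$SD, res1$Year, FUN = function(x) max(x, na.rm = TRUE))

# Like filter(SD == MaxSD); which() drops rows where SD is NA
res1_mm1 <- res1[which(res1$SD == res1$MaxSD), ]
res1_mm1
```

As with the dplyr version, no merge is needed, and rows tied at the maximum are all kept.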