Question

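I have data consisting of dates and values. I'm trying to get the fourth-highest value for each year using dplyr, with order or multiple summarise statements. I want a data frame containing, for every year, the date on which the fourth-highest value occurred and the value itself.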

Here is my script:

    timeozone <- import(i, date="DATES", date.format = "%Y-%m-%d %H", header=TRUE, na.strings="NA")
    colnames(timeozone) <- c("column","date", "O3")
    timeozone %>%
      mutate(month = format(date, "%m"), day = format(date, "%d"), year = format(date, "%Y")) %>%
      group_by(month, day, year) %>%
      summarise(fourth = O3[order(O3, decreasing = TRUE)[4] ])

I'm not sure what's wrong with the code above. Any help would be greatly appreciated.

Data:

date        value
11/12/2000  14
11/13/2000  16
11/14/2000  17
11/15/2000  21
11/13/2001  31
11/14/2001  21
11/15/2001  62
11/16/2001  14

Answer 1

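Since you didn't provide reproducible data, here is an example using iris. You'll need to group by year instead of by Species (in your code, group_by(year) rather than group_by(month, day, year)), but the same idea applies.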
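To see why the grouping matters, here is a small base-R sketch on the sample data from the question. The data frame d and the helper fourth_highest are names introduced here for illustration, and the dates are assumed to be month/day/year. With daily data, grouping by month, day and year puts each reading in its own group, so index 4 is out of range and comes back as NA (with hourly data it would instead give the fourth-highest value per day). Grouping by year alone gives one result per year.

```r
# Sample data from the question (dates assumed to be month/day/year)
d <- data.frame(
  date  = as.Date(c("11/12/2000", "11/13/2000", "11/14/2000", "11/15/2000",
                    "11/13/2001", "11/14/2001", "11/15/2001", "11/16/2001"),
                  format = "%m/%d/%Y"),
  value = c(14, 16, 17, 21, 31, 21, 62, 14)
)
d$year <- format(d$date, "%Y")

# Hypothetical helper: 4th-highest value of a vector (NA if it has fewer than 4 values)
fourth_highest <- function(x) x[order(x, decreasing = TRUE)[4]]

# One group per calendar day: each group holds a single value, so every result is NA
by_day <- tapply(d$value, d$date, fourth_highest)

# One group per year: each group has four values, so the 4th-highest exists
by_year <- tapply(d$value, d$year, fourth_highest)

by_day
by_year
```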

If you're not wedded to dplyr, aggregate can do this fairly directly; with dplyr itself, the following works:

iris %>%
  group_by(Species) %>%
  summarise(fourth = Petal.Length[order(Petal.Length, decreasing = TRUE)[4] ])

Gives:

     Species fourth
1     setosa    1.7
2 versicolor    4.9
3  virginica    6.6

You can confirm that the values are correct with:

by(iris$Petal.Length, iris$Species, sort)
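The aggregate route mentioned at the start of this answer might look like this in base R. This is a sketch rather than the answerer's original code, and fourth_highest is a helper name introduced here. For the question's data, the equivalent call would be aggregate(value ~ year, data = d, FUN = fourth_highest), given a data frame d with value and year columns.

```r
# Hypothetical helper: 4th-highest value of a numeric vector
fourth_highest <- function(x) sort(x, decreasing = TRUE)[4]

# One row per Species, holding the 4th-highest Petal.Length in that group
res <- aggregate(Petal.Length ~ Species, data = iris, FUN = fourth_highest)
res
```

Unlike the dplyr version, the result column keeps its original name, Petal.Length, rather than fourth.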

Using nth, following @tchakravarty's suggestion:

iris %>%
  group_by(Species) %>%
  summarise(fourth = nth(sort(Petal.Length), -4L))

This gives the same values as above.
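The question also asks for the date on which the fourth-highest value occurs, which summarise drops. One way to keep the whole row with dplyr is to filter on a within-year rank instead of summarising. This is a sketch on the question's sample data (dates assumed to be month/day/year); ties.method = "first" is a choice made here so that exactly one row per year survives when values tie.

```r
library(dplyr)

# Sample data from the question (dates assumed to be month/day/year)
d <- data.frame(
  date  = as.Date(c("11/12/2000", "11/13/2000", "11/14/2000", "11/15/2000",
                    "11/13/2001", "11/14/2001", "11/15/2001", "11/16/2001"),
                  format = "%m/%d/%Y"),
  value = c(14, 16, 17, 21, 31, 21, 62, 14)
)

res <- d %>%
  mutate(year = format(date, "%Y")) %>%
  group_by(year) %>%                                    # group by year only
  filter(rank(-value, ties.method = "first") == 4) %>%  # keep the 4th-highest row
  ungroup()

res
```

Years with fewer than four readings simply drop out of the result, since no row in them gets rank 4.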

Answer 2

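Another option, using base R (and again the iris data), is to split the variable by group, sort each piece in decreasing order and take the fourth element. For example: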

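data(iris)
# split the petal lengths by species, then take the 4th-largest value in each group
petals <- split(iris$Petal.Length, iris$Species)
sapply(petals, function(x) x[order(x, decreasing = TRUE)][4])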

Or, more concisely, with tapply:

tapply(iris$Petal.Length, iris$Species, function(x) x[order(x, decreasing = TRUE)][4])

Edit

Using your example data above, you can extract the whole row (or just the date, if that's all you need) as follows.

date <- c("11/12/00", "11/13/00", "11/14/00", "11/15/00", "11/13/01", 
"11/14/01", "11/15/01", "11/16/01")

value <- c(14, 16, 17, 21, 31, 21, 62, 14)

date_splt <- strsplit(date, "/")
year <- sapply(date_splt, "[", 3)

d <- data.frame(date, value, year)

d_splt <- split(d, d$year)
# within each year, sort rows by value (largest first) and keep the 4th row
lapply(d_splt, function(x) x[order(x$value, decreasing = TRUE), ][4, ])
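Since the question asks for a single data frame covering all years, the list of per-year rows that lapply returns can be stacked with do.call(rbind, ...). A sketch reusing this answer's sample data, with each year's rows sorted largest first:

```r
# Sample data as used in this answer (two-digit years)
date  <- c("11/12/00", "11/13/00", "11/14/00", "11/15/00", "11/13/01",
           "11/14/01", "11/15/01", "11/16/01")
value <- c(14, 16, 17, 21, 31, 21, 62, 14)
d <- data.frame(date, value,
                year = sapply(strsplit(date, "/"), "[", 3),
                stringsAsFactors = FALSE)

# One row per year: order each year's rows by value, largest first, keep row 4
per_year <- lapply(split(d, d$year),
                   function(x) x[order(x$value, decreasing = TRUE), ][4, ])

# Stack the per-year rows into one data frame
result <- do.call(rbind, per_year)
result
```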

Using aggregate to get the fourth-highest value per year in R