Question

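I am working with the hflights dataset in R and trying to extract some useful insights. I managed to get the most visited destinations on weekends for each season. Now that I want to plot this, I only want the top 5 destinations for each season.

I tried the code below, but it does not give me the top 5 destinations per season. Can anyone help with this? And what is the best way to work with a variable after it has been summarised (flights, in our case)?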

allseasons <- hflights %>%
  filter(DayOfWeek == c(6, 7)) %>%
  mutate(Season = case_when(
    Month %in% 3:5 ~ "Spring",
    Month %in% 9:11 ~ "Autumn", 
    Month %in% 6:8 ~ "Summer", 
    Month %in% 12:2 ~ "Winter")) %>%
  filter(!is.na(Season)) %>%
  group_by(Dest, Season) %>%
  summarise(flights = n()) %>%
  arrange(desc(flights)) %>%
  arrange(desc(Season)) %>%
  top_n(5, flights)

Result:

   Dest  Season flights
   <chr> <chr>    <int>
 1 DAL   Winter     166
 2 DFW   Winter     149
 3 ATL   Winter     146
 4 DEN   Winter     133
 5 MSY   Winter     124
 6 ORD   Winter     118
 7 LAX   Winter     114
 8 PHX   Winter     107
 9 EWR   Winter     102
10 CLT   Winter      92
# ... with 428 more rows

Answer 1

Here is a similar example based on iris (the top 5 values of some statistic within defined groups):
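A minimal sketch of what such an iris example could look like. The choice of Sepal.Length as the statistic and Species as the grouping variable is an assumption made for illustration, not something stated in the answer:

```r
library(dplyr)

# Assumption: rank rows by Sepal.Length within each Species.
top5 <- iris %>%
  group_by(Species) %>%
  # Sort inside each group, largest value first.
  arrange(desc(Sepal.Length), .by_group = TRUE) %>%
  # Keep the first five rows of every group.
  slice(1:5) %>%
  ungroup()

top5
```

Because slice() works per group, each of the three species contributes exactly five rows.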

Answer 2

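@Chris and @alistaire have already pointed out some important steps in their comments. In addition, assuming the days of the week start with Sunday (I'm not sure about that, so adjust that part if needed):

Month %in% c(12, 1, 2) instead of Month %in% 12:2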
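To see why 12:2 is a problem: the colon operator counts down from 12 to 2, so the sequence never contains 1. The sketch below applies both versions of the season mapping to the twelve month numbers; the vector names are made up for the illustration.

```r
library(dplyr)

months <- 1:12

# Version from the question: 12:2 is 12, 11, ..., 2 and does not include 1.
season_buggy <- case_when(
  months %in% 3:5  ~ "Spring",
  months %in% 9:11 ~ "Autumn",
  months %in% 6:8  ~ "Summer",
  months %in% 12:2 ~ "Winter"
)

# Corrected version: list the winter months explicitly.
season_fixed <- case_when(
  months %in% 3:5         ~ "Spring",
  months %in% 9:11        ~ "Autumn",
  months %in% 6:8         ~ "Summer",
  months %in% c(12, 1, 2) ~ "Winter"
)

# January has no matching condition in the first version, so it becomes NA.
data.frame(months, season_buggy, season_fixed)
```

In the question's pipeline, the later filter(!is.na(Season)) then silently drops every January flight.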

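Regarding your comment:

You get 7 destinations because there are 7 unique destinations in the result. Try changing the visualization approach to get the effect you want. Faceting may help, as follows: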

library(hflights)
library(dplyr)
library(ggplot2)

allseasons <- hflights %>%
  filter(DayOfWeek %in% 6:7) %>%
  mutate(
    Season = case_when(
      Month %in% 3:5 ~ "Spring",
      Month %in% 9:11 ~ "Autumn", 
      Month %in% 6:8 ~ "Summer", 
      Month %in% c(12, 1, 2) ~ "Winter"
    )
  ) %>%
  group_by(Season, Dest) %>%
  summarise(flights = n()) %>%
  arrange(desc(flights)) %>%
  slice(1:5)

ggplot(allseasons, aes(x = Dest, y = flights, fill = Season)) + 
  geom_bar(stat = "identity")

ggplot(allseasons, aes(x = Dest, y = flights)) + 
  geom_bar(stat = "identity") +
  facet_wrap(~ Season)

Answer 3

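This is just an alternative to the answers already provided, in case you specifically want to use top_n. You might want that because of the way it handles ties.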

 hflights %>%
  filter(DayOfWeek %in% c(6, 7)) %>%
  mutate(Season = case_when(
    Month %in% 3:5 ~ "Spring",
    Month %in% 9:11 ~ "Autumn", 
    Month %in% 6:8 ~ "Summer", 
    Month %in% c(12, 1, 2) ~ "Winter")) %>%
  filter(!is.na(Season)) %>%
  group_by(Season, Dest) %>%
  summarise(flights = n()) %>%
  top_n(5, flights) %>%
  arrange(Season, desc(flights))
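The tie handling mentioned above can be seen on a toy table (the data and names here are invented for the illustration): top_n() keeps every row tied for the last place, while arrange() followed by slice() cuts at exactly n rows.

```r
library(dplyr)

# Hypothetical toy data with a tie for second place.
scores <- tibble(id = c("a", "b", "c", "d"), v = c(5, 4, 4, 3))

# top_n() keeps all rows tied at the cutoff.
with_ties <- scores %>% top_n(2, v)

# arrange() + slice() returns exactly two rows and breaks the tie by row order.
no_ties <- scores %>% arrange(desc(v)) %>% slice(1:2)

nrow(with_ties)
nrow(no_ties)
```

So with top_n(5, flights) a season can return more than five destinations when flight counts are tied at the fifth position.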

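As I pointed out in my comment, your main problem is group_by(Dest, Season). summarise() drops the last level of grouping, so your data ends up grouped by destination rather than by season.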
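A small sketch of that grouping behaviour, using an invented toy table rather than hflights: group_vars() shows which grouping remains after summarise(), and the order of the variables in group_by() decides it.

```r
library(dplyr)

# Hypothetical toy data standing in for the filtered flights.
toy <- tibble(
  Dest   = c("A", "A", "A", "B", "B", "B"),
  Season = c("Winter", "Winter", "Summer", "Winter", "Summer", "Summer")
)

# Order from the question: Season is the last level and gets dropped.
by_dest <- toy %>%
  group_by(Dest, Season) %>%
  summarise(flights = n())

# Order from the answers: Dest is dropped, Season stays.
by_season <- toy %>%
  group_by(Season, Dest) %>%
  summarise(flights = n())

group_vars(by_dest)
group_vars(by_season)
```

Any top_n() or slice() that follows operates within the remaining groups, which is why the question's version picks rows per destination instead of per season.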

Your sorting with arrange() is redundant as written; it should be done once, after top_n.

As others have pointed out, you should also use %in% rather than == when comparing a value against multiple values.
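To illustrate the difference on a made-up vector of day numbers: == recycles the right-hand side and compares element by element, so it only matches when a 6 happens to line up with the 6 and a 7 with the 7, whereas %in% tests membership for every element.

```r
# Hypothetical day-of-week values.
days <- c(6, 7, 7, 6, 1, 7)

# c(6, 7) is recycled to 6 7 6 7 6 7 and compared position by position.
eq_match <- days == c(6, 7)

# Each element is checked against the whole set {6, 7}.
in_match <- days %in% c(6, 7)

sum(eq_match)
sum(in_match)
```

With filter(DayOfWeek == c(6, 7)) this means some weekend rows are silently left out, depending on their position in the data.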

Select the top five values from each group in R
