Why does the same category show different frequencies in R?

Asked: 2017-09-20 14:01:09

Tags: r dplyr

# Sort rows by descending frequency
Process_Table = Process_Table[order(-Process_Table$Freq),]

#output
                             Process Freq Percent
17            Other Airport Services   45   15.46
5                           Check-in   35   12.03
23 Ticket sales and support channels   35   12.03
11               Flight and inflight   33   11.34
19                      Pegasus Plus   23    7.90
24                       Time Delays   16    5.50
7                              Other   13    4.47
14                             Other   13    4.47
22                             Other   13    4.47
25                             Other   13    4.47
16                             Other   11    3.78
20                             Other    6    2.06
26                             Other    6    2.06
3                              Other    5    1.72
13                             Other    5    1.72
18                             Other    5    1.72
21                             Other    4    1.37
1                              Other    2    0.69
2                              Other    1    0.34
4                              Other    1    0.34
6                              Other    1    0.34
8                              Other    1    0.34
9                              Other    1    0.34
10                             Other    1    0.34
12                             Other    1    0.34
15                             Other    1    0.34

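As you can see, the same level ("Other") appears on many rows, each with a different frequency. However, when I print the levels of the column, I get the following output: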

levels(Process_Table$Process)

[1] "Check-in"                          "Flight and inflight"              
[3] "Other"                             "Other Airport Services"           
[5] "Pegasus Plus"                      "Ticket sales and support channels"
[7] "Time Delays"             

What I want is the combined frequency for the "Other" category. Can anyone help me solve this?

Edit: the code used to derive the first output:

Process_Table$Percent = round(Process_Table$Freq/sum(Process_Table$Freq) * 100, 2)

Process_Table$Process = as.character(Process_Table$Process)
low_list = Process_Table %>%
  filter(Percent < 5.50) %>%
  select(Process)

Process_Table$Process = ifelse(Process_Table$Process %in% low_list$Process, 'Other', Process_Table$Process)

as.data.frame(Process_Table)

Process_Table$Process = as.factor(Process_Table$Process)

1 Answer:

Answer 0: (score: 0)

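Your Process_Table needs one more aggregation step. The ifelse() call only relabels the low-frequency rows as 'Other'; it does not merge them, so each original row keeps its own Freq and Percent. levels() shows a single "Other" because it lists unique labels, not rows. Add the following as the last step of your data aggregation.

    Process_Table <- Process_Table %>% group_by(Process) %>% summarize(Freq = sum(Freq), Percent = sum(Percent))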
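
To see the effect end to end, here is a minimal sketch on made-up data. The category names and counts are hypothetical, not the asker's real table, and the result is stored in a new name, `Combined`, chosen only for illustration. It differs from the one-liner above in one respect: it recomputes `Percent` from the combined counts instead of summing the already-rounded percentages, which should avoid accumulating rounding error.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's frequency table
Process_Table <- data.frame(
  Process = c("Check-in", "Time Delays", "Baggage", "Lounge", "Parking"),
  Freq    = c(50, 40, 4, 3, 3),
  stringsAsFactors = FALSE
)
Process_Table$Percent <- round(Process_Table$Freq / sum(Process_Table$Freq) * 100, 2)

# Relabel rare categories (same 5.50 threshold as the question).
# This only renames rows; three separate "Other" rows remain.
Process_Table$Process <- ifelse(Process_Table$Percent < 5.50,
                                "Other", Process_Table$Process)

# Collapse rows that now share a label, then recompute the percentage
Combined <- Process_Table %>%
  group_by(Process) %>%
  summarise(Freq = sum(Freq)) %>%
  mutate(Percent = round(Freq / sum(Freq) * 100, 2)) %>%
  arrange(desc(Freq))

print(Combined)
```

After this step there should be exactly one row per label, so converting `Process` to a factor afterwards gives levels that line up one-to-one with the rows.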