How to filter a data frame based on multiple columns in R

Time: 2015-11-02 19:30:21

Tags: r dplyr

I have a data frame like this:

Server Avg_Cpu 95th 
Server01  40    90
Server01  45    90
Server02  56    80
Server02  50    80

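I need to subset this data frame so that each server appears once, keeping the row with the highest 95th and Avg_Cpu values for that server.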

My final df should look like this:

Server Avg_Cpu 95th 
Server01  45    90
Server02  56    80

I tried the dplyr package, as follows:

df %>% group_by(Server) %>% filter(Avg_Cpu==max(Avg_Cpu))

It does not work; I get:

Error: filter condition does not evaluate to a logical vector.

1 Answer:

Answer 0: (score: 1)

Try checking the structure of your data.frame df with dput(df) or str(df), because this works for me:

df <- read.table(textConnection("Server Avg_Cpu 95th 
                                 Server01  40    90
                                 Server01  45    90
                                 Server02  56    80
                                 Server02  50    80"), header = TRUE)

library(dplyr)

df %>%
  group_by(Server) %>%
  filter(Avg_Cpu == max(Avg_Cpu),
         X95th == max(X95th))

# Source: local data frame [2 x 3]
# Groups: Server [2]
# 
#     Server Avg_Cpu X95th
#     (fctr)   (int) (int)
# 1 Server01      45    90
# 2 Server02      56    80

Note that in my case str(df) returns:

# > str(df)
# 'data.frame': 4 obs. of  3 variables:
#  $ Server : Factor w/ 2 levels "Server01","Server02": 1 1 2 2
#  $ Avg_Cpu: int  40 45 56 50
#  $ X95th  : int  90 90 80 80
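One caveat about the filter in this answer: it keeps a row only when that row holds both group maxima at once. If a server's highest Avg_Cpu and highest X95th sat on different rows, that server would disappear from the result. Below is a minimal sketch of an alternative that always returns one row per server. It assumes X95th takes priority and Avg_Cpu breaks ties; that ordering is my assumption, as the question does not state one. The name `result` is only illustrative.

```r
library(dplyr)

# Same sample data as above; read.table renames the column "95th" to "X95th"
df <- read.table(text = "Server Avg_Cpu 95th
Server01  40    90
Server01  45    90
Server02  56    80
Server02  50    80", header = TRUE)

# Assumed priority: highest X95th first, then highest Avg_Cpu as a tie-breaker
result <- df %>%
  arrange(Server, desc(X95th), desc(Avg_Cpu)) %>%  # best row first within each server
  group_by(Server) %>%
  slice(1) %>%                                     # keep exactly one row per server
  ungroup()

print(result)
```

On the sample data this should give the same two rows as the filter approach, since for each server a single row happens to hold both maxima.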