Using dplyr's filter() to show entries that appear in multiple years

Time: 2018-07-14 20:36:46

Tags: r dplyr

I have the following sample dataset:

library(tidyverse)  # provides tibble() and the dplyr verbs used below

df <- tibble(
  PLAYER = c("Corey Kluber", "Clayton Kershaw", "Max Scherzer", "Chris Sale",
             "Corey Kluber", "Jake Arrieta", "Jose Urena", "Yu Darvish"),
  YEAR = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  WHIP = c(1.24, 1.50, 1.70, 1.35, 1.42, 1.33, 1.61, 1.10)
)

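The real dataset runs from 2000 to 2017. How can I use filter() (or dplyr in general) to bring up all players who were active in more than one season? For example, in the sample above, Corey Kluber appears in both 2016 and 2017. How can I use the dplyr package to bring him up? I imagine it would be something like this: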

df %>%
  select(PLAYER, YEAR, WHIP) %>%  #MY SET HAS MORE VARIABLES THAN THE SAMPLE SHOWS
  filter(PLAYER %in% YEAR == c(2016,2017))

This returns only <0 rows> (or 0-length row.names), when I want Corey Kluber to show up twice. Thanks.

1 Answer:

Answer 0: (score: 2)

You can use dplyr::n_distinct to count how many distinct seasons (years) a player appears in. Group the data by PLAYER, then filter with the condition n_distinct(YEAR) > 1:

library(tidyverse)

df %>%
  group_by(PLAYER) %>%
  filter(n_distinct(YEAR) > 1)  # keep players who appear in more than one season

# # A tibble: 2 x 3
# # Groups: PLAYER [1]
#   PLAYER        YEAR  WHIP
#   <chr>        <dbl> <dbl>
# 1 Corey Kluber  2016  1.24
# 2 Corey Kluber  2017  1.42
# 
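
To see what the filter condition evaluates for each group, it can help to compute the per-player season count explicitly first. The following is a minimal sketch, assuming the same df as in the question (rebuilt here so the snippet runs on its own). The names season_counts, multi_season and n_seasons are illustrative and not part of the original answer.

```r
library(dplyr)
library(tibble)

# Sample data from the question
df <- tibble(
  PLAYER = c("Corey Kluber", "Clayton Kershaw", "Max Scherzer", "Chris Sale",
             "Corey Kluber", "Jake Arrieta", "Jose Urena", "Yu Darvish"),
  YEAR = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  WHIP = c(1.24, 1.50, 1.70, 1.35, 1.42, 1.33, 1.61, 1.10)
)

# One row per player with the number of distinct seasons (illustrative column name)
season_counts <- df %>%
  group_by(PLAYER) %>%
  summarise(n_seasons = n_distinct(YEAR))

# Only players with more than one season satisfy the condition used in filter()
multi_season <- season_counts %>%
  filter(n_seasons > 1)

multi_season
```

With the sample data, every player except Corey Kluber has a count of 1, which is why only his rows survive the grouped filter.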

If the OP is only interested in the names of these players, then:

df %>% group_by(PLAYER) %>%
  filter(n_distinct(YEAR) > 1) %>%
  select(PLAYER) %>%
  distinct()
# # A tibble: 1 x 1
# # Groups: PLAYER [1]
# PLAYER      
# <chr>       
# 1 Corey Kluber
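
If a plain character vector of names is more convenient than a grouped tibble, one possible variant is sketched below. It is an alternative, not the method from the answer above: it reduces the data to one row per player-season with distinct(), counts the rows per player with count(), and extracts the names with pull(). The data is rebuilt so the snippet is self-contained, and the name multi_year_players is illustrative.

```r
library(dplyr)
library(tibble)

# Sample data from the question
df <- tibble(
  PLAYER = c("Corey Kluber", "Clayton Kershaw", "Max Scherzer", "Chris Sale",
             "Corey Kluber", "Jake Arrieta", "Jose Urena", "Yu Darvish"),
  YEAR = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  WHIP = c(1.24, 1.50, 1.70, 1.35, 1.42, 1.33, 1.61, 1.10)
)

multi_year_players <- df %>%
  distinct(PLAYER, YEAR) %>%  # one row per player-season
  count(PLAYER) %>%           # seasons per player, stored in column n
  filter(n > 1) %>%           # keep players with more than one season
  pull(PLAYER)                # return the names as a character vector

multi_year_players
```

Because count() returns an ungrouped result, no ungroup() call is needed before further processing.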