Including some cases of a categorical variable in a scatterplot while excluding others

Date: 2019-07-12 17:10:48

Tags: r ggplot2

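I'm trying to teach myself some R by playing around in RStudio and making charts from the most recent NBA season's data. Some charts contain duplicated player data; I want to include some of those rows and exclude others.

My dataset comes from https://www.basketball-reference.com/leagues/NBA_2019_per_game.html (I don't know how to link directly to the CSV data, but it can be found under the "Share & More" menu item). After downloading the stats to a file, I imported it into RStudio ...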

> stats <- read.csv("~/Downloads/2018-2019 NBA per game stats.txt")

I made a sample scatterplot ...

> ggplot(stats, aes(x=MP,y=FGA)) +geom_point() 

But I noticed that many points are duplicated for players who were traded during the year and played for multiple teams. For example, Ryan Anderson and Trevor Ariza ...

Player                    Tm     MP     FGA
Ryan Anderson\anderry01   TOT    322    69
Ryan Anderson\anderry01   PHO    278    60
Ryan Anderson\anderry01   MIA    44     9
OG Anunoby\anunoog01      TOR    1352   404
Trevor Ariza\arizatr01    TOT    2349   736
Trevor Ariza\arizatr01    PHO    884    227
Trevor Ariza\arizatr01    WAS    1465   509

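How can I create a scatterplot that includes players who played for only one team (like OG Anunoby) or a player's full-year stats (the TOT rows for Ryan Anderson and Trevor Ariza), but excludes partial-season rows (the PHO, MIA, and WAS rows for Ryan Anderson and Trevor Ariza)?

I imagine there is a way to do this with some ifelse statements that create a dummy variable, or by passing that information to ggplot's geom_point, but I'm having a hard time finding similar examples online.

3 Answers:

Answer 0: (score: 1)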

Consider adding indicator columns with ave (inline count aggregation) and ifelse (conditional logic), then subset the main data according to the plot you need:

# NEW COLUMNS
stats$team_count <- with(stats, ave(MP, Player, FUN=length))
# as.character() guards against Tm being a factor, which ifelse would turn into integer codes
stats$tot_indicator <- with(stats, ifelse(team_count == 1, 'TOT', as.character(Tm)))

# SUBSETTED DATA SCATTERPLOT (ONE TEAM PLAYERS)
ggplot(subset(stats, team_count == 1), aes(x=MP, y=FGA)) + geom_point()

# SUBSETTED DATA SCATTERPLOT (ALL PLAYERS' TOT)
ggplot(subset(stats, tot_indicator == 'TOT'), aes(x=MP, y=FGA)) + geom_point()
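To see what the counting step does, here is a minimal base-R sketch on the seven sample rows from the question. The data frame is rebuilt by hand with shortened player names (an assumption for illustration, not the downloaded file), and the two conditions are combined into a single subset call rather than a separate indicator column.

```r
# Sample rows from the question, rebuilt by hand (names shortened for illustration).
stats <- data.frame(
  Player = c("Ryan Anderson", "Ryan Anderson", "Ryan Anderson", "OG Anunoby",
             "Trevor Ariza", "Trevor Ariza", "Trevor Ariza"),
  Tm  = c("TOT", "PHO", "MIA", "TOR", "TOT", "PHO", "WAS"),
  MP  = c(322, 278, 44, 1352, 2349, 884, 1465),
  FGA = c(69, 60, 9, 404, 736, 227, 509),
  stringsAsFactors = FALSE
)

# Number of rows per player, repeated on every row belonging to that player.
stats$team_count <- ave(stats$MP, stats$Player, FUN = length)

# Keep single-team players plus the TOT row of traded players.
full_season <- subset(stats, team_count == 1 | Tm == "TOT")
print(full_season[, c("Player", "Tm", "MP", "FGA")])
```

The resulting full_season data frame can be passed to ggplot in place of stats, with the same aes(x=MP, y=FGA) mapping used in the question.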

Answer 1: (score: 1)

1) To create a scatterplot that includes only players who played for one team (like OG Anunoby):

library(tidyverse)

# first, identify which players played for only 1 team

single_team_players <- stats %>%
  select(Player) %>%
  group_by(Player) %>%
  # count how many rows (team entries) each player has
  summarise(count = n()) %>%
  # keep only players that have played for 1 team
  filter(count == 1)

# then keep only these players' rows from stats
stats_single_team_players <- stats %>%
  filter(Player %in% single_team_players$Player)

# create scatterplot
ggplot(stats_single_team_players, aes(x=MP,y=FGA))+
  geom_point()+
  labs(title = "Single Team Players")

2) To create a scatterplot of players' full-year stats (the TOT rows for Ryan Anderson and Trevor Ariza) rather than partial seasons (the PHO, MIA, and WAS rows for Ryan Anderson and Trevor Ariza):

# filter for single team players OR team = TOT
total_year_stats <- stats %>%
  filter((Player %in% single_team_players$Player)|
           (Tm == "TOT"))

# graph scatterplot
ggplot(total_year_stats, aes(x=MP,y=FGA)) +
  geom_point()+
  labs(title = "Total Year Stats")
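The two steps can also be collapsed into one grouped filter. The following is a sketch rather than part of the original answer; it assumes the same column names (Player, Tm, MP, FGA) and uses a hand-built copy of the sample rows with shortened player names.

```r
library(dplyr)

# Hand-built copy of the question's sample rows (names shortened for illustration).
stats <- data.frame(
  Player = c("Ryan Anderson", "Ryan Anderson", "Ryan Anderson", "OG Anunoby",
             "Trevor Ariza", "Trevor Ariza", "Trevor Ariza"),
  Tm  = c("TOT", "PHO", "MIA", "TOR", "TOT", "PHO", "WAS"),
  MP  = c(322, 278, 44, 1352, 2349, 884, 1465),
  FGA = c(69, 60, 9, 404, 736, 227, 509),
  stringsAsFactors = FALSE
)

total_year_stats <- stats %>%
  group_by(Player) %>%
  # n() is the number of rows for the current player:
  # keep the lone row of single-team players, or the TOT row otherwise
  filter(n() == 1 | Tm == "TOT") %>%
  ungroup()

print(total_year_stats)
```

This avoids the intermediate single_team_players table, at the cost of being slightly less explicit about which players were traded.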

Answer 2: (score: 0)

Use filter to remove "TOT", then use group_by and summarize. You can then use ggplot on the resulting data frame.

distinct would also work here, as long as "TOT" always comes before the actual teams.

library(tidyverse)

read_table("Player                    Tm     MP     FGA
Ryan Anderson\anderry01   TOT    322    69
Ryan Anderson\anderry01   PHO    278    60
Ryan Anderson\anderry01   MIA    44     9
OG Anunoby\anunoog01      TOR    1352   404
Trevor Ariza\arizatr01    TOT    2349   736
Trevor Ariza\arizatr01    PHO    884    227
Trevor Ariza\arizatr01    WAS    1465   509") -> data

data %>%
  # the column is named Tm, not TM
  filter(Tm != "TOT") %>%
  group_by(Player) %>%
  summarize(MP = sum(MP), FGA = sum(FGA))

# A tibble: 3 x 3
  Player                       MP   FGA
  <chr>                     <dbl> <dbl>
1 "OG Anunoby\anunoog01"     1352   404
2 "Ryan Anderson\anderry01"   322    69
3 "Trevor Ariza\arizatr01"   2349   736
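In the sample rows, "TOT" is listed first for each traded player, so keeping only the first row per player is another way to get one row each. A minimal sketch with dplyr's distinct, assuming that row order holds throughout the real file (worth checking before relying on it) and using a hand-built copy of the sample with shortened player names:

```r
library(dplyr)

# Hand-built copy of the question's sample rows (names shortened for illustration).
stats <- data.frame(
  Player = c("Ryan Anderson", "Ryan Anderson", "Ryan Anderson", "OG Anunoby",
             "Trevor Ariza", "Trevor Ariza", "Trevor Ariza"),
  Tm  = c("TOT", "PHO", "MIA", "TOR", "TOT", "PHO", "WAS"),
  MP  = c(322, 278, 44, 1352, 2349, 884, 1465),
  FGA = c(69, 60, 9, 404, 736, 227, 509),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each player; .keep_all = TRUE retains the other columns.
first_rows <- stats %>%
  distinct(Player, .keep_all = TRUE)

print(first_rows)
```

Unlike the filter and summarize approach, this keeps the Tm column, so the TOT rows remain identifiable in the result.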

Also, if you're going to work with Basketball Reference data, check out the ballr package (https://cran.r-project.org/web/packages/ballr/index.html), which provides an API for interacting with basketball-reference.com.