Custom function to make duplicated values distinct based on conditions in R

Time: 2018-08-18 04:34:39

Tags: r

The dataset can be downloaded from here.

library(dplyr)
NBA <- read.csv("NBA Season Dataset/Seasons_Stats.csv")
NBA$Player <- as.character(NBA$Player)
PlayerData <- read.csv("NBA Season Dataset/player_data.csv")
PlayerData$name <- as.character(PlayerData$name)

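I want to take the players' height and weight from PlayerData and merge them into the main data frame, NBA. The problem is that this NBA player dataset contains some players who share a name with other players, so I need to make their names distinct before merging the two data frames by player name with merge.

PlayerData[duplicated(PlayerData$name), "name"] gives me 50 duplicated names.

So I wrote a function that renames a player in both data frames based on the years in which he was active: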

unduplicate <- function(name, year_start, year_end, new_name) { 
    PlayerData[PlayerData$name == name & PlayerData$year_start == year_start & PlayerData$year_end == year_end, 1] = new_name
    NBA[NBA$Player == name & NBA$Year <= year_end & NBA$Year >= year_start, "Player"] = new_name
}

For example, take these two players who share the same name (image: DeeBrown1).

Then I call the function:

unduplicate("Dee Brown", 1991, 2002, "Dee Brown 1")
unduplicate("Dee Brown", 2007, 2009, "Dee Brown 2")

Nothing changes...

However, if I do it manually:

PlayerData[PlayerData$name == "Dee Brown" & PlayerData$year_start == 1991 & PlayerData$year_end == 2002, 1] = "Dee Brown 1"
NBA[NBA$Player == "Dee Brown" & NBA$Year <= 2002 & NBA$Year >= 1991, "Player"] = "Dee Brown 1"

PlayerData[PlayerData$name == "Dee Brown" & PlayerData$year_start == 2007 & PlayerData$year_end == 2009, 1] = "Dee Brown 2"
NBA[NBA$Player == "Dee Brown" & NBA$Year <= 2009 & NBA$Year >= 2007, "Player"] = "Dee Brown 2"

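Then I get the desired result (image: DeeBrown2).

So my questions are:

1) What is wrong with the function? I have checked it and tried many variations, but nothing worked.

2) Is there a better way to solve this problem?

I am new to this, so please forgive me if this is just a silly beginner's mistake.

Thanks!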

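1 Answer:

Answer 0: (score: 1)

You can use distinct from dplyr to select unique players based on a set of variables. The sqldf library then makes it possible to join tables on conditions that include inequalities: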

library(dplyr)
player_data <- read.csv("player_data.csv", stringsAsFactors = FALSE)
Players <- read.csv("Players.csv", stringsAsFactors = FALSE)
NBA1 <- read.csv("Seasons_Stats.csv", stringsAsFactors = FALSE)

Dist_players <- player_data %>%
  distinct(name, year_start, year_end, height, weight)

library(sqldf)
Final <- sqldf("SELECT * FROM NBA1 JOIN Dist_players ON NBA1.Player = Dist_players.name 
      WHERE NBA1.Year >= Dist_players.year_start AND NBA1.Year <= Dist_players.year_end")
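
As for question 1, the most likely reason the original function has no visible effect is that an assignment inside an R function modifies a local copy of PlayerData and NBA, which is discarded when the function returns; the global data frames are never touched. Below is a minimal sketch of one way around this, assuming the function is rewritten to take the data frames as arguments and return the modified copies. The argument names players and seasons, the returned list, and the toy rows are all made up for illustration and are not part of the original code or dataset.

```r
# Toy stand-ins for the real data frames (rows are invented for illustration).
PlayerData <- data.frame(
  name       = c("Dee Brown", "Dee Brown", "Player X"),
  year_start = c(1991, 2007, 1998),
  year_end   = c(2002, 2009, 2016),
  stringsAsFactors = FALSE
)
NBA <- data.frame(
  Player = c("Dee Brown", "Dee Brown", "Player X"),
  Year   = c(1995, 2008, 2005),
  stringsAsFactors = FALSE
)

# Hypothetical rewrite of unduplicate(): the data frames are passed in and
# the modified copies are returned, instead of relying on side effects.
unduplicate <- function(players, seasons, name, year_start, year_end, new_name) {
  # which() drops NA comparisons, so rows with missing values are left alone
  in_players <- which(players$name == name &
                        players$year_start == year_start &
                        players$year_end == year_end)
  in_seasons <- which(seasons$Player == name &
                        seasons$Year >= year_start &
                        seasons$Year <= year_end)
  players[in_players, "name"] <- new_name
  seasons[in_seasons, "Player"] <- new_name
  list(players = players, seasons = seasons)
}

# The caller has to assign the result back, otherwise nothing changes.
res <- unduplicate(PlayerData, NBA, "Dee Brown", 1991, 2002, "Dee Brown 1")
res <- unduplicate(res$players, res$seasons, "Dee Brown", 2007, 2009, "Dee Brown 2")
PlayerData <- res$players
NBA <- res$seasons
```

Using <<- inside the original function would also write to the global data frames, but returning the modified copies keeps the function free of hidden side effects and makes it easier to test.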