Right now I have a main function (let's call it performance()) whose arguments are player1, player2, and team_of_interest.
I have a dataset that looks like this:
> head(roster_van, 3)
team_name team venue num_first_last
1 VANCOUVER CANUCKS VAN Home 5 SBISA, LUCA
2 VANCOUVER CANUCKS VAN Home 8 TANEV, CHRISTOPHER
3 VANCOUVER CANUCKS VAN Home 14 BURROWS, ALEXANDRE
game_date game_id season session player_number
1 2016-10-15 2016020029 20162017 R 5
2 2016-10-15 2016020029 20162017 R 8
3 2016-10-15 2016020029 20162017 R 14
team_num first_name last_name player_name
1 VAN5 LUCA SBISA LUCA.SBISA
2 VAN8 CHRISTOPHER TANEV CHRIS.TANEV
3 VAN14 ALEXANDRE BURROWS ALEX.BURROWS
name_match player_position
1 LUCASBISA D
2 CHRISTOPHERTANEV D
3 ALEXANDREBURROWS L
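This is roster data for hockey games over one season.
I want to create another function (let's call it players()) that loops over every unique pair of players on a hockey team and passes their names and team to the player1, player2, and team_of_interest arguments of performance().
I've made a start, but I don't know what to do next: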
name_pairs <- function(x,y) {
x <- seq(1,19, by = 2)
y <- x+1
}
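Answer 0 (score: 1)
merge() can quickly produce a Cartesian join from a data frame.
The example below uses a shortened version of the sample data frame and guesses at the team_of_interest column.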
library(tidyverse)
roster_van <- tibble(team = "VAN",
team_num = c(5, 8, 14),
player_name = c("LUCA.SBISA", "CHRIS.TANEV", "ALEX.BURROWS"),
player_position = c("D", "D", "L"),
team_of_interest = c("SL BLUES", "BOS BRUINS", "CGY FLAMES")
)
roster_van
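> roster_van
# A tibble: 3 x 5
  team  team_num player_name  player_position team_of_interest
  <chr>    <dbl> <chr>        <chr>           <chr>
1 VAN          5 LUCA.SBISA   D               SL BLUES
2 VAN          8 CHRIS.TANEV  D               BOS BRUINS
3 VAN         14 ALEX.BURROWS L               CGY FLAMES
If you only want to repeat a few columns, rename the columns you want to see again, merge them back onto the original data frame, and then filter out the rows where a player is joined to themselves.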
roster_van_pairs <-
roster_van %>%
merge(roster_van %>%
select(team,
team_num_paired = team_num,
player_name_paired = player_name
)
) %>%
filter(player_name != player_name_paired)
roster_van_pairs
> roster_van_pairs
  team team_num  player_name player_position team_of_interest team_num_paired player_name_paired
1  VAN        5   LUCA.SBISA               D         SL BLUES               8        CHRIS.TANEV
2  VAN        5   LUCA.SBISA               D         SL BLUES              14       ALEX.BURROWS
3  VAN        8  CHRIS.TANEV               D       BOS BRUINS               5         LUCA.SBISA
4  VAN        8  CHRIS.TANEV               D       BOS BRUINS              14       ALEX.BURROWS
5  VAN       14 ALEX.BURROWS               L       CGY FLAMES               5         LUCA.SBISA
6  VAN       14 ALEX.BURROWS               L       CGY FLAMES               8        CHRIS.TANEV
If you want to join all the columns again in bulk, you can rename every column with the following code:
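Note that the result above lists every pair twice, once in each order (SBISA with TANEV, and TANEV with SBISA). If each unordered pair should appear only once, one possible tweak is to compare the jersey numbers with < instead of testing the names for inequality. The sketch below is written in base R so it runs without extra packages, and it assumes team_num is unique per player within a team; the object names paired, all_pairs, and unique_pairs are made up for illustration.

```r
# Small stand-in for the shortened roster used above
roster_van <- data.frame(
  team = "VAN",
  team_num = c(5, 8, 14),
  player_name = c("LUCA.SBISA", "CHRIS.TANEV", "ALEX.BURROWS"),
  stringsAsFactors = FALSE
)

# Copy with renamed columns; "team" stays shared, so merge() joins on it
paired <- data.frame(
  team = roster_van$team,
  team_num_paired = roster_van$team_num,
  player_name_paired = roster_van$player_name,
  stringsAsFactors = FALSE
)

# Every player joined to every player on the same team (self-pairs included)
all_pairs <- merge(roster_van, paired)

# Keep one row per unordered pair; assumes team_num is unique per player
unique_pairs <- all_pairs[all_pairs$team_num < all_pairs$team_num_paired, ]
nrow(unique_pairs)
```

The same condition should also work inside filter() in the tidyverse version, in place of player_name != player_name_paired.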
roster_van_copy <- roster_van
# tag every column name with a "_paired" suffix so no names clash in the merge
colnames(roster_van_copy) <- colnames(roster_van_copy) %>% paste0(., "_paired")
This also makes the cross-join code more concise:
roster_van_all_columns_paired <-
roster_van %>%
merge(roster_van_copy) %>%
filter(player_name != player_name_paired)
I suppose this leaves you with more columns than you need, but they are easy to drop afterwards with select(-c(<col_x>:<col_y>)).
roster_van_all_columns_paired
> roster_van_all_columns_paired
  team team_num  player_name player_position team_of_interest team_paired team_num_paired player_name_paired
1  VAN        8  CHRIS.TANEV               D       BOS BRUINS         VAN               5         LUCA.SBISA
2  VAN       14 ALEX.BURROWS               L       CGY FLAMES         VAN               5         LUCA.SBISA
3  VAN        5   LUCA.SBISA               D         SL BLUES         VAN               8        CHRIS.TANEV
4  VAN       14 ALEX.BURROWS               L       CGY FLAMES         VAN               8        CHRIS.TANEV
5  VAN        5   LUCA.SBISA               D         SL BLUES         VAN              14       ALEX.BURROWS
6  VAN        8  CHRIS.TANEV               D       BOS BRUINS         VAN              14       ALEX.BURROWS
  player_position_paired team_of_interest_paired
1                      D                SL BLUES
2                      D                SL BLUES
3                      D              BOS BRUINS
4                      D              BOS BRUINS
5                      L              CGY FLAMES
6                      L              CGY FLAMES
A base R approach might look like this:
roster.van.all.copy.baseR <- merge(roster_van, roster_van_copy)
roster.van.all.baseR <- roster.van.all.copy.baseR[ which(roster.van.all.copy.baseR$player_name != roster.van.all.copy.baseR$player_name_paired), ]
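To connect the pairs back to the original question, here is a minimal sketch of a wrapper that calls performance() once per unordered pair. It uses combn() rather than a self-join, which is a different technique from the merge() approach above. The body of performance() was never shown, so the version here is a hypothetical stand-in that only pastes its arguments together; the name players() and the choice to pass a single team_of_interest value for every pair are likewise assumptions.

```r
# Hypothetical stand-in for the asker's performance(); the real body is not shown
performance <- function(player1, player2, team_of_interest) {
  paste(player1, player2, team_of_interest, sep = " | ")
}

# Hypothetical wrapper: calls performance() once per unordered pair of players
players <- function(roster, team_of_interest) {
  # combn() returns a 2-row matrix; each column is one unique pair
  pairs <- combn(unique(roster$player_name), 2)
  # Map() walks the two rows in parallel and recycles team_of_interest
  Map(performance,
      player1 = pairs[1, ],
      player2 = pairs[2, ],
      team_of_interest = team_of_interest)
}

# Minimal roster with only the column the wrapper needs
roster_van <- data.frame(
  player_name = c("LUCA.SBISA", "CHRIS.TANEV", "ALEX.BURROWS"),
  stringsAsFactors = FALSE
)

results <- players(roster_van, "VAN")
length(results)
```

Map() returns a list with one element per pair, so the results can be inspected individually or combined afterwards, for example with do.call(rbind, results) if performance() returns a data frame. combn() needs at least two distinct players in the roster.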