I have a dataset of football matches that looks like this:
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
Manchester U Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
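I want to group together the teams involved in a match, regardless of which team played at home and which played away. For example, if Chelsea play Arsenal, I want a new column "teams_involved" to read Arsenal - Chelsea, whether the match was played at Chelsea or at Arsenal. My guess is that the way to do this is to put the two teams into the new column in alphabetical order, but I'm not sure how to do that.
Desired output: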
Home_team Away_team Home_score Away_score teams_involved
Arsenal Chelsea 1 3 Arsenal - Chelsea
Manchester U Blackburn 2 9 Blackburn - Manchester U
Liverpool Leeds 0 8 Leeds - Liverpool
Chelsea Arsenal 4 1 Arsenal - Chelsea
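The reason I want this is so that I can see how many times each team has won against a particular opponent, regardless of where the match was played.
Answer 0 (score: 2)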
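# "Manchester U" is written as "ManchesterU" so that read.table
# treats it as a single whitespace-separated field
df <- read.table(text = "
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
ManchesterU Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
", header = TRUE, stringsAsFactors = FALSE)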
library(dplyr)
df %>%
  rowwise() %>%   # for each row
  # sort the two teams alphabetically, then join them with " - "
  mutate(Teams = paste(sort(c(Home_team, Away_team)), collapse = " - ")) %>%
  ungroup()       # forget the row grouping
# # A tibble: 4 x 5
# Home_team Away_team Home_score Away_score Teams
# <chr> <chr> <int> <int> <chr>
# 1 Arsenal Chelsea 1 3 Arsenal - Chelsea
# 2 ManchesterU Blackburn 2 9 Blackburn - ManchesterU
# 3 Liverpool Leeds 0 8 Leeds - Liverpool
# 4 Chelsea Arsenal 4 1 Arsenal - Chelsea
An alternative solution without rowwise:
# create function and vectorize it
f = function(x,y) {paste(sort(c(x, y)), collapse = " - ")}
f = Vectorize(f)
# apply function to your dataset
df %>% mutate(Teams = f(Home_team, Away_team))
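The reason for the Vectorize step can be seen in a small sketch. Called directly on two whole columns, f sorts all the names together and collapses them into a single string; the vectorized version applies f to one pair at a time. The names home, away and vf below are illustrative only and not part of the answer above.

```r
# Same function as above, before it is vectorized
f <- function(x, y) paste(sort(c(x, y)), collapse = " - ")

home <- c("Arsenal", "Chelsea")
away <- c("Chelsea", "Arsenal")

# Plain f pools all four names into one string, so the result has length 1
pooled <- f(home, away)
length(pooled)

# Vectorize() wraps f in mapply, so it is called once per (home, away) pair
vf <- Vectorize(f)
per_row <- unname(vf(home, away))  # unname() drops the names mapply attaches
per_row
```

Note that sort() orders strings according to the current locale, so the ordering of names with mixed case or accented characters may differ between machines.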
Answer 1 (score: 1)
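We can use map2_chr, the variant of map2 that returns a character vector rather than a list, to iterate over the rows and sort the elements of the "Home_team" and "Away_team" columns alphabetically:
library(tidyverse)
df %>%
  mutate(Teams = map2_chr(Home_team, Away_team, ~
    paste(sort(c(.x, .y)), collapse = ' - ')))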
# Home_team Away_team Home_score Away_score Teams
#1 Arsenal Chelsea 1 3 Arsenal - Chelsea
#2 ManchesterU Blackburn 2 9 Blackburn - ManchesterU
#3 Liverpool Leeds 0 8 Leeds - Liverpool
#4 Chelsea Arsenal 4 1 Arsenal - Chelsea
Another option is pmin/pmax:
df %>%
  mutate(Teams = paste(pmin(Home_team, Away_team),
                       pmax(Home_team, Away_team), sep = " - "))
Or, using base R:
df$Teams <- paste(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]), sep= ' - ')
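As a rough illustration of why the pmin/pmax approach works: both functions compare their arguments element by element, and for character vectors the comparison follows string ordering, so pmin picks the alphabetically earlier team in each row and pmax the later one. The vectors home and away below simply repeat the example data and are not part of the answer above.

```r
home <- c("Arsenal", "ManchesterU", "Liverpool", "Chelsea")
away <- c("Chelsea", "Blackburn", "Leeds", "Arsenal")

first  <- pmin(home, away)  # alphabetically earlier team in each row
second <- pmax(home, away)  # alphabetically later team in each row

teams <- paste(first, second, sep = " - ")
teams
```

Because no per-row function call is involved, this tends to scale better than the rowwise or map2 variants on large tables. String comparison is locale-dependent, so unusual team names may sort differently on different systems.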
df <- structure(list(Home_team = c("Arsenal", "ManchesterU", "Liverpool",
"Chelsea"), Away_team = c("Chelsea", "Blackburn", "Leeds", "Arsenal"
), Home_score = c(1L, 2L, 0L, 4L), Away_score = c(3L, 9L, 8L,
1L)), .Names = c("Home_team", "Away_team", "Home_score", "Away_score"
), class = "data.frame", row.names = c(NA, -4L))
Answer 2 (score: 0)
A simple ifelse statement also works:
df$teams_involved <- ifelse(df$Home_team > df$Away_team,
                            paste(df$Away_team, df$Home_team, sep = " - "),
                            paste(df$Home_team, df$Away_team, sep = " - "))
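Once teams_involved exists, the goal stated in the question (wins per team against a given opponent, regardless of venue) could be reached along the following lines. This is only a sketch in base R: the winner column and the wins table are assumptions introduced here, and draws are treated as having no winner.

```r
df <- data.frame(
  Home_team  = c("Arsenal", "ManchesterU", "Liverpool", "Chelsea"),
  Away_team  = c("Chelsea", "Blackburn", "Leeds", "Arsenal"),
  Home_score = c(1L, 2L, 0L, 4L),
  Away_score = c(3L, 9L, 8L, 1L),
  stringsAsFactors = FALSE
)

# Venue-independent pairing label, built as in the answer above
df$teams_involved <- ifelse(df$Home_team > df$Away_team,
                            paste(df$Away_team, df$Home_team, sep = " - "),
                            paste(df$Home_team, df$Away_team, sep = " - "))

# Hypothetical helper column: the winning team, or NA for a draw
df$winner <- ifelse(df$Home_score > df$Away_score, df$Home_team,
             ifelse(df$Away_score > df$Home_score, df$Away_team, NA_character_))

# Count wins per pairing and winner; table() drops NA (draws) by default
wins <- as.data.frame(table(teams_involved = df$teams_involved,
                            winner = df$winner),
                      stringsAsFactors = FALSE)
wins <- wins[wins$Freq > 0, ]  # keep only combinations that actually occurred
wins
```

With the four example matches, Chelsea is counted as winning both Arsenal - Chelsea fixtures, one at home and one away.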