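Undirected combinations of actors in the same movie

Time: 2017-11-07 02:05:17

Tags: r dataframe tidyverse purrr tibble

I'm not sure how to describe the operation I'm trying to do. I have a data frame with two columns (movie and actor). From it, I want to create a list of the unique two-actor combinations based on the movies they appeared in together. Below is code that creates an example of the data frame I have, and another data frame that is the result I want.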


start_data <- tibble::tribble(
  ~movie, ~actor,
  "titanic", "john",
  "star wars", "john",
  "baby driver", "john",
  "shawshank", "billy",
  "titanic", "billy",
  "star wars", "sarah",
  "titanic", "sarah"
)

end_data <- tibble::tribble(
  ~movie, ~actor1, ~actor2,
  "titanic", "john", "billy",
  "titanic", "john", "sarah",
  "titanic", "billy", "sarah",
  "star wars", "john", "sarah"
)

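Any help is appreciated, thanks! Bonus points if it's short ++

2 answers:

Answer 0: (score: 3)

You can use combn(..., 2) to find the two-actor combinations within each movie, convert the result to a two-column tibble, and store it in a list column with summarise; to get a flat data frame, use unnest.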
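To make the combn step concrete, here is a minimal base R sketch for a single movie. The vector `actors` and the names `pairs` and `pair_rows` are illustrative only and do not appear in the answer; the actor names are the "titanic" rows from the question.

```r
# Actors of one movie ("titanic") from the question's sample data
actors <- c("john", "billy", "sarah")

# combn() returns a 2 x choose(n, 2) character matrix: one column per pair
pairs <- combn(actors, 2)

# Transposing gives one row per pair, i.e. the actor1/actor2 layout
pair_rows <- t(pairs)
colnames(pair_rows) <- c("actor1", "actor2")

# Rows are: john/billy, john/sarah, billy/sarah
print(pair_rows)
```

Note that combn() needs at least two elements to form a pair, which is why the answer's code guards with length(actor) > 1.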

library(tidyverse)

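start_data %>% 
    group_by(movie) %>% 
    # one list element per movie: a two-column tibble of actor pairs
    # (an empty tibble when the movie has fewer than two actors)
    summarise(acts = list(
        if (length(actor) > 1) as_tibble(t(combn(actor, 2)), .name_repair = ~ c('actor1', 'actor2'))
        else tibble()
    )) %>% 
    # flatten the list column; movies with an empty tibble drop out
    unnest(acts)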

# A tibble: 4 x 3
#      movie actor1 actor2
#      <chr>  <chr>  <chr>
#1 star wars   john  sarah
#2   titanic   john  billy
#3   titanic   john  sarah
#4   titanic  billy  sarah

Answer 1: (score: 2)

library(tidyverse)
library(stringr)

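# self-join on movie to get every ordered pair of actors per movie
inner_join(start_data, start_data, by = "movie") %>% 
  filter(actor.x != actor.y) %>%   # drop pairs of an actor with themselves
  rowwise() %>% 
  # sort each pair alphabetically so (a, b) and (b, a) collapse to the same key
  mutate(combo = str_c(min(actor.x, actor.y), "_", max(actor.x, actor.y))) %>% 
  ungroup() %>%
  select(movie, combo) %>% 
  distinct() %>% 
  # split on the underscore only, so names containing spaces stay intact
  separate(combo, c("actor1", "actor2"), sep = "_")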
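
The same self-join idea can probably be shortened by keeping only rows where the first name sorts before the second, which removes both self-pairs and mirrored duplicates in one step. The following is a sketch under that assumption, not part of the original answer; `pairs_by_join` is an illustrative name, and recent dplyr versions may warn about a many-to-many join, which is expected here.

```r
library(dplyr)

# Sample data from the question
start_data <- tibble::tribble(
  ~movie, ~actor,
  "titanic", "john",
  "star wars", "john",
  "baby driver", "john",
  "shawshank", "billy",
  "titanic", "billy",
  "star wars", "sarah",
  "titanic", "sarah"
)

pairs_by_join <- start_data %>%
  # suffixes turn the two actor columns into actor1 and actor2
  inner_join(start_data, by = "movie", suffix = c("1", "2")) %>%
  # strict ordering keeps each unordered pair exactly once
  filter(actor1 < actor2)

print(pairs_by_join)
```

As with the rowwise version, each pair comes out in alphabetical order (for example billy/john rather than john/billy), so the rows match the desired end_data only up to the order of names within a pair.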