Question

I have a data frame with a column that is actually a list of integer vectors (not just single integers).

# make example dataframe
starting_dataframe <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

starting_dataframe$player_indices <- 
  list(as.integer(1), 
       as.integer(c(2, 5)), 
       as.integer(3), 
       as.integer(4), 
       as.integer(c(6, 7)))

I would like to replace the integers with strings, based on a second concordance data frame.

# make concordance dataframe
example_concord <- 
  data.frame(last_names = c("Rapinoe", 
                            "Wambach", 
                            "Naeher", 
                            "Morgan", 
                            "Dahlkemper", 
                            "Mitts", 
                            "O'Reilly"), 
              player_ids = as.integer(c(1,2,3,4,5,6,7)))

The desired result is as follows:

# make dataframe of desired result
desired_result <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

desired_result$player_indices <- 
  list(c("Rapinoe"), 
       c("Wambach", "Dahlkemper"), 
       c("Naeher"), 
       c("Morgan"), 
       c("Mitts", "O'Reilly"))

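I can't for the life of me figure out how to do this, and I couldn't find a similar case on Stack Overflow. How do I do it? I wouldn't mind a dplyr-specific solution.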

Answer 1

I suggest creating a "lookup dictionary" of sorts (a named character vector) and using lapply over each set of IDs:

example_concord_idx <- setNames(as.character(example_concord$last_names),
                                example_concord$player_ids)
example_concord_idx
#            1            2            3            4            5            6 
#    "Rapinoe"    "Wambach"     "Naeher"     "Morgan" "Dahlkemper"      "Mitts" 
#            7 
#   "O'Reilly" 

starting_dataframe$result <- 
  lapply(starting_dataframe$player_indices,
         function(a) example_concord_idx[a])
starting_dataframe
#   first_names player_indices              result
# 1       Megan              1             Rapinoe
# 2        Abby           2, 5 Wambach, Dahlkemper
# 3      Alyssa              3              Naeher
# 4        Alex              4              Morgan
# 5     Heather           6, 7     Mitts, O'Reilly

(Code golf?)

Map(`[`, list(example_concord_idx), starting_dataframe$player_indices)
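One caveat worth spelling out: `example_concord_idx[a]` with an integer `a` indexes by position, not by name. That happens to work here because the player IDs are exactly 1 to 7 in order. The sketch below uses a hypothetical concordance with non-contiguous IDs (the names `concord`, `indices`, and `lookup` are made up for illustration) and shows two ways to look up by ID value instead, assuming every ID appears in the concordance:

```r
# Hypothetical concordance whose IDs are NOT 1..n (illustration only)
concord <- data.frame(last_names = c("Rapinoe", "Wambach", "Dahlkemper"),
                      player_ids = c(15L, 20L, 19L),
                      stringsAsFactors = FALSE)
indices <- list(15L, c(20L, 19L))

# Named character vector: names are the IDs (as strings), values are the last names
lookup <- setNames(concord$last_names, concord$player_ids)

# Index by name: convert the integer IDs to character first
by_name <- lapply(indices, function(a) unname(lookup[as.character(a)]))

# Alternative: match() the IDs against the ID column, then pick the names
by_match <- lapply(indices,
                   function(a) concord$last_names[match(a, concord$player_ids)])
```

With positional indexing, `lookup[15L]` would return `NA` here, because the vector only has three elements.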

Answer 2

For tidyverse enthusiasts, I adapted the second half of the accepted answer by r2evans to use map() and %>%:

# library() stops with an error if the package is missing; require() only warns
library(tidyverse)

starting_dataframe <- 
  starting_dataframe %>% 
  mutate(
    result = map(.x = player_indices, .f = function(a) example_concord_idx[a])
  )

Definitely won't win code golf, though!
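Since the question mentions dplyr, here is a different, join-based sketch; it is not the approach of the answer above, just an alternative under stated assumptions. It assumes dplyr (1.0 or later) and tidyr are installed, and the names `players` and `concord` are small hypothetical stand-ins for the question's data frames. The idea is to unnest the list column to one row per ID, join on the ID, and collapse back into a list column:

```r
library(dplyr)
library(tidyr)

# Hypothetical, shortened versions of the question's data
players <- tibble(
  first_names    = c("Megan", "Abby"),
  player_indices = list(1L, c(2L, 5L))
)
concord <- tibble(
  last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper"),
  player_ids = 1:5
)

result <- players %>%
  unnest(cols = player_indices) %>%                            # one row per (first name, ID)
  left_join(concord, by = c("player_indices" = "player_ids")) %>%  # attach last names by ID
  group_by(first_names) %>%
  summarise(last_names = list(last_names), .groups = "drop")   # collapse back to a list column
```

Note that `group_by()` followed by `summarise()` returns the rows sorted by `first_names`, so the original row order is not preserved, and first names are assumed to be unique. The join matches on the ID values themselves, so it does not depend on the IDs being 1 to n.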

Answer 3

Another approach is to unlist the list column and relist it after modifying its contents:

df1$player_indices <- relist(df2$last_names[unlist(df1$player_indices)], df1$player_indices)
df1
#>   first_names      player_indices
#> 1       Megan             Rapinoe
#> 2        Abby Wambach, Dahlkemper
#> 3      Alyssa              Naeher
#> 4        Alex              Morgan
#> 5     Heather     Mitts, O'Reilly
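To make the unlist/relist round trip easier to follow, here is a step-by-step sketch on a shortened, hypothetical input (`idx` and `names_vec` are illustrative names). Like the one-liner above, it assumes the indices are row positions in the lookup vector:

```r
# A small list of index vectors (the "skeleton")
idx <- list(1, c(2, 5), 3)

# Step 1: flatten the list into a single vector of positions
flat <- unlist(idx)                       # 1 2 5 3

# Step 2: replace every position with the corresponding name in one vectorised step
names_vec <- c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper")
replaced  <- names_vec[flat]

# Step 3: cut the flat result back into the original list shape
restored <- relist(replaced, skeleton = idx)
```

`relist()` only uses the skeleton for its shape (the length of each element), so the replaced values can be of a different type than the original indices.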

Data

## initial data.frame w/ list-column
df1 <- data.frame(first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather"), stringsAsFactors = FALSE)
df1$player_indices <- list(1, c(2,5), 3, 4, c(6,7))

## lookup data.frame
df2 <- data.frame(last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper", 
        "Mitts", "O'Reilly"), stringsAsFactors = FALSE)

Note: I set stringsAsFactors = FALSE to create character columns in the data.frames, but this works equally well with factor columns.

Replace integers with strings in a data frame column that is a list of integer vectors (not just single integers) in R
