Replace integers in a data frame column that is a list of integer vectors (not just single integers) with strings in R

Date: 2019-07-14 04:04:36

Tags: r dplyr nested-lists

I have a data frame with a column that is actually a list of integer vectors (not just single integers).

# make example dataframe
starting_dataframe <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

starting_dataframe$player_indices <- 
  list(as.integer(1), 
       as.integer(c(2, 5)), 
       as.integer(3), 
       as.integer(4), 
       as.integer(c(6, 7)))

I want to replace the integers with strings, based on a second concordance (lookup) data frame.

# make concordance dataframe
example_concord <- 
  data.frame(last_names = c("Rapinoe", 
                            "Wambach", 
                            "Naeher", 
                            "Morgan", 
                            "Dahlkemper", 
                            "Mitts", 
                            "O'Reilly"), 
              player_ids = as.integer(c(1,2,3,4,5,6,7)))

The desired result looks like this:

# make dataframe of desired result
desired_result <- 
  data.frame(first_names = c("Megan", 
                             "Abby", 
                             "Alyssa", 
                             "Alex", 
                             "Heather"))

desired_result$player_indices <- 
  list(c("Rapinoe"), 
       c("Wambach", "Dahlkemper"), 
       c("Naeher"), 
       c("Morgan"), 
       c("Mitts", "O'Reilly"))

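For the life of me I can't figure out a way to do this, and I couldn't find a similar case on Stack Overflow. How can I do it? I wouldn't mind a dplyr-specific solution.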

3 Answers:

Answer 0 (score: 2)

I suggest creating a sort of "lookup dictionary" and using lapply over each set of IDs:

example_concord_idx <- setNames(as.character(example_concord$last_names),
                                example_concord$player_ids)
example_concord_idx
#            1            2            3            4            5            6 
#    "Rapinoe"    "Wambach"     "Naeher"     "Morgan" "Dahlkemper"      "Mitts" 
#            7 
#   "O'Reilly" 

starting_dataframe$result <- 
  lapply(starting_dataframe$player_indices,
         function(a) example_concord_idx[a])
starting_dataframe
#   first_names player_indices              result
# 1       Megan              1             Rapinoe
# 2        Abby           2, 5 Wambach, Dahlkemper
# 3      Alyssa              3              Naeher
# 4        Alex              4              Morgan
# 5     Heather           6, 7     Mitts, O'Reilly
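Note that example_concord_idx[a] indexes by position, not by name, so it returns the right last names only because player_ids happens to be exactly 1 to 7 in order. A minimal sketch that looks the IDs up by value with match() instead; the non-sequential IDs (10, 20, 30) and the names concord and players are hypothetical, chosen only to show the difference:

```r
# Hypothetical concordance whose IDs are NOT simply 1..n
concord <- data.frame(last_names = c("Rapinoe", "Wambach", "Naeher"),
                      player_ids = c(10L, 20L, 30L),
                      stringsAsFactors = FALSE)

# Hypothetical data frame with a list-column of IDs
players <- data.frame(first_names = c("Megan", "Abby"),
                      stringsAsFactors = FALSE)
players$player_indices <- list(10L, c(20L, 30L))

# match() finds the row of concord whose player_ids equals each ID,
# so the lookup no longer depends on the IDs being row positions
res <- lapply(players$player_indices, function(ids) {
  concord$last_names[match(ids, concord$player_ids)]
})
players$result <- res

players$result
```

With this version an ID that has no entry in the concordance comes back as NA rather than silently picking up whichever name sits at that position.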

(Code golf?)

Map(`[`, list(example_concord_idx), starting_dataframe$player_indices)
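The golfed version relies on Map() recycling its arguments: list(example_concord_idx) has length 1, so it is reused against every element of player_indices, and each call is simply `[`(example_concord_idx, ids). A toy sketch with a hypothetical three-element lookup makes the individual calls visible:

```r
# Hypothetical lookup vector and list of index vectors
lookup <- c("a", "b", "c")
ids <- list(1L, c(2L, 3L))

# list(lookup) has length 1, so Map() recycles it; the calls made are
# `[`(lookup, 1L) and `[`(lookup, c(2L, 3L))
res <- Map(`[`, list(lookup), ids)

res
```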

Answer 1 (score: 1)

For tidyverse fans, I adapted the second half of r2evans's accepted answer to use map() and %>%:

library(tidyverse)

# example_concord_idx is the named lookup vector built in the accepted answer
starting_dataframe <- 
  starting_dataframe %>% 
  mutate(
    result = map(.x = player_indices, .f = function(a) example_concord_idx[a])
  )

It definitely won't win any code golf, though!

Answer 2 (score: 1)

Another approach is to unlist the list-column and relist it after modifying its contents:

df1$player_indices <- relist(df2$last_names[unlist(df1$player_indices)], df1$player_indices)
df1
#>   first_names      player_indices
#> 1       Megan             Rapinoe
#> 2        Abby Wambach, Dahlkemper
#> 3      Alyssa              Naeher
#> 4        Alex              Morgan
#> 5     Heather     Mitts, O'Reilly

Data

## initial data.frame w/ list-column
df1 <- data.frame(first_names = c("Megan", "Abby", "Alyssa", "Alex", "Heather"), stringsAsFactors = FALSE)
df1$player_indices <- list(1, c(2,5), 3, 4, c(6,7))

## lookup data.frame
df2 <- data.frame(last_names = c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper", 
        "Mitts", "O'Reilly"), stringsAsFactors = FALSE)

Note: I set stringsAsFactors = FALSE to create character columns in the data.frames, but the approach works just as well with factor columns.
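To see what the unlist/relist round trip does, here is a minimal base-R sketch on a smaller, hypothetical list (skeleton, lookup and res are illustrative names): unlist() flattens all IDs into one vector, a single vectorised lookup replaces them, and relist() cuts the result back into pieces with the same lengths as the original list.

```r
# Hypothetical list-column contents and lookup vector
skeleton <- list(1, c(2, 5), 3)
lookup <- c("Rapinoe", "Wambach", "Naeher", "Morgan", "Dahlkemper")

# unlist() flattens the list into one vector: 1 2 5 3
flat_ids <- unlist(skeleton)

# One vectorised (positional) lookup for every ID at once
flat_names <- lookup[flat_ids]

# relist() splits the flat vector back using the shape of skeleton
res <- relist(flat_names, skeleton)

res
```

The advantage of this design is that the lookup happens once on a flat vector instead of once per row; like the other answers, it still indexes by position.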