Question

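I have a data frame (df2) with two variables, Mood and PartOfTown (called Region in the code below). Mood is a multiple-choice question (any combination of options is allowed) that rates a person's happiness, and PartOfTown describes their geographic location.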

The problem is that the centres code mood differently: centres in the north of town use NorthCode, and centres in the south use SouthCode (df1).

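I would like to recode every entry in df2 to SouthCode so that I end up with a dataset like df3. I want a general solution, because new entries may arrive with combinations that are not currently in the dataset. Any ideas would be much appreciated.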

Centre codes and definitions for mood:

df1 <- data.frame(NorthCode=c(4,5,6,7,99),NorthDef=c("happy","sad","tired","energetic","other"),SouthCode=c(7,8,9,5,99),SouthDef=c("happy","sad","tired","energetic","other"))
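Each row of df1 pairs a north code with the south code for the same mood. A minimal sketch, assuming every definition appears exactly once per region, of turning that table into a named lookup vector keyed by the north code. It pairs the codes through the shared definition rather than relying on row order; `south_for_def` and `north_to_south` are hypothetical names, and stringsAsFactors = FALSE is set so that indexing by definition uses names rather than factor codes on older R versions.

```r
df1 <- data.frame(NorthCode = c(4, 5, 6, 7, 99),
                  NorthDef  = c("happy", "sad", "tired", "energetic", "other"),
                  SouthCode = c(7, 8, 9, 5, 99),
                  SouthDef  = c("happy", "sad", "tired", "energetic", "other"),
                  stringsAsFactors = FALSE)

# South code for each definition, e.g. "happy" -> 7
south_for_def <- setNames(df1$SouthCode, df1$SouthDef)

# Look up each north definition to get its south code, then key the result
# by the north code; both sides are stored as character to match the Mood strings
north_to_south <- setNames(as.character(south_for_def[df1$NorthDef]),
                           as.character(df1$NorthCode))

north_to_south[["5"]]  # "8" (sad)
```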

Starting point:

df2 <- data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Desired result:

df3 <- data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Current attempt: I tried to start by splitting the entries, but could not get it to work.

unlist(strsplit(df2$Mood, ","))
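One reason the attempt above is hard to build on: strsplit() returns a list with one character vector per row, and unlist() flattens all of them into a single vector, so it is no longer clear which codes belonged to which respondent. A small illustration using two of the Mood strings from df2:

```r
moods <- c("4,5", "5,6,99")

# One list element per input string: c("4", "5") and c("5", "6", "99")
parts <- strsplit(moods, ",")

# Flattening loses the row boundaries: "4" "5" "5" "6" "99"
flat <- unlist(parts)

# The list still records how many codes each row had: 2 and 3
lengths(parts)
```

Keeping the result as a list, and working on each element separately, preserves the link between a row and its codes.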

Answer 1

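You're on the right track with strsplit, but you need to add stringsAsFactors = F to data.frame() so that Mood is a character vector rather than a factor, because strsplit() does not accept factors (since R 4.0.0, FALSE is already the default). After that, you can keep the split elements as a list and use lapply() to match the old codes to the new ones.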
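To see why the character conversion matters, here is a minimal check: strsplit() stops with a "non-character argument" error when given a factor, and works once the factor is converted with as.character(). The variable names are illustrative only.

```r
mood_factor <- factor(c("4,5", "99"))

# strsplit() refuses a factor; capture the error message instead of stopping
msg <- tryCatch(strsplit(mood_factor, ","),
                error = function(e) conditionMessage(e))
print(msg)

# Converting to character first gives one vector of codes per row
split_ok <- strsplit(as.character(mood_factor), ",")
print(split_ok)
```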

df1 <- 
  data.frame(NorthCode=c(4,5,6,7,99),
             NorthDef=c("happy","sad","tired","energetic","other"),
             SouthCode=c(7,8,9,5,99),
             SouthDef=c("happy","sad","tired","energetic","other"), 
             stringsAsFactors = F)

df2 <- 
  data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),
             Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"), 
             stringsAsFactors = F)

df3 <- 
  data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),
             PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"),
             stringsAsFactors = F)

# Split the Moods into separate values
splitCodes <- strsplit(df2$Mood, ",")
# Add the Region as the name of each element in the new list
names(splitCodes) <- df2$Region

# Recode the values by matching the north values to the south values
recoded <- 
  lapply(
    seq_along(splitCodes),
    function(x){
      ifelse(rep(names(splitCodes[x]) == "north", length(splitCodes[[x]])),
             df1$SouthCode[match(splitCodes[[x]], df1$NorthCode)],
             splitCodes[[x]])
    }
  )

# Add the recoded values back to df2
df2$recoded <- 
  sapply(recoded,
         paste,
         collapse = ",")

# Check that the recoded values match the desired values (returns TRUE)
identical(df2$recoded, df3$Mood)
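As an alternative sketch (not part of the answer above), the same recoding can be wrapped in a reusable function that looks codes up in a named vector instead of using ifelse(). It assumes the region values are exactly "north" and "south" and that codes may carry stray spaces; `recode_mood` is a hypothetical helper name. A north code missing from df1 comes out as "NA" inside the string, which makes new, unmapped entries easy to spot.

```r
df1 <- data.frame(NorthCode = c(4, 5, 6, 7, 99),
                  SouthCode = c(7, 8, 9, 5, 99))

df2 <- data.frame(Mood = c("4","5","6","7","4,5","5,6,99","99",
                           "7","8","9","5","7,8","8,5,99","99"),
                  Region = rep(c("north", "south"), each = 7),
                  stringsAsFactors = FALSE)

df3 <- data.frame(Mood = c("7","8","9","5","7,8","8,9,99","99",
                           "7","8","9","5","7,8","8,5,99","99"),
                  PartofTown = rep(c("north", "south"), each = 7),
                  stringsAsFactors = FALSE)

recode_mood <- function(mood, region, codes) {
  # Named lookup: north code -> south code, both as character
  lookup <- setNames(as.character(codes$SouthCode),
                     as.character(codes$NorthCode))
  parts <- strsplit(as.character(mood), ",")
  # Recode each row's codes only when the row comes from the north
  mapply(function(p, r) {
    p <- trimws(p)
    if (r == "north") p <- unname(lookup[p])
    paste(p, collapse = ",")
  }, parts, as.character(region), USE.NAMES = FALSE)
}

df2$recoded <- recode_mood(df2$Mood, df2$Region, df1)
identical(df2$recoded, df3$Mood)  # TRUE

# New combinations work too; the unknown code 3 shows up as "NA"
recode_mood(c("4,6", "6,3"), c("north", "north"), df1)  # "7,9" "9,NA"
```

Because the lookup is built from df1 rather than from the existing rows of df2, any combination of known codes is handled without further changes.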

Recoding comma-separated entries in R
