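Basically, I have a vector of comma-separated strings. I'm looking for a way to one-hot encode it using the unique values found in those strings. I believe I first have to find the unique values (splitting on the commas) and use them as columns before one-hot encoding, but I'm not sure. For example, suppose I have the following character vector: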
people_names
Bob,Megan,Mike,Sarah
Mike,Sarah
Megan,Sarah
Bob
I'd like to create a one-hot encoded data frame corresponding to that vector, like this:
Bob Megan Mike Sarah
1 1 1 1
0 0 1 1
0 1 0 1
1 0 0 0
Thanks for your help. I really appreciate it.
Answer 0 (score: 2)
people_names = c("Bob,Megan,Mike,Sarah",
"Mike,Sarah",
"Megan,Sarah",
"Bob")
library(tidyverse)
data.frame(people_names) %>%                    # create a data frame
  mutate(id = row_number(),                     # add row id (useful for reshaping)
         value = 1) %>%                         # add a column of 1s to denote existence
  separate_rows(people_names, sep = ",") %>%    # one row per name, splitting on commas only
  spread(people_names, value, fill = 0) %>%     # reshape to wide; absent names become 0
  select(-id)                                   # remove row id
# Bob Megan Mike Sarah
# 1 1 1 1 1
# 2 0 0 1 1
# 3 0 1 0 1
# 4 1 0 0 0
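Answer 1 (score: 0)
Alternatively, the splitstackshape package has an internal helper function (hence the `:::`) that you might find useful. The output is a matrix: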
splitstackshape:::charMat(strsplit(people_names, ","), fill = 0L)
# Bob Megan Mike Sarah
#[1,] 1 1 1 1
#[2,] 0 0 1 1
#[3,] 0 1 0 1
#[4,] 1 0 0 0
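If you'd rather not depend on an internal helper, the same idea (split, collect the unique names, then test membership) can be sketched in base R. This is only a minimal illustration, not how splitstackshape implements it; the names `parts`, `lvls` and `one_hot` are made up for this example.

```r
people_names <- c("Bob,Megan,Mike,Sarah",
                  "Mike,Sarah",
                  "Megan,Sarah",
                  "Bob")

# Split each string on commas; the result is a list of character vectors.
parts <- strsplit(people_names, ",", fixed = TRUE)

# Collect the distinct names; these become the columns.
lvls <- sort(unique(unlist(parts)))

# For each element, mark which names are present (1) or absent (0),
# then lay the flags out one row per input string.
flags <- lapply(parts, function(p) as.integer(lvls %in% p))
one_hot <- matrix(unlist(flags),
                  nrow = length(parts),
                  byrow = TRUE,
                  dimnames = list(NULL, lvls))

# Convert to a data frame if that is the shape you need.
one_hot_df <- as.data.frame(one_hot)
```

This assumes the names contain no stray whitespace around the commas; if they might, wrapping the pieces in `trimws()` before calling `unique()` would be one way to handle it.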
From the same package, you could also try cSplit_e, which returns a data frame:
library(splitstackshape)
out <- cSplit_e(
  data.frame(people_names),
  split.col = "people_names",
  sep = ",",
  mode = "binary",
  type = "character",
  fill = 0L,
  drop = TRUE
)
# remove prefix of column names
(out <- setNames(out, sub("people_names_", "", names(out), fixed = TRUE)))
Data
people_names = c("Bob,Megan,Mike,Sarah",
"Mike,Sarah",
"Megan,Sarah",
"Bob")