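I'm trying to create a new variable that holds the count of unique string values across two different columns. For example, I have something like this: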
# A tibble: 4 x 2
names partners
<fct> <fct>
1 John Mary, Ashley, John, Kate
2 Mary Charlie, John, Mary, John
3 Charlie Kate, Marcy
4 David Mary, Claire
structure(list(names = structure(c(3L, 4L, 1L, 2L), .Label = c("Charlie",
"David", "John", "Mary"), class = "factor"), partners = structure(c(3L,
1L, 2L, 4L), .Label = c("Charlie, John, Mary, John", "Kate, Marcy",
"Mary, Ashley, John, Kate", "Mary, Claire"), class = "factor")), row.names = c(NA,
4L), class = "data.frame")
I'd like to get something like this:
# A tibble: 4 x 3
names partners uniquecounts
<fct> <fct> <dbl>
1 John Mary, Ashley, John, Kate 4
2 Mary Charlie, John, Mary, John 3
3 Charlie Kate, Marcy 3
4 David Mary, Claire 3
I tried combining the two columns into one and then counting the unique values in it, but that didn't work.
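A likely reason the combine-then-count attempt fails: pasting the columns together still leaves one long string per row, so `unique()` sees a single value. The sketch below assumes the attempt looked like a plain `paste()`; the original code was not posted, so `combined` and `naive` are hypothetical names. Splitting on `", "` before calling `unique()` gives the expected counts.

```r
# Sample data reconstructed from the question (as plain character columns)
df <- data.frame(
  names = c("John", "Mary", "Charlie", "David"),
  partners = c("Mary, Ashley, John, Kate", "Charlie, John, Mary, John",
               "Kate, Marcy", "Mary, Claire"),
  stringsAsFactors = FALSE
)

# Hypothetical "combine the two columns" step: one string per row
combined <- paste(df$names, df$partners, sep = ", ")

# Each element is still a single string, so every row counts as 1
naive <- sapply(combined, function(s) length(unique(s)), USE.NAMES = FALSE)
naive
# [1] 1 1 1 1

# Splitting on ", " first exposes the individual names
counts <- lengths(lapply(strsplit(combined, ", ", fixed = TRUE), unique))
counts
# [1] 4 3 3 3
```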
Answer 0 (score: 2)
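Using tidyverse: first convert the factor columns to character, then use map2 to split partners into a vector of individual strings, combine it with names, and count the unique values with n_distinct.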
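The code for this answer did not survive extraction. Here is a minimal sketch of the approach it describes, not necessarily the answerer's exact code. It assumes dplyr >= 1.0 (for `across()`) plus purrr and stringr, and `df` is a reconstruction of the sample data from the question.

```r
library(dplyr)
library(purrr)
library(stringr)

# Sample data from the question, with factor columns as in the dput() output
df <- data.frame(
  names = factor(c("John", "Mary", "Charlie", "David")),
  partners = factor(c("Mary, Ashley, John, Kate", "Charlie, John, Mary, John",
                      "Kate, Marcy", "Mary, Claire"))
)

result <- df %>%
  # factors -> character so the strings can be split
  mutate(across(everything(), as.character)) %>%
  # per row: split partners, add the row's own name, count distinct values
  mutate(uniquecounts = map2_int(
    names, partners,
    ~ n_distinct(c(.x, str_split(.y, ", ")[[1]]))
  ))

result
```

`map2_int()` walks `names` and `partners` in parallel, so each row's own name is included in its count.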
The same logic can be written in base R.
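The base R code was also lost. A possible equivalent, assuming `mapply()` in place of `map2()` and `length(unique())` in place of `n_distinct()`, with `df` again reconstructed from the question:

```r
# Sample data from the question, with factor columns
df <- data.frame(
  names = factor(c("John", "Mary", "Charlie", "David")),
  partners = factor(c("Mary, Ashley, John, Kate", "Charlie, John, Mary, John",
                      "Kate, Marcy", "Mary, Claire"))
)

df$uniquecounts <- mapply(
  function(name, partners) {
    # split the partner string and add the row's own name
    people <- c(name, strsplit(partners, ", ", fixed = TRUE)[[1]])
    length(unique(people))
  },
  as.character(df$names), as.character(df$partners),  # factors -> character
  USE.NAMES = FALSE
)

df
```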
Answer 1 (score: 0)
Another way, using toString:
# paste each row into one string, split it on ", ", and count the unique pieces
dat$uniquecounts <- sapply(strsplit(apply(dat, 1, toString), ", "),
                           function(x) length(unique(x)))
dat
# names partners uniquecounts
# 1 John Mary, Ashley, John, Kate 4
# 2 Mary Charlie, John, Mary, John 3
# 3 Charlie Kate, Marcy 3
# 4 David Mary, Claire 3
Answer 2 (score: 0)
Here is a tidyverse approach without explicit looping:
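library(tidyverse)
df1 %>%
  # prepend each row's own name to its partner list
  mutate(partners = str_c(names, partners, sep = ", ")) %>%
  # one row per (names, partner) pair; split on ", " explicitly
  separate_rows(partners, sep = ", ") %>%
  # drop duplicate (names, partner) pairs
  distinct() %>%
  # number of unique people per name
  count(names) %>%
  # reattach the original partners column
  right_join(df1, by = "names")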
# A tibble: 4 x 3
# names n partners
# <fct> <int> <fct>
#1 John 4 Mary, Ashley, John, Kate
#2 Mary 3 Charlie, John, Mary, John
#3 Charlie 3 Kate, Marcy
#4 David 3 Mary, Claire