I want to count, by group, the number of unique elements in a list column of a data frame. My input data frame:
NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"), c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack"))
df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'), NameList = sapply(NameList, paste0, collapse = ','))
I want to count the unique list elements within each group of df$Name, like this:
Name unique_Num_Name
Alison 6
Harry 5
Jack 2
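I know how to get the number of unique elements across a whole list: length(unique(unlist(df$NameList))). However, I have not managed to get that count per group for my data frame, so I would appreciate any guidance or help.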
Answer 0 (score: 1)
Split the data into the groups defined by Name, then apply the length-unique-unlist combination to each group:
lapply(split(dat, dat$Name), function(x) {
length(unique(unlist(x$NameList)))
})
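A self-contained sketch of the same split-based idea. It assumes the names live in a true list column, built here with `I()` (an assumption about how `dat` was created), and it swaps in `vapply` for `lapply` so the result comes back as a named integer vector rather than a list:

```r
# Assumed setup: a data frame whose NameList column is a list column.
NameList <- list(c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
                 c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
                 c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
                 "Harry", c("Vin", "Jack"))
dat <- data.frame(Name = c("Alison", "Alison", "Alison",
                           "Harry", "Harry", "Harry", "Harry", "Jack"),
                  NameList = I(NameList))

# Split the rows by Name, then count the distinct names within each group.
counts <- vapply(split(dat, dat$Name),
                 function(x) length(unique(unlist(x$NameList))),
                 integer(1))
counts
```

`vapply` also checks that every group yields exactly one integer, which `lapply` would not.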
Update
As Rich Scriven pointed out in the comments, tapply is the better choice:
with(dat,
tapply(NameList, Name, FUN=function(x)
length(unique(unlist(x)))
)
)
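The question's own `df` stores each list as a single comma-joined string, so `unlist` alone would not separate the names there. A minimal base R sketch for that case, assuming the names themselves never contain commas; the `parts`, `counts` and `result` objects are illustrative names, not part of the original answer:

```r
NameList <- list(c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
                 c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
                 c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
                 "Harry", c("Vin", "Jack"))
df <- data.frame(Name = c("Alison", "Alison", "Alison",
                          "Harry", "Harry", "Harry", "Harry", "Jack"),
                 NameList = sapply(NameList, paste0, collapse = ","))

# Split each comma-joined string back into a character vector.
# as.character() guards against the column being a factor in older R versions.
parts <- strsplit(as.character(df$NameList), ",", fixed = TRUE)

# Count the distinct names per group, then reshape into the requested layout.
counts <- tapply(parts, df$Name, function(x) length(unique(unlist(x))))
result <- data.frame(Name = names(counts),
                     unique_Num_Name = as.vector(counts))
result
```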
Sample data:
dat <- structure(
list(
Name = structure(
c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L),
.Label = c("Alison", "Harry", "Jack"),
class = "factor"
),
NameList = structure(list(
c("Sam", "Gemma", "Alison", "Tom"),
c("Oliver", "Alison"),
c("Tom", "Alison", "Harry"),
c("Vin",
"Harry"),
c("Jason", "Sam", "Harry"),
c("Anton", "Harry"),
"Harry",
c("Vin", "Jack")
), class = "AsIs")
),
.Names = c("Name",
"NameList"),
row.names = c(NA,-8L),
class = "data.frame"
)
Answer 1 (score: 1)
You can use the tidyr and dplyr packages from the tidyverse:
library(tidyverse)
separate_rows(df, NameList, sep = ",") %>%
group_by(Name) %>%
summarise(uniq_names = n_distinct(NameList))
The result is:
# A tibble: 3 × 2
Name uniq_names
<fctr> <int>
1 Alison 6
2 Harry 5
3 Jack 2
Input data:
NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"),
c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack"))
df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'),
NameList = sapply(NameList, paste0, collapse = ','))
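If the names are kept in a list column instead of comma-joined strings, a similar pipeline might use `unnest` in place of `separate_rows`. This is a sketch under that assumption; the tibble `dat` and the `res` object are hypothetical names introduced here for illustration:

```r
library(dplyr)
library(tidyr)

NameList <- list(c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
                 c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
                 c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
                 "Harry", c("Vin", "Jack"))

# Assumed setup: a tibble with NameList stored as a list column.
dat <- tibble(Name = c("Alison", "Alison", "Alison",
                       "Harry", "Harry", "Harry", "Harry", "Jack"),
              NameList = NameList)

# One row per (Name, element) pair, then count distinct elements per group.
res <- dat %>%
  unnest(cols = NameList) %>%
  group_by(Name) %>%
  summarise(uniq_names = n_distinct(NameList))
res
```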