Length of unique list elements by group in a data frame

Time: 2017-02-10 20:04:29

Tags: r

I want to count, per group, the number of unique elements in a list-like column of a data frame. My input data frame:

NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"), c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack"))

df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'), NameList = sapply(NameList, paste0, collapse = ','))

I want to count the unique list elements within each group of df$Name, like this:

Name    unique_Num_Name
Alison   6
Harry    5
Jack     2

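I know how to get the number of unique elements across the whole list: length(unique(unlist(df$NameList))). However, I have not managed to get this per group for my data frame, so I would appreciate any guidance or help.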

2 answers:

Answer 0 (score: 1)

Split the data into the groups defined by Name, then apply the length-unique-unlist combination to each group:

lapply(split(dat, dat$Name), function(x) {
  length(unique(unlist(x$NameList)))
})

Update: As Rich Scriven pointed out in the comments, tapply is a better choice:

with(dat, 
  tapply(NameList, Name, FUN=function(x) 
    length(unique(unlist(x)))
  )
)

Sample data (NameList is a list column here, assigned to dat as used in the code above):

dat <- structure(
  list(
    Name = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
        3L),
      .Label = c("Alison", "Harry", "Jack"),
      class = "factor"
    ),
    NameList = structure(list(
      c("Sam", "Gemma", "Alison", "Tom"),
      c("Oliver", "Alison"),
      c("Tom", "Alison", "Harry"),
      c("Vin",
        "Harry"),
      c("Jason", "Sam", "Harry"),
      c("Anton", "Harry"),
      "Harry",
      c("Vin", "Jack")
    ), class = "AsIs")
  ),
  .Names = c("Name",
             "NameList"),
  row.names = c(NA,-8L),
  class = "data.frame"
)
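
To get the result in the shape requested in the question, the tapply output can be turned into a data frame. The following is a minimal sketch, assuming NameList is stored as a list column as in the sample data above; the names counts and result are only illustrative.

```r
# Rebuild the sample data with NameList as a list column.
dat <- data.frame(
  Name = c("Alison", "Alison", "Alison", "Harry", "Harry", "Harry", "Harry", "Jack"),
  stringsAsFactors = FALSE
)
dat$NameList <- list(
  c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
  c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
  c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
  "Harry", c("Vin", "Jack")
)

# Count unique names per group; tapply returns a named 1-d array.
counts <- tapply(dat$NameList, dat$Name, function(x) length(unique(unlist(x))))

# Reshape into the two-column layout asked for in the question.
result <- data.frame(
  Name = names(counts),
  unique_Num_Name = as.vector(counts),
  stringsAsFactors = FALSE
)
print(result)
#     Name unique_Num_Name
# 1 Alison               6
# 2  Harry               5
# 3   Jack               2
```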

Answer 1 (score: 1)

You can use the dplyr and tidyr packages from the tidyverse:

library(tidyverse)
separate_rows(df, NameList, sep = ",") %>% 
  group_by(Name) %>% 
  summarise(uniq_names = n_distinct(NameList))

The result is:

# A tibble: 3 × 2
    Name uniq_names
  <fctr>      <int>
1 Alison          6
2  Harry          5
3   Jack          2

Input data:

NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"),
                c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack"))

df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'),
                 NameList = sapply(NameList, paste0, collapse = ','))
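
If the tidyverse is not available, the same counts can probably be obtained in base R by splitting the comma-separated strings first. This is only a sketch under the assumption that NameList is the comma-joined string column built above; count_unique is a hypothetical helper name, not something from the original answers.

```r
NameList <- list(
  c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
  c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
  c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
  c("Harry"), c("Vin", "Jack")
)
df <- data.frame(
  Name = c("Alison", "Alison", "Alison", "Harry", "Harry", "Harry", "Harry", "Jack"),
  NameList = sapply(NameList, paste0, collapse = ","),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string on commas, pool the pieces,
# and count the distinct names. as.character() guards against factors.
count_unique <- function(x) {
  length(unique(unlist(strsplit(as.character(x), ",", fixed = TRUE))))
}

uniq_names <- tapply(df$NameList, df$Name, count_unique)
print(uniq_names)
# Alison  Harry   Jack
#      6      5      2
```

Note that calling length(unique(unlist(...))) directly on this string column would count distinct strings rather than distinct names, which is why the strsplit step is needed here.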