Question

I have the following data, and I want to know the percentage of households that bought from more than two brands:

     hh_code    brand
     3032145       536
     3032145       53
     3032145       534
     324063        536
     204128        53
     84787         536

I would like to get the number of distinct brands each household bought, as shown below:

   hh_code    unique_brand
   3032145    3
   324063     1
   204128     1
   84787      1

I tried using table(), but it only gives me frequencies. Any insight is much appreciated!

Answer 1

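We can use data.table: convert the data frame with setDT() and count the distinct brands per household with uniqueN().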

library(data.table)
setDT(df1)[, .(unique_brand = uniqueN(brand)), by = hh_code]
#   hh_code unique_brand
#1: 3032145            3
#2:  324063            1
#3:  204128            1
#4:   84787            1

Answer 2

A simple base R solution using tapply:

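# Count distinct brands per household (length alone would count rows, not brands)
num_brands <- tapply(df$brand, df$hh_code, function(x) length(unique(x)))
gt2_brands <- num_brands > 2   # TRUE for households with more than two brands
mean(gt2_brands) * 100         # percentage of such households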
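
Since the question mentions trying table(), here is a minimal sketch of how that route could work on the sample data. It assumes the data frame is called df (rebuilt here from the rows in the question) and that "more than two" means strictly greater than 2; the names pairs and pct_more_than_2 are only illustrative.

```r
# Sample data copied from the question
df <- data.frame(
  hh_code = c(3032145, 3032145, 3032145, 324063, 204128, 84787),
  brand   = c(536, 53, 534, 536, 53, 536)
)

# Drop repeated household/brand pairs so each brand counts once per household
pairs <- unique(df[, c("hh_code", "brand")])

# table() on the de-duplicated pairs gives distinct brands per household
num_brands <- table(pairs$hh_code)

# Share of households that bought more than two distinct brands
pct_more_than_2 <- mean(num_brands > 2) * 100
pct_more_than_2
# 25
```

The de-duplication step is what turns table() from a plain row frequency into a distinct-brand count. On the sample data it changes nothing because no household/brand pair repeats, but it matters on real purchase logs where the same brand is bought many times.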
