Counting unique combinations regardless of order

Time: 2016-08-15 15:31:09

Tags: r dplyr

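Thanks in advance. I have a data frame of family members and their relationship to the head of household, and I want to count the number of unique combinations of family structure.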

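I can do this (probably in a roundabout way) by building an index with ddply, reshaping the data to wide format, and counting the rows, but that does not account for identical family structures whose members are listed in a different order. Like this: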

library(plyr)   # ddply; load before dplyr so that dplyr's count() is not masked
library(dplyr)  # count, ungroup, %>%

familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
              "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember,familyGroup)

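Note that familyGroups '2' and '4' are exactly the same family structure, in the same order. Note also that familyGroups '1' and '3' are the same family structure, but in a different order. I then use ddply (from plyr) to create an index that numbers the family members within each familyGroup: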

# Number the members within each familyGroup: family1, family2, ...
familiesIndex <- ddply(families, .(familyGroup), mutate, 
          index = paste0('family', seq_along(familyGroup)))

Next I reshape to wide format:

familiesIndex_reshape <- reshape(familiesIndex, idvar = "familyGroup", timevar="index", direction = "wide")

Finally, I use count to get the number of unique combinations:

familiesIndex_reshape_Unique <- count(familiesIndex_reshape, 
                                 familyMember.family1,
                                 familyMember.family2,
                                 familyMember.family3,
                                 familyMember.family4) %>% ungroup()

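This puts family groups 1 and 3 into separate combinations. I would like these two groups to be counted as the same structure regardless of the order of their members. Many thanks, again.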
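To make the desired result concrete, here is a minimal sketch of one possible order-insensitive count. It is not taken from the question's own pipeline: it assumes that sorting the members within each family and collapsing them into a single string is an acceptable key for a "family structure". It uses base R only, and the names `structureKey` and `structureCounts` are illustrative.

```r
# Same example data as in the question.
familyMember <- c("son","son","Head of household","daughter","grandmother","Head of household","son",
                  "Head of household","son","son","daughter","grandmother","Head of household","son")
familyGroup <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
families <- data.frame(familyMember, familyGroup, stringsAsFactors = FALSE)

# One order-independent key per family: sort the members, then collapse them
# into a single string. Sorting removes the dependence on row order.
# (structureKey is an illustrative name, not from the question.)
structureKey <- tapply(families$familyMember, families$familyGroup,
                       function(members) paste(sort(members), collapse = " | "))

# Count how many families share each key.
structureCounts <- as.data.frame(table(structure = as.vector(structureKey)),
                                 stringsAsFactors = FALSE)
print(structureCounts)
```

With the example data, families 1 and 3 receive the same key, as do families 2 and 4, so the result has two rows. The exact text of each key depends on the locale's sort order, but the grouping does not.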

0 answers:

No answers