R:带有dplyr的条件嵌套分组摘要?

时间:2016-10-08 12:24:57

标签: r conditional dplyr summarization group-summaries

感谢@Frank和我的previous post(更多详细信息),在那里我可以用来回答关于酒吧中人们饮酒模式的数据集的一些问题:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
Quark’s Bar,Bender,Black Hole,1,TRUE
Quark’s Bar,Bender,water,1,FALSE
Quark’s Bar,Bender,unspecified,1,TRUE
Quark’s Bar,Homer,Black Hole,2,FALSE
Quark’s Bar,Guinan,unspecified,2,TRUE
Quark’s Bar,Guinan,Black Hole,1,TRUE
Quark’s Bar,Krusty,Black Hole,1,FALSE
Quark’s Bar,Krusty,water,2,FALSE
Quark’s Bar,Rebecca,unspecified,1,FALSE
Maz’s Tavern,Krusty,water,1,TRUE
Maz’s Tavern,Rebecca,water,1,FALSE
Maz’s Tavern,Homer,water,1,TRUE
Maz’s Tavern,Bender,water,2,FALSE

具体来说,@ Franks建议使用以下代码:

DF %>%
  arrange(drink_ordered, times_ordered, liked_it) %>% group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep="_"),
    Ldtl = paste(Ldt, toString(liked_it), sep="_")
  ) %>% 
  group_by(bar_name) %>% 
  summarise_each(funs(n_distinct)) %>%
  mutate_each(funs(. == 1), -person, -bar_name)

这会产生分组摘要,说明顾客是否在每个酒吧订购相同的饮料,有多少人,以及他们是否喜欢这些饮料:

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE

但是,对于这篇文章,我还有一个额外的问题,即一些人的饮品订单是unspecifiedQuark's Bar},有些人订购{{1} }:

  1. 对于water,我希望它充当"泛型"因此不会被视为不同的饮料(如果在该酒吧订购其他饮料)。例如,在unspecified我希望结果是Quark's Bar,每个人都订购了相同的饮料。当然,如果在酒吧中,每个人都只订购TRUE,结果也会是unspecified

  2. 对于TRUE,我通常希望它被忽略(例如,因为它不是酒精饮料!),所以起初我以为我可以简单地使用dplyr' s { {1}}删除订单为water的数据行。复杂的是,当人们订购的事件为filter()时,我希望结果为water,例如TRUE。所以我不认为我可以简单地删除water行,我希望它们被考虑!换句话说,我不希望Maz's Tavern计算,除非它是water中唯一订购的唯一内容。

  3. 有条件的方式(是正确的术语吗?)处理"例外" waterbar_name等项目?我更喜欢基于dplyr(即基于Hadley-verse)的解决方案,它生成一个像@Frank那样的表,上面的代码将这两个项目考虑在内,尽管你能想到的任何东西都会受到赞赏。谢谢!

0 个答案:

没有答案