Sorting by tally / count (dplyr)

Date: 2016-03-06 18:35:53

Tags: r dplyr

This should be simple, but I can't find a straightforward way to do it. My dataset looks like this:

                DisplayName Nationality Gender Startyear
1           Alfred H. Barr, Jr.    American   Male      1929
2                  Paul Cézanne      French   Male      1929
3                  Paul Gauguin      French   Male      1929
4              Vincent van Gogh       Dutch   Male      1929
5         Georges-Pierre Seurat      French   Male      1929
6            Charles Burchfield    American   Male      1929
7                Charles Demuth    American   Male      1929
8             Preston Dickinson    American   Male      1929
9              Lyonel Feininger    American   Male      1929
10 George Overbury ("Pop") Hart    American   Male      1929
...

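I want to group by DisplayName and Gender and get a count for each name (names are repeated many times in the list, with different year information).

The following two commands give the same output, but neither is sorted by the count column "n". Any ideas on how to achieve this?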

artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = T) %>%
  arrange(desc(n))


artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  count(DisplayName, Gender, sort = T)


                 DisplayName Gender     n
                       (chr)  (chr) (int)
1              A. F. Sherman   Male     1
2             A. G. Fronzoni   Male     2
3         A. Lawrence Kocher   Male     3
4            A. M. Cassandre   Male    21
5             A. R. De Ycaza Female     1
6  A.R. Penck (Ralf Winkler)   Male    20
7              Aaron Siskind   Male    25
8         Abigail Perlmutter Female     1
9            Abraham Rattner   Male     5
10         Abraham Walkowitz   Male    17
..                       ...    ...   ...

1 Answer:

Answer 0 (score: 6)

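Your data is grouped by two variables. After tally, only the last grouping variable (Gender) is dropped, so your data frame is still grouped by DisplayName. arrange(desc(n)) is therefore sorting, but only within each DisplayName group. If you want to sort the whole data frame by column n, just ungroup before sorting. Try this: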

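artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = TRUE) %>%   # result is still grouped by DisplayName
  ungroup() %>%            # drop the remaining grouping
  arrange(desc(n))         # sorts the whole data frame by n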
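
To see the leftover grouping for yourself, here is a minimal sketch on a small made-up data frame. The name `toy` and its contents are hypothetical stand-ins for the asker's `data`, and the sketch assumes a dplyr version that provides `group_vars()`. Note that, as far as I know, more recent dplyr releases make `arrange()` ignore grouping unless `.by_group = TRUE` is set, so the original code may already sort globally there; calling `ungroup()` is harmless in either case.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `data`
toy <- data.frame(
  DisplayName = c("Aaron Siskind", "Aaron Siskind", "Aaron Siskind",
                  "A. F. Sherman",
                  "Abigail Perlmutter", "Abigail Perlmutter"),
  Gender      = c("Male", "Male", "Male", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

counted <- toy %>%
  group_by(DisplayName, Gender) %>%
  tally()

# tally() drops only the last grouping variable (Gender),
# so the result is still grouped by DisplayName
group_vars(counted)
# "DisplayName"

# Remove the remaining grouping, then sort all rows by the count
artists <- counted %>%
  ungroup() %>%
  arrange(desc(n))

artists
```

With the grouping removed, the rows come out ordered by n across all names rather than within each DisplayName.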