This should be simple, but I can't find a straightforward way to do it. My dataset looks like this:
DisplayName Nationality Gender Startyear
1 Alfred H. Barr, Jr. American Male 1929
2 Paul Cézanne French Male 1929
3 Paul Gauguin French Male 1929
4 Vincent van Gogh Dutch Male 1929
5 Georges-Pierre Seurat French Male 1929
6 Charles Burchfield American Male 1929
7 Charles Demuth American Male 1929
8 Preston Dickinson American Male 1929
9 Lyonel Feininger American Male 1929
10 George Overbury ("Pop") Hart American Male 1929
...
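I want to group by DisplayName and Gender and get a count for each name (names repeat many times in the list, with different year information).
Both of the commands below give the same output, but neither sorts it by the count column "n". Any ideas on how to achieve this?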
artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = TRUE) %>%
  arrange(desc(n))

artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  count(DisplayName, Gender, sort = TRUE)
DisplayName Gender n
(chr) (chr) (int)
1 A. F. Sherman Male 1
2 A. G. Fronzoni Male 2
3 A. Lawrence Kocher Male 3
4 A. M. Cassandre Male 21
5 A. R. De Ycaza Female 1
6 A.R. Penck (Ralf Winkler) Male 20
7 Aaron Siskind Male 25
8 Abigail Perlmutter Female 1
9 Abraham Rattner Male 5
10 Abraham Walkowitz Male 17
.. ... ... ...
Answer 0 (score: 6)
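Your data is grouped by two variables, so after tally() the data frame is still grouped by DisplayName. arrange(desc(n)) therefore does sort, but only within each DisplayName group. To sort the whole data frame by column n, simply ungroup before sorting. Try this: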
artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = TRUE) %>%
  ungroup() %>%
  arrange(desc(n))
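
To see the effect of ungroup() in isolation, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are hypothetical and only stand in for the question's `data`. It assumes dplyr is installed. Note that the "sorts within groups" behaviour described above applies to older dplyr releases; as far as I know, arrange() has ignored grouping by default since dplyr 0.5.0, so on a recent version the ungroup() step is harmless but may not be strictly required.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's `data` frame
toy <- data.frame(
  DisplayName = c("A", "A", "A", "B", "C", "C"),
  Gender      = c("Male", "Male", "Male", "Female", "Male", "Male")
)

counts <- toy %>%
  group_by(DisplayName, Gender) %>%
  tally() %>%        # drops the last grouping level (Gender); DisplayName remains
  ungroup() %>%      # remove the remaining grouping so the sort spans all rows
  arrange(desc(n))   # sort the whole data frame by count, largest first

group_vars(counts)   # character(0): no grouping left
counts$n             # 3 2 1
```

Checking group_vars() after each step is a quick way to find out whether a leftover grouping is what is interfering with a later verb.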