I have a simple question about the dplyr library in R.
My actual data frame looks like this:
Players <- data.frame(Group = c("A", "A", "A", "A", "B", "B", "B", "C","C","C"), Players= c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike","Carl", "Max", "Max","Max"))
Group Players
A Jhon
A Jhon
A Jhon
A Charles
B Mike
B Mike
B Carl
C Max
C Max
C Max
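I want a second data frame that shows, for each group, the player listed the most times and how many times that player is listed. So I'd like to get this data frame: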
Group Players TimesListed
A Jhon 3
B Mike 2
C Max 3
I tried this:
Station <- Players %>% group_by(Group,Players) %>%
summarise(TimesListed=length(Players)) %>%
summarise(TimesListed=max(TimesListed))
But the data frame I get has no player names, like this:
Group TimesListed
1 A 3
2 B 2
3 C 3
Any ideas? Thanks!
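Why the attempt above loses the names: by default each summarise() drops the last grouping variable, so after the first summarise() the result is grouped by Group only, and the second summarise() collapses each group to a single row that keeps only Group and TimesListed. A minimal sketch that filters the per-player counts instead of summarising them again, assuming dplyr >= 1.0.0 (for the .groups argument):

```r
library(dplyr)

Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl", "Max", "Max", "Max")
)

Station <- Players %>%
  group_by(Group, Players) %>%
  # Count rows per Group/Players pair; the result stays grouped by Group
  summarise(TimesListed = n(), .groups = "drop_last") %>%
  # Keep the row(s) with the largest count in each Group (ties are all kept)
  filter(TimesListed == max(TimesListed)) %>%
  ungroup()

Station
```

Because filter() keeps rows rather than collapsing them, the Players column survives.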
Answer 0 (score: 1)
This gets you what you want:
library(dplyr)
Players %>%
group_by(Group) %>%
count(Players) %>%
top_n(1, n)
# A tibble: 3 x 3
# Groups: Group [3]
Group Players n
<fctr> <fctr> <int>
1 A Jhon 3
2 B Mike 2
3 C Max 3
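top_n() still works, but since dplyr 1.0.0 it is superseded by slice_max(). A sketch of the same pipeline with slice_max(), assuming dplyr >= 1.0.0. Like top_n(), it keeps tied rows by default (with_ties = TRUE):

```r
library(dplyr)

Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl", "Max", "Max", "Max")
)

top_players <- Players %>%
  group_by(Group) %>%
  count(Players) %>%       # output keeps the grouping by Group
  slice_max(n, n = 1)      # row(s) with the highest count n in each Group

top_players
```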
You can convert the factors to characters like this:
Players[] <- lapply(Players, as.character)
If you need to rename the variable n to TimesListed, add the following to the end of the chain:
rename(TimesListed = n)
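Putting both suggestions together, here is a sketch of the full chain with the factor-to-character conversion applied first and rename() appended at the end. Since R 4.0.0, data.frame() no longer creates factors by default, so on recent R versions the conversion is a harmless no-op:

```r
library(dplyr)

Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl", "Max", "Max", "Max")
)

# Convert every column to character, keeping the data.frame structure
Players[] <- lapply(Players, as.character)

Station <- Players %>%
  group_by(Group) %>%
  count(Players) %>%
  top_n(1, n) %>%
  rename(TimesListed = n)   # give the count column the requested name

Station
```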
Answer 1 (score: 1)
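You can use the aggregate function in base R. Note that this returns only the highest count per group, not the player's name: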
aggregate(. ~ Group, Players, function(x) max(table(x)))
Group Players
1 A 3
2 B 2
3 C 3
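If the player names are needed as well, a base R sketch is shown below. It counts each Group/Players pair with aggregate() and then uses ave() to keep the rows whose count equals the maximum of their group, so ties are all kept. The helper column TimesListed = 1L and the object names counts and top are illustrative choices, not part of the answer above:

```r
Players <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Players = c("Jhon", "Jhon", "Jhon", "Charles", "Mike", "Mike", "Carl", "Max", "Max", "Max")
)

# Count how many times each player is listed within each group
counts <- aggregate(TimesListed ~ Group + Players,
                    data = cbind(Players, TimesListed = 1L),
                    FUN = length)

# Flag rows whose count equals the maximum count of their group
is_top <- counts$TimesListed == ave(counts$TimesListed, counts$Group, FUN = max)

top <- counts[is_top, ]
top <- top[order(top$Group), ]   # aggregate() sorts by Players first; restore Group order
rownames(top) <- NULL
top
```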
Answer 2 (score: 0)
For completeness, here is a solution using data.table.
library(data.table)
setDT(Players)
Players[, .(TimesListed = .N), by = .(Group, Players)][
, .SD[which.max(TimesListed)], by = Group]
# Group Players TimesListed
# 1: A Jhon 3
# 2: B Mike 2
# 3: C Max 3
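The solution above returns only the first row with the maximum TimesListed in each group. If we want to return all rows equal to the maximum, we can do the following. With this data, both solutions produce the same result.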
Players[, .(TimesListed = .N), by = .(Group, Players)][
, .SD[TimesListed == max(TimesListed)], by = Group]
# Group Players TimesListed
# 1: A Jhon 3
# 2: B Mike 2
# 3: C Max 3
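To see where the two data.table variants differ, here is a small sketch on hypothetical data with a tie in group B (Mike and Carl are each listed twice). which.max() returns only the first maximal row, while the == max() comparison keeps both tied rows:

```r
library(data.table)

# Hypothetical data: group B has a tie between Mike and Carl
tied <- data.table(
  Group   = c("A", "A", "B", "B", "B", "B"),
  Players = c("Jhon", "Jhon", "Mike", "Mike", "Carl", "Carl")
)

counts <- tied[, .(TimesListed = .N), by = .(Group, Players)]

# which.max() picks a single row per group: the first maximum
first_only <- counts[, .SD[which.max(TimesListed)], by = Group]

# Comparing against max() keeps every row that reaches the maximum
all_ties <- counts[, .SD[TimesListed == max(TimesListed)], by = Group]

first_only
all_ties
```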