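First of all, I'm not a developer and have very little experience with R, so please bear with me. I tried to work this out myself, but I've run out of ideas for filtering a data frame with the 'filter' command.
The data frame has about twelve columns, one of which is Grp (for group). This is a FIFA soccer dataset, so here the group is the general position a player plays in (defense, midfield, goalkeeper, forward).
I need to filter this data frame to give me exactly this: the top 4 defensive players, the top 4 midfield players, the top 2 forwards, and the top 1 goalkeeper.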
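What does "top" mean? Players are ranked by the Growth column, which is just a number. So the top 4 would be something like 22, 21, 21, 20 (or similar, since the same number can repeat for different players). The Growth column is the difference between the Potential column and the Overall column, so a simple subtraction gives it.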
#Create a subset of the data frame
library(dplyr)
library(ggplot2)
fifa2 <- fifa %>% select(Club,Name,Position,Overall,Potential,Contract.Valid.Until2,Wage2,Value2,Release.Clause2,Grp) %>% arrange(Club)
#Add columns for determining potential
fifa2$Growth <- fifa2$Potential - fifa2$Overall
head(fifa2)
#Find Southampton Players
ClubName <- filter(fifa2, Club == "Southampton") %>%
group_by(Grp) %>% arrange(desc(Growth), .by_group=TRUE) %>%
top_n(4)
ClubName
ClubName2 <- ggplot(ClubName, aes(x=forcats::fct_reorder(Name, Grp),
y=Growth, fill = Grp)) +
geom_bar(stat = "identity", colour = "black") +
coord_flip() + xlab("Player Names") + ylab("Unfilled Growth Potential") +
ggtitle("Southampton Players, Grouped by Position")
ClubName2
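The plot gives a list of players that ends up with the top 4 in every position (top_n(4)), but I need to filter it further according to the logic above. How can I achieve this? I tried fiddling with dplyr, where it is easy to get rows by Grp name, but I can't see how to filter it down to the 4-4-2-1 I need. Any help is appreciated.
Sample output of fifa2 and ClubName (showing the data after top_n(4)):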
Answer 0 (score: 0)
This may not be the most elegant solution, but hopefully it works :)
# create dummy data
library(dplyr)
set.seed(1)  # make the random dummy data reproducible
data_test = data.frame(grp = sample(c("def", "mid", "goal", "front"), 30, replace = TRUE), growth = rnorm(30, 100, 10), stringsAsFactors = FALSE)
# create a reference table giving the number of players needed per grp
desired_n = data.frame(grp = c("def", "mid", "goal", "front"), top_n_desired = c(4, 4, 1, 2), stringsAsFactors = FALSE)
# > desired_n
# grp top_n_desired
# 1 def 4
# 2 mid 4
# 3 goal 1
# 4 front 2
# group and arrange, then look up the desired number of players in the reference table and select them.
data_test %>% group_by(grp) %>% arrange(desc(growth)) %>%
slice(1:desired_n$top_n_desired[which(first(grp) == desired_n$grp)]) %>%
arrange(grp)
# A bit more readable, but you have to create an additional column in your data frame
# add a column holding the desired number of players for each player's grp
data_test = merge(data_test, desired_n, by = "grp", all.x = TRUE)
data_test %>% group_by(grp) %>% arrange(desc(growth)) %>%
slice(1:first(top_n_desired)) %>%
arrange(grp)
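The same idea could also be written with a named vector as the lookup instead of a reference table. The following is only a sketch under assumptions: the `club` data frame is made-up stand-in data, and the Grp labels ("Defense", "Midfield", "Forward", "Goalkeeper") are guesses, so they would need to match whatever the real Grp column contains.

```r
library(dplyr)

# Hypothetical stand-in for the Southampton subset; names and Grp labels are assumed
set.seed(42)
club <- data.frame(
  Name   = paste0("Player", 1:25),
  Grp    = rep(c("Defense", "Midfield", "Forward", "Goalkeeper"), times = c(8, 9, 5, 3)),
  Growth = sample(0:22, 25, replace = TRUE),
  stringsAsFactors = FALSE
)

# Number of players wanted per group (the 4-4-2-1 from the question)
quota <- c(Defense = 4, Midfield = 4, Forward = 2, Goalkeeper = 1)

squad <- club %>%
  group_by(Grp) %>%
  arrange(desc(Growth), .by_group = TRUE) %>%
  # keep the first quota[Grp] rows of each group; as.character() guards
  # against Grp being a factor, which would index quota by integer code
  filter(row_number() <= unname(quota[as.character(Grp)])) %>%
  ungroup()

table(squad$Grp)
```

Because `row_number()` breaks ties by row order, each group returns at most its quota even when several players share the same Growth value, whereas `top_n()` keeps ties and can return more rows than requested.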