Filtering a data frame with specific requirements

Date: 2019-03-19 13:40:01

Tags: r

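First of all, I am not a developer and have very little experience with R, so please bear with me. I tried to work this out on my own, but I have run out of ideas for filtering the data frame with the 'filter' command.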

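The data frame has about a dozen columns, one of which is Grp (short for group). This is the FIFA football dataset, so here the group is the general position a player plays in (defence, midfield, goalkeeper, forward).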

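I need to filter this data frame so that it gives me exactly this: the top 4 defenders, the top 4 midfielders, the top 2 forwards, and the top 1 goalkeeper.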

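What do I mean by "top"? Players are ranked by the Growth column, which is just a number, so the top 4 might be 22, 21, 21, 20 (or similar, since the same number can recur for different players). The Growth column is the difference between the Potential column and the Overall column, so again it is a simple subtraction.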
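To make the ranking described above concrete, here is a minimal sketch with made-up numbers. The `toy` data frame and its player names are hypothetical and not taken from the FIFA data. It computes Growth as Potential minus Overall and shows how tied values affect a "top 4" selection.

```r
library(dplyr)

# Made-up players for illustration only (not from the FIFA dataset)
toy <- data.frame(
  Name      = c("A", "B", "C", "D", "E", "F"),
  Overall   = c(60, 62, 65, 70, 58, 61),
  Potential = c(82, 83, 86, 90, 78, 81),
  stringsAsFactors = FALSE
)

# Growth is simply Potential minus Overall
toy <- toy %>%
  mutate(Growth = Potential - Overall,
         GrowthRank = min_rank(desc(Growth))) %>%  # rank 1 = largest Growth, ties share a rank
  arrange(GrowthRank)

# Keeping every player ranked 4th or better also keeps all players tied at rank 4
with_ties <- toy %>% filter(GrowthRank <= 4)

# Taking the first four rows after sorting gives exactly four players
exactly_four <- head(toy, 4)
```

Whether tied players should all be kept or cut off at exactly four is a choice the question leaves open, so both variants are shown.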

#Create a subset of the data frame
library(dplyr)
fifa2 <- fifa %>%
  select(Club, Name, Position, Overall, Potential, Contract.Valid.Until2,
         Wage2, Value2, Release.Clause2, Grp) %>%
  arrange(Club)
#Add columns for determining potential 
fifa2$Growth <- fifa2$Potential - fifa2$Overall
head(fifa2)

#Find Southampton Players
ClubName <- filter(fifa2, Club == "Southampton") %>% 
  group_by(Grp) %>% arrange(desc(Growth), .by_group=TRUE) %>% 
  top_n(4)
ClubName

library(ggplot2)  # needed for ggplot()
ClubName2 <- ggplot(ClubName, aes(x=forcats::fct_reorder(Name, Grp),
                                  y=Growth, fill = Grp)) +
  geom_bar(stat = "identity", colour = "black") +
  coord_flip() + xlab("Player Names") + ylab("Unfilled Growth Potential") +
  ggtitle("Southampton Players, Grouped by Position")
ClubName2

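The plot shows a list of players that ends up with the top 4 players in each position (top_n(4)), but I need to filter it further according to the logic above. How can I do that? I tried playing around with dplyr, and getting rows by Grp name is easy, but I can't see how to filter down to the 4-4-2-1 I need. Any help is appreciated.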

Sample output of fifa2 and ClubName (showing the data sorted with top_n(4)):

fifa2_Dataset

1 answer:

Answer 0 (score: 0)

This may not be the most elegant solution, but hopefully it works :)

library(dplyr)

# create dummy data
set.seed(42)  # makes the random dummy data reproducible
data_test = data.frame(grp = sample(c("def", "mid", "goal", "front"), 30, replace = TRUE),
                       growth = rnorm(30, 100, 10), stringsAsFactors = FALSE)

# create a reference table giving the number of players needed per grp
desired_n = data.frame(grp = c("def", "mid", "goal", "front"), top_n_desired = c(4,4,1,2), stringsAsFactors = FALSE)
# > desired_n
# grp top_n_desired
# 1   def             4
# 2   mid             4
# 3  goal             1
# 4 front             2

# group and arrange, then look up the desired number of players in the reference table and select them.
data_test %>% group_by(grp) %>% arrange(desc(growth)) %>% 
  slice(1:desired_n$top_n_desired[which(first(grp) == desired_n$grp)]) %>% 
  arrange(grp)

# A bit more readable, but you have to create an additional column in your dataframe

# create an additional column holding the desired count for each player's grp
data_test = merge(data_test, desired_n, by = "grp", all.x = TRUE)
data_test %>% group_by(grp) %>% arrange(desc(growth)) %>% 
  slice(1:first(top_n_desired)) %>% 
  arrange(grp)
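
As a variant of the same lookup-table idea, here is a minimal sketch that uses the column names from the question (`Grp`, `Growth`). The `squad` data frame, the group labels, and the `quota` and `keep_n` names are assumptions made for illustration; the real labels in `Grp` may differ.

```r
library(dplyr)

set.seed(1)  # reproducible toy data

# Hypothetical stand-in for the per-club data frame from the question
squad <- data.frame(
  Name   = paste0("Player", 1:30),
  Grp    = rep(c("Defence", "Midfield", "Forward", "Goalkeeper"),
               times = c(10, 10, 6, 4)),
  Growth = sample(0:25, 30, replace = TRUE),
  stringsAsFactors = FALSE
)

# Lookup table: how many players to keep per group (the 4-4-2-1 from the question)
quota <- data.frame(
  Grp    = c("Defence", "Midfield", "Forward", "Goalkeeper"),
  keep_n = c(4, 4, 2, 1),
  stringsAsFactors = FALSE
)

selected <- squad %>%
  left_join(quota, by = "Grp") %>%                 # attach the per-group quota
  group_by(Grp) %>%
  arrange(desc(Growth), .by_group = TRUE) %>%      # best Growth first within each group
  filter(row_number() <= keep_n) %>%               # exactly keep_n rows per group
  ungroup() %>%
  select(-keep_n)
```

Unlike `top_n()`, which keeps ties and can return more rows than requested, `row_number()` cuts each group at exactly its quota, with ties broken by row order. If tied players should all be kept, `min_rank(desc(Growth)) <= keep_n` could be used in the `filter()` step instead.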