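I want to condense a data frame based on several conditions across multiple variables, but I'm not sure of the simplest way to do it. I think it will need some kind of custom function, but I have little experience writing functions.
Basically, my data frame currently looks like this: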
chainID teamID statID startType endType
1 Team A Effective Pass TO TO
1 Team A Effective Pass TO TO
1 Team A Effective Pass TO TO
1 Team A Effective Pass TO TO
1 Team A Ineffective Pass TO TO
2 Team B Effective Pass TO SH
2 Team B Entry TO SH
2 Team B Effective Pass TO SH
2 Team B Shot TO SH
3 Team A Effective Pass ST TO
3 Team A Entry ST TO
3 Team A Ineffective Pass ST TO
4 Team B Effective Pass TO ST
4 Team B Effective Pass TO ST
4 Team B Ineffective Pass TO ST
5 Team A Effective Pass TO SH
5 Team A Entry TO SH
5 Team A Goal TO SH
6 Team B Effective Pass CB TO
6 Team B Effective Pass CB TO
6 Team B Ineffective Pass CB TO
7 Team A Effective Pass TO ST
7 Team A Ineffective Pass TO ST
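For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is only a sketch: the name `df` and the helper `chain_sizes` are placeholders, and storing the columns as plain character vectors is an assumption about the real data.

```r
# Rebuild the sample table with base R only.
# chain_sizes is a hypothetical helper: the number of rows in chains 1..7.
chain_sizes <- c(5, 4, 3, 3, 3, 3, 2)

df <- data.frame(
  chainID   = rep(1:7, times = chain_sizes),
  teamID    = rep(c("Team A", "Team B", "Team A", "Team B",
                    "Team A", "Team B", "Team A"), times = chain_sizes),
  statID    = c(
    rep("Effective Pass", 4), "Ineffective Pass",            # chain 1
    "Effective Pass", "Entry", "Effective Pass", "Shot",     # chain 2
    "Effective Pass", "Entry", "Ineffective Pass",           # chain 3
    "Effective Pass", "Effective Pass", "Ineffective Pass",  # chain 4
    "Effective Pass", "Entry", "Goal",                       # chain 5
    "Effective Pass", "Effective Pass", "Ineffective Pass",  # chain 6
    "Effective Pass", "Ineffective Pass"                     # chain 7
  ),
  startType = rep(c("TO", "TO", "ST", "TO", "TO", "CB", "TO"), times = chain_sizes),
  endType   = rep(c("TO", "SH", "TO", "ST", "SH", "TO", "ST"), times = chain_sizes),
  stringsAsFactors = FALSE
)
```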
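What I want is this: whenever the word Entry appears in the statID column, keep that row and the last row of that chainID, and drop all other rows of that chainID (see chainID 2 and 5). In addition, if a chainID contains Entry in statID but its last row is not a Goal or a Shot, I want the next chainID kept in the data set as well, as chainID 3 and 4 show in my example. The function should then carry on checking each chainID for an Entry, as it did at the start.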
Answer 0 (score: 1)
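The answer is split into two functions. The first, select_rows, picks rows from each group depending on whether "Entry" is present. The second, select_groups, flags the groups that contain an "Entry" but do not end in "Goal" or "Shot".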
library(dplyr)

select_rows <- function(anyEntry, statID) {
  #If anyEntry value is not 0
  if (anyEntry[1L]) {
    #If the last value is either "Goal" or "Shot" select "Entry" row and last row
    #else select all the rows from "Entry" to last row.
    if (last(statID) %in% c("Goal", "Shot")) c(anyEntry[1L], length(anyEntry))
    else anyEntry[1L]:length(anyEntry)
  } else 0
}

select_groups <- function(anyEntry, statID) {
  anyEntry[1L] & !last(statID) %in% c("Goal", "Shot")
}
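To see what the two helpers return, here is a base-R restatement of the same logic run on plain vectors. It is only a sketch: `ends_in_shot`, `select_rows_base` and `select_groups_base` are names made up for this illustration, and `tail(x, 1)` stands in for dplyr's `last()`.

```r
# Hypothetical base-R versions of the helpers, for illustration only.
ends_in_shot <- function(statID) tail(statID, 1) %in% c("Goal", "Shot")

select_rows_base <- function(anyEntry, statID) {
  if (anyEntry[1L] == 0) return(0L)               # no "Entry": keep nothing
  if (ends_in_shot(statID)) c(anyEntry[1L], length(anyEntry))
  else anyEntry[1L]:length(anyEntry)              # "Entry" through the last row
}

select_groups_base <- function(anyEntry, statID) {
  anyEntry[1L] > 0 && !ends_in_shot(statID)
}

chain2 <- c("Effective Pass", "Entry", "Effective Pass", "Shot")
chain3 <- c("Effective Pass", "Entry", "Ineffective Pass")

select_rows_base(rep(2, 4), chain2)    # 2 4  (the "Entry" row and the last row)
select_rows_base(rep(2, 3), chain3)    # 2 3  ("Entry" through the end)
select_groups_base(rep(2, 4), chain2)  # FALSE (ends in "Shot")
select_groups_base(rep(2, 3), chain3)  # TRUE  (the next chain should be kept)
```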
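We create an anyEntry column that holds, for each group, the row number of the first "Entry" value, or 0 if the group has no "Entry". We then apply select_rows and select_groups separately and bind the resulting rows together.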
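The `which.max(...) * any(...)` idiom behind anyEntry may look odd at first, so here it is in isolation. `which.max()` on a logical vector returns the position of the first TRUE, but it returns 1 when every value is FALSE. Multiplying by `any()` turns that misleading 1 into 0. The wrapper name `first_entry_or_zero` is made up for this sketch.

```r
# Position of the first "Entry" in a vector, or 0 when there is none.
first_entry_or_zero <- function(statID) {
  is_entry <- statID == "Entry"
  which.max(is_entry) * any(is_entry)
}

first_entry_or_zero(c("Effective Pass", "Entry", "Shot"))     # 2
first_entry_or_zero(c("Effective Pass", "Ineffective Pass"))  # 0
```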
df1 <- df %>%
  group_by(chainID) %>%
  mutate(anyEntry = which.max(statID == "Entry") * any(statID == "Entry"))

# chainIDs that contain "Entry" but do not end in "Goal" or "Shot"
Ids <- df1 %>%
  summarise(newEntry = select_groups(anyEntry, statID)) %>%
  filter(newEntry) %>%
  pull(chainID)

# Ids + 1 assumes chainID values are consecutive integers
df1 %>%
  slice(select_rows(anyEntry, statID)) %>%
  bind_rows(df %>% filter(chainID %in% (Ids + 1))) %>%
  select(-anyEntry) %>%
  arrange(chainID)
# chainID teamID statID startType endType
# <int> <fct> <fct> <fct> <fct>
#1 2 TeamB Entry TO SH
#2 2 TeamB Shot TO SH
#3 3 TeamA Entry ST TO
#4 3 TeamA IneffectivePass ST TO
#5 4 TeamB EffectivePass TO ST
#6 4 TeamB EffectivePass TO ST
#7 4 TeamB IneffectivePass TO ST
#8 5 TeamB Entry TO SH
#9 5 TeamB Goal TO SH
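One caveat on the `Ids + 1` step: it only finds the "next" chain when chainID values are consecutive integers. If your real IDs have gaps, something along these lines might be safer. It is a sketch, and `next_chain_ids` is a made-up helper, not part of dplyr.

```r
# For each id in `ids`, return the next larger chainID that actually occurs
# in `all_ids`. Ids with no successor are dropped.
next_chain_ids <- function(ids, all_ids) {
  ordered <- sort(unique(all_ids))
  pos <- match(ids, ordered) + 1L
  ordered[pos[!is.na(pos) & pos <= length(ordered)]]
}

next_chain_ids(3, c(1, 2, 3, 4, 5))  # 4
next_chain_ids(3, c(10, 3, 7, 1))    # 7
next_chain_ids(7, 1:7)               # empty: chain 7 has no successor
```

With this in place the filter would read `filter(chainID %in% next_chain_ids(Ids, df$chainID))` instead of `filter(chainID %in% (Ids + 1))`.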