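I'm trying to clean up some data so that every row directly below a row containing "gamecentre-playbyplay-event" is labelled as a goal, every row not containing "gamecentre-playbyplay-event" directly below a "goal" row is labelled as a primary assist, and every row not containing "gamecentre-playbyplay-event" directly below a "primary assist" row is labelled as a secondary assist.
The data looks like this: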
mydata
# A tibble: 15 x 1
value
<chr>
1 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby"
2 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
3 "<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
4 "<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
5 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
6 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
7 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
8 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
9 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
10 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
11 "<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
12 "<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby"
13 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
14 "<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
15 "<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
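There is a complication here. Some goals have no primary assist or no secondary assist, and in those cases I want NA. I tried using dplyr::lag() for this, but handling the cases where a primary or secondary assist is missing and should be NA gets confusing.
Here is the basis of what I have so far: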
library(dplyr)
library(stringr)

# keep only the rows that come directly after an event row
goals <- mydata %>%
  filter(dplyr::lag(str_detect(value, "gamecentre-playbyplay-event team-border"), 1))
goals
# A tibble: 4 x 1
value
<chr>
1 "<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re
2 "<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re
3 "<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re
4 "<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re
This is what I'd like my data to look like at the end of all this. I think dplyr::lag() is the way to go, but I'm not sure.
# A tibble: 4 x 3
goal primary_assist secondary_assist
<chr> <chr> <chr>
1 "<a href=\"/players/14695\" class=\"gam~ "<a href=\"/players/16639\" class=\"gamecent~ "<a href=\"/players/17027\" class=\"gamecentr~
2 "<a href=\"/players/17453\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ NA
3 "<a href=\"/players/18061\" class=\"gam~ "<a href=\"/players/14752\" class=\"gamecent~ "<a href=\"/players/17522\" class=\"gamecentr~
4 "<a href=\"/players/14752\" class=\"gam~ "<a href=\"/players/14639\" class=\"gamecent~ "<a href=\"/players/14757\" class=\"gamecentr~
Any ideas?
dput:
mydata <- structure(list(value = c("<div class=\"gamecentre-playbyplay-event team-border--lhjmq-bat gamecentre-playby",
"<a href=\"/players/14695\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/16639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/17027\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/17453\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/18061\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/17522\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<div class=\"gamecentre-playbyplay-event team-border--lhjmq-mon gamecentre-playby",
"<a href=\"/players/14752\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14639\" class=\"gamecentre__link gamecentre__link--goal\" data-re",
"<a href=\"/players/14757\" class=\"gamecentre__link gamecentre__link--goal\" data-re"
)), .Names = "value", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L))
Answer 0 (score: 4)
One option is to create a grouping variable and then spread() from long to wide:
library(tidyverse)
mydata %>%
  # create a group based on the occurrence of 'playby'
  group_by(grp = cumsum(str_detect(value, 'playby'))) %>%
  # filter out the first row of each group (the row that contains 'playby')
  filter(row_number() > 1) %>%
  # create a new category column
  mutate(categ = c("goal", "primary_assist", "secondary_assist")[row_number()]) %>%
  # spread from long to wide
  spread(categ, value) %>%
  # remove the grouping column as part of clean up
  ungroup() %>%
  select(-grp)
# A tibble: 4 x 3
# goal primary_assist secondary_assist
# <chr> <chr> <chr>
#1 "<a href=\"/players/14695\" class=\"g… "<a href=\"/players/16639\" class=\"gamece… "<a href=\"/players/17027\" class=\"gamece…
#2 "<a href=\"/players/17453\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… <NA>
#3 "<a href=\"/players/18061\" class=\"g… "<a href=\"/players/14752\" class=\"gamece… "<a href=\"/players/17522\" class=\"gamece…
#4 "<a href=\"/players/14752\" class=\"g… "<a href=\"/players/14639\" class=\"gamece… "<a href=\"/players/14757\" class=\"gamece…
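As a side note, spread() has been superseded by pivot_wider() since tidyr 1.0.0. Below is a minimal sketch of the same grouping idea using pivot_wider(). The `toy` tibble and the `roles` vector are hypothetical stand-ins (shortened strings, not the real `mydata`), and the sketch assumes each event is followed by at most three link rows.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical, shortened stand-in for mydata: two events,
# the second one without a secondary assist.
toy <- tibble::tibble(value = c(
  "<div class=\"gamecentre-playbyplay-event a",
  "<a href=\"/players/1\"",
  "<a href=\"/players/2\"",
  "<a href=\"/players/3\"",
  "<div class=\"gamecentre-playbyplay-event b",
  "<a href=\"/players/4\"",
  "<a href=\"/players/5\""
))

# Labels assigned by position within each event (assumes at most 3 rows per event).
roles <- c("goal", "primary_assist", "secondary_assist")

wide <- toy %>%
  # every event row starts a new group, so grp is 1,1,1,1,2,2,2 here
  mutate(grp = cumsum(str_detect(value, "gamecentre-playbyplay-event"))) %>%
  # drop the event rows themselves
  filter(!str_detect(value, "gamecentre-playbyplay-event")) %>%
  group_by(grp) %>%
  # label the remaining rows by their position within the event
  mutate(categ = roles[row_number()]) %>%
  ungroup() %>%
  # long to wide; roles that are absent for an event are filled with NA
  pivot_wider(names_from = categ, values_from = value) %>%
  select(-grp)

wide
```

Because the reshaping step fills in missing combinations, an event with only two link rows ends up with NA in secondary_assist without any special handling, which is the case that is awkward to express with lag() alone.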