My data looks like this:
# A tibble: 100 x 2
positives negatives
<dbl> <dbl>
1 1 0
2 1 0
3 0 -1
4 0 0
5 0 0
6 1 0
7 0 0
8 0 0
9 0 -1
10 0 -1
# ... with 90 more rows
I want to create two new columns, so that the data ends up looking like this:
# A tibble: 100 x 2
positives negatives newcol1 newcol2
1 1 0 1 0
2 1 0 0 0
3 0 -1 0 -1
4 0 0 0 0
5 0 0 0 0
6 1 0 1 0
7 0 0 0 0
8 0 0 0 0
9 0 -1 0 -1
10 0 -1 0 0
# ... with 90 more rows
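newcol1 gets a 1 at the first occurrence of a 1 in the positives column; all subsequent rows in newcol1 are 0 until a -1 appears in the negatives column. Then newcol2 takes over with a -1 at that row and stays 0 afterwards, until a 1 in the positives column takes precedence again.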
Another example, using the bottom 20 rows of the data:
# A tibble: 20 x 2
positives negatives newcol1 newcol2
<dbl> <dbl>
1 0 -1 0 -1
2 0 0 0 0
3 0 -1 0 0 # a 0 since we have not had a 1 in "positives"
4 1 0 1 0 # now we have a 1 so put a 1 in newcol1
5 1 0 0 0 # 0 here since this is the 2nd occurrence of a 1 in this column
6 0 -1 0 -1 # we add -1 here since it's the first occurrence of a -1 in the negatives column after we encountered a 1 in the positives column
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 1 0 1 0 # change back to the positives/newcol1 since this is the first 1 occurrence in the positives column after we encountered a -1 in the negatives column
11 1 0 0 0 # there was a 1 previously in the positives column so we ignore this 1 in the positives column (until we encounter a -1 in the negatives column)
12 0 -1 0 -1
13 0 -1 0 0
14 0 -1 0 0
15 0 0 0 0
16 0 0 0 0
17 0 -1 0 0
18 0 -1 0 0
19 0 0 0 0
20 0 0 0 0 # no other 1 in the positives column so we finish on a -1 in the newcol2 column.
Answer 0 (score: 2)
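We can use rleid to create a grouping variable, then build the binary column by checking that all the values of 'positives' in the group are 1 and that row_number is 1. 'newcol2' is built the same way from 'negatives'.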
library(dplyr)
library(data.table)
df1 %>%
group_by(grp = rleid(positives)) %>%
mutate(newcol1 = +(all(positives == 1) * row_number() == 1)) %>%
ungroup %>%
group_by(grp = rleid(negatives)) %>%
mutate(newcol2 = -1 *(all(negatives == -1) * row_number() == 1)) %>%
ungroup %>%
select(-grp)
# A tibble: 100 x 4
# positives negatives newcol1 newcol2
# <dbl> <dbl> <int> <dbl>
# 1 1 0 1 0
# 2 1 0 0 0
# 3 0 -1 0 -1
# 4 0 0 0 0
# 5 0 0 0 0
# 6 1 0 1 0
# 7 0 0 0 0
# 8 0 0 0 0
# 9 0 -1 0 -1
#10 0 -1 0 0
# … with 90 more rows
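To make the grouping step easier to follow, here is a minimal sketch of what the run ids look like on the first 10 rows from the question. It uses a base R stand-in for rleid on a single vector; `run_id` is a hypothetical helper name introduced only for this illustration, not part of dplyr or data.table.

```r
# Base R stand-in for data.table::rleid on a single vector:
# consecutive equal values share one run id.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

positives <- c(1, 1, 0, 0, 0, 1, 0, 0, 0, 0)  # first 10 rows from the question
grp <- run_id(positives)
grp
# 1 1 2 2 2 3 4 4 4 4

# Within a run every value is identical, so "all values are 1 and this is
# the first row of the group" reduces to "value is 1 and first row of its run".
first_in_run <- !duplicated(grp)
newcol1 <- as.integer(positives == 1 & first_in_run)
newcol1
# 1 0 0 0 0 1 0 0 0 0
```

Note that in the pipeline above, `all(positives == 1) * row_number() == 1` is evaluated as `(all(...) * row_number()) == 1`, because `*` binds more tightly than `==` in R.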
Or, as @H 1 mentioned, the rleid grouping can be applied just once
df1 %>%
group_by(grp = rleid(positives + negatives)) %>%
mutate(newcol1 = +(all(positives == 1) * row_number() == 1),
newcol2 = -1 *(all(negatives == -1) * row_number() == 1))
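One caveat, as far as I can tell: the run-based approach flags the first row of every run, so on the 20-row example from the question it would also put a -1 in rows 3 and 17, where the expected output has 0. If the strictly alternating rule is what is needed, it could be sketched as an explicit loop that remembers the last signal emitted. This is a sketch under assumptions rather than the answer's method: `alternate_signals` is a hypothetical helper name, and a row with both a 1 and a -1 (which does not occur in the sample data) is assumed to favour the positives column.

```r
# Sketch of the alternating rule as an explicit loop (base R only).
# `alternate_signals` is a hypothetical helper, not part of any package.
alternate_signals <- function(positives, negatives) {
  n <- length(positives)
  newcol1 <- integer(n)
  newcol2 <- integer(n)
  last <- 0L  # 0 = nothing emitted yet, 1 = last signal was a 1, -1 = last was a -1
  for (i in seq_len(n)) {
    if (positives[i] == 1 && last != 1L) {
      # first 1 since the start or since the last emitted -1
      newcol1[i] <- 1L
      last <- 1L
    } else if (negatives[i] == -1 && last != -1L) {
      # first -1 since the start or since the last emitted 1
      newcol2[i] <- -1L
      last <- -1L
    }
  }
  data.frame(positives, negatives, newcol1, newcol2)
}

# The 20-row example from the question
positives <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
negatives <- c(-1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, -1, -1, 0, 0)
res <- alternate_signals(positives, negatives)

which(res$newcol1 == 1)   # rows 4 and 10
which(res$newcol2 == -1)  # rows 1, 6 and 12
```

On the full data this could be called as alternate_signals(df1$positives, df1$negatives), assuming df1 is the tibble shown in the question.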