Question

My data looks like this:

# A tibble: 100 x 2
   positives negatives
       <dbl>     <dbl>
 1         1         0
 2         1         0
 3         0        -1
 4         0         0
 5         0         0
 6         1         0
 7         0         0
 8         0         0
 9         0        -1
10         0        -1
# ... with 90 more rows

I want to create two new columns so that the data ends up looking like this:

# A tibble: 100 x 2
   positives negatives    newcol1     newcol2

 1         1         0       1           0
 2         1         0       0           0
 3         0        -1       0          -1
 4         0         0       0           0
 5         0         0       0           0
 6         1         0       1           0
 7         0         0       0           0
 8         0         0       0           0 
 9         0        -1       0          -1
10         0        -1       0           0
# ... with 90 more rows

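newcol1 takes a 1 at the first occurrence of 1 in the positives column; all subsequent rows in newcol1 are 0 until a -1 occurs in the negatives column. At that point newcol2 takes over with a -1 at the first such occurrence, and stays 0 afterwards until a "new first" 1 occurs in the positives column.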

Another example, using the bottom 20 rows:

Data:

# A tibble: 20 x 2
   positives negatives    newcol1      newcol2
       <dbl>     <dbl>
 1         0        -1       0            -1
 2         0         0       0             0
 3         0        -1       0             0  # a 0 since we have not had a 1 in "positives"
 4         1         0       1             0  # now we have a 1 so put a 1 in newcol1
 5         1         0       0             0  # 0 here since this is the 2nd occurrence of a 1 in this column
 6         0        -1       0             -1 # we add -1 here since it's the first occurrence of a -1 in the negatives column after we encountered a 1 in the positives column
 7         0         0       0              0
 8         0         0       0              0 
 9         0         0       0              0 
10         1         0       1              0 # change back to the positives/newcol1 since this is the first 1 occurrence in the positives column after we encountered a -1 in the negatives column
11         1         0       0              0 # there was a 1 previously in the positives column so we ignore this 1 in the positives column (until we encounter a -1 in the negatives column)
12         0        -1       0             -1              
13         0        -1       0              0
14         0        -1       0              0
15         0         0       0              0 
16         0         0       0              0
17         0        -1       0              0 
18         0        -1       0              0
19         0         0       0              0 
20         0         0       0              0 # no other 1 in the positives column so we finish on a -1 in the newcol2 column.

Answer 1

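We could use rleid to create a grouping variable, then build the binary flag from two conditions: all values of 'positives' in the group are 1, and row_number is 1. 'newcol2' is created the same way from 'negatives'.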

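library(dplyr)
library(data.table)  # provides rleid()

df1 %>% 
    # one group per run of consecutive identical values in 'positives'
    group_by(grp = rleid(positives)) %>% 
    # 1 on the first row of each run of 1s, 0 elsewhere
    mutate(newcol1 = +(all(positives == 1) & row_number() == 1)) %>%
    ungroup() %>%
    # same idea for runs of -1 in 'negatives'
    group_by(grp = rleid(negatives)) %>%
    mutate(newcol2 = -1 * (all(negatives == -1) & row_number() == 1)) %>%
    ungroup() %>%
    select(-grp)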
# A tibble: 100 x 4
#   positives negatives newcol1 newcol2
#       <dbl>     <dbl>   <int>   <dbl>
# 1         1         0       1       0
# 2         1         0       0       0
# 3         0        -1       0      -1
# 4         0         0       0       0
# 5         0         0       0       0
# 6         1         0       1       0
# 7         0         0       0       0
# 8         0         0       0       0
# 9         0        -1       0      -1
#10         0        -1       0       0
# … with 90 more rows
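
To see why grouping on rleid isolates the first row of each run, it may help to look at the run ids for a short vector. This is only a small illustration: the vector x and the helper names run_id, pos_in_run and first_of_ones are made up for the example and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy vector, not taken from the question's data
x <- c(1, 1, 0, 0, 0, 1)

# rleid() gives each run of consecutive identical values its own id
run_id <- rleid(x)
run_id
# [1] 1 1 2 2 2 3

# Position within each run; equals 1 on the first row of a run
pos_in_run <- ave(seq_along(x), run_id, FUN = seq_along)

# First row of every run of 1s, the same idea as newcol1 above
first_of_ones <- as.integer(x == 1 & pos_in_run == 1)
first_of_ones
# [1] 1 0 0 0 0 1
```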

Or, as @H 1 mentioned, the rleid grouping can be applied just once:

df1 %>% 
    # a single run id; assumes no row has both a 1 and a -1
    group_by(grp = rleid(positives + negatives)) %>% 
    mutate(newcol1 = +(all(positives == 1) & row_number() == 1), 
           newcol2 = -1 * (all(negatives == -1) & row_number() == 1)) %>%
    ungroup() %>%
    select(-grp)
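
One caveat: run-based grouping flags the first row of every run, so when the same sign recurs before the other sign has appeared (rows 12 to 18 of the 20-row example in the question, where -1 comes back at row 17 after two zeros), it would flag more rows than the expected output shows. A single stateful pass over the rows may be closer to the stated rule. The following is a minimal sketch, not the answerer's method: flag_alternating is a hypothetical helper, and it assumes the columns contain no NA values and that no row has both a 1 and a -1.

```r
# Hypothetical helper (not from the original answer): walks the rows once
# and remembers which sign was flagged last.
flag_alternating <- function(positives, negatives) {
  n <- length(positives)
  newcol1 <- integer(n)
  newcol2 <- integer(n)
  last <- 0L  # 0 = nothing flagged yet, 1 = last flag was a 1, -1 = last flag was a -1
  for (i in seq_len(n)) {
    if (positives[i] == 1 && last != 1L) {
      # first 1 since the last flagged -1 (or since the start)
      newcol1[i] <- 1L
      last <- 1L
    } else if (negatives[i] == -1 && last != -1L) {
      # first -1 since the last flagged 1 (or since the start)
      newcol2[i] <- -1L
      last <- -1L
    }
  }
  data.frame(newcol1 = newcol1, newcol2 = newcol2)
}

# The 20-row example from the question
positives <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
negatives <- c(-1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, -1, -1, 0, 0)
res <- flag_alternating(positives, negatives)

which(res$newcol1 == 1)   # rows flagged in newcol1
which(res$newcol2 == -1)  # rows flagged in newcol2
```

With the data frame from the answer, the result could be attached with cbind(df1, flag_alternating(df1$positives, df1$negatives)). An explicit loop is used here because each row's flag depends on the most recently flagged sign, which a per-run grouping does not track.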
