R

Time: 2016-04-10 10:39:22

Tags: r

My data looks like this:

> data
         Date Dummy
1  2020-01-01     1
2  2020-01-02     0
3  2020-01-03     0
4  2020-01-04     0
5  2020-01-05     1
6  2020-01-06     1
7  2020-01-07     1
8  2020-01-08     0
9  2020-01-09     1
10 2020-01-10     1
11 2020-01-11     0

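I want to add a column that numbers the consecutive 1s in Dummy, restarting at 1 for each run and staying at 0 wherever Dummy is 0, so that my final data looks like this: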

> data
         Date Dummy Dummy_Modified
1  2020-01-01     1              1
2  2020-01-02     0              0
3  2020-01-03     0              0
4  2020-01-04     0              0
5  2020-01-05     1              1
6  2020-01-06     1              2
7  2020-01-07     1              3
8  2020-01-08     0              0
9  2020-01-09     1              1
10 2020-01-10     1              2
11 2020-01-11     0              0

How can I achieve this in R?

2 answers:

Answer 0: (score: 4)

This should do the trick:

df <- data.frame(dummy = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0))
df$dummy_mod <- sequence(rle(df$dummy)$lengths) * df$dummy
df
#    dummy dummy_mod
# 1      1         1
# 2      0         0
# 3      0         0
# 4      0         0
# 5      1         1
# 6      1         2
# 7      1         3
# 8      0         0
# 9      1         1
# 10     1         2
# 11     0         0

Edit: the same approach with dplyr

library(dplyr)
df <- data.frame(dummy = c(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0))
df %>% mutate(dummy_mod = sequence(rle(dummy)[["lengths"]]) * dummy)
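
The one-liner above is easier to follow when its intermediate values are printed. The following is a minimal base R sketch of the same rle/sequence technique, split into steps; the names runs and within_run are only illustrative and do not appear in the answer. It assumes the column contains only 0 and 1.

```r
# Same input vector as in the answer above
dummy <- c(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0)

# rle() collapses the vector into runs of equal values
runs <- rle(dummy)
runs$lengths   # 1 3 3 1 2 1
runs$values    # 1 0 1 0 1 0

# sequence() counts 1..n inside each run, for runs of 0s and 1s alike
within_run <- sequence(runs$lengths)   # 1 1 2 3 1 2 3 1 1 2 1

# Multiplying by the 0/1 vector zeroes out the counters of the 0-runs
dummy_mod <- within_run * dummy        # 1 0 0 0 1 2 3 0 1 2 0
```

The multiplication is what keeps the zeros at 0, so the trick depends on the values being exactly 0 and 1.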

Answer 1: (score: 2)

Using data.table, we can use the rleid function. Convert the 'data.frame' to a 'data.table' (setDT(data)); then, grouped by rleid(Dummy), create a new column ('Dummy_Modified') by assigning (:=) the output of 'Dummy' multiplied by the sequence of rows (seq_len(.N)), so that the 0 values in 'Dummy' remain 0 in the output.

library(data.table)
setDT(data)[, Dummy_Modified := Dummy * seq_len(.N), by = rleid(Dummy)]
data
#           Date Dummy Dummy_Modified
#  1: 2020-01-01     1              1
#  2: 2020-01-02     0              0
#  3: 2020-01-03     0              0
#  4: 2020-01-04     0              0
#  5: 2020-01-05     1              1
#  6: 2020-01-06     1              2
#  7: 2020-01-07     1              3
#  8: 2020-01-08     0              0
#  9: 2020-01-09     1              1
# 10: 2020-01-10     1              2
# 11: 2020-01-11     0              0

dplyr

Using dplyr, we can use lag to check whether adjacent elements in 'Dummy' are the same, take the cumsum of that logical index to create a grouping column ('gr'), and then use the same approach as above to get 'Dummy_Modified'. row_number() in dplyr gives the sequence of rows within each group.

library(dplyr)
data %>%
  group_by(gr = cumsum(Dummy != dplyr::lag(Dummy, default = Dummy[1L]))) %>%
  mutate(Dummy_Modified = Dummy * row_number()) %>%
  ungroup() %>%
  select(-gr)
#          Date Dummy Dummy_Modified
#         (chr) (int)          (int)
# 1  2020-01-01     1              1
# 2  2020-01-02     0              0
# 3  2020-01-03     0              0
# 4  2020-01-04     0              0
# 5  2020-01-05     1              1
# 6  2020-01-06     1              2
# 7  2020-01-07     1              3
# 8  2020-01-08     0              0
# 9  2020-01-09     1              1
# 10 2020-01-10     1              2
# 11 2020-01-11     0              0
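
The grouping key built from cumsum and lag can also be inspected without loading any package. The following is a rough base R sketch of that idea, not code from the answer: changed and n are illustrative names, and ave() with seq_along stands in for row_number() within each group.

```r
# Dummy column from the question
Dummy <- c(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0)
n <- length(Dummy)

# TRUE wherever a value differs from the one before it; the first element
# is never a change (this mirrors lag(Dummy, default = Dummy[1L]))
changed <- c(FALSE, Dummy[-1] != Dummy[-n])

# The running total of changes labels each run: 0 1 1 1 2 2 2 3 4 4 5
gr <- cumsum(changed)

# Count rows within each run, then zero out the runs of 0s
Dummy_Modified <- Dummy * ave(Dummy, gr, FUN = seq_along)
```

Every run of equal values gets its own label in gr, so counting rows per label restarts the counter at each switch between 0 and 1.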
