Add a new column based on data between zeros

Date: 2017-07-31 05:50:42

Tags: r dataframe

I collect power data (Power) once per second (Sample). My data.frame is therefore structured as follows:

Test <- data.frame(Sample = c(1:20), 
                   Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))

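The power values come from a person making efforts on a bike, with occasional rests, so power does not occur in an ordered fashion. There is no marker to indicate when an effort starts and stops, and I would like to add that detail. An effort can be characterised as power > 1. The start and stop of each effort can be determined from the data values that are grouped together.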

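I now want to add a new column (Marker) that identifies power values that are grouped together and separated by zeros. For example, my expected output is: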

Test$Marker <- c("Rest","Rest","Rest","Rest","Rest","Effort 1","Effort 1","Effort 1","Effort 1",
                 "Rest","Rest","Rest","Effort 2","Effort 2","Effort 2","Rest","Rest","Rest",
                 "Effort 3","Effort 3")

Unfortunately my raw data has > 3000 rows, so doing this by hand would be tedious! How can I do this in R?

3 answers:

Answer 0: (score: 5)

A base R option:

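# TRUE for every row that belongs to a run of non-zero power
indx1 = with(rle(Test$Power>0),rep(values,lengths))
# running count of the non-zero runs, repeated for every row of each run
indx2 = with(rle(Test$Power>0),rep(cumsum(values),lengths))
Test$Effort[indx1] = paste0("Effort",indx2[indx1])
Test$Effort[!indx1]="Rest"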

Output:

   Sample Power  Effort
1       1     0    Rest
2       2     0    Rest
3       3     0    Rest
4       4     0    Rest
5       5     0    Rest
6       6    50 Effort1
7       7    67 Effort1
8       8   100 Effort1
9       9    92 Effort1
10     10     0    Rest
11     11     0    Rest
12     12     0    Rest
13     13    36 Effort2
14     14    89 Effort2
15     15    36 Effort2
16     16     0    Rest
17     17     0    Rest
18     18     0    Rest
19     19    89 Effort3
20     20    90 Effort3

About 0.0038 seconds for 3,000 rows ;) Hope this helps!
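To see why the two rle() calls give the right labels, it can help to look at the intermediate values on a shorter vector. The following is only an illustrative sketch: the vector `power` and the variable names are made up for this example and are not part of the original data or answer.

```r
# Hypothetical short vector, used only to show the intermediate values
power <- c(0, 0, 50, 67, 0, 36, 89)

# Run-length encoding of the logical vector power > 0
runs <- rle(power > 0)
runs$values   # FALSE TRUE FALSE TRUE
runs$lengths  # 2 2 1 2

# Expand back to one value per row: does this row belong to an effort?
is_effort <- rep(runs$values, runs$lengths)

# cumsum() over the run values counts the TRUE runs seen so far,
# which gives each effort its number
effort_id <- rep(cumsum(runs$values), runs$lengths)

# Build the labels in one step
marker <- ifelse(is_effort, paste0("Effort", effort_id), "Rest")
marker
```

Here rle() is called once and its result reused, and ifelse() fills the whole column in a single step; the labels come out the same way as in the answer above.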

Answer 1: (score: 3)

An alternative base R version using cumsum:

mrk <- Test$Power==0
Test$New[!mrk] <- paste("effort", as.numeric(factor(cumsum(mrk)[!mrk])))
Test$New[mrk] <- "rest"

#   Sample Power   Marker      New
#1       1     0     Rest     rest
#2       2     0     Rest     rest
#3       3     0     Rest     rest
#4       4     0     Rest     rest
#5       5     0     Rest     rest
#6       6    50 Effort 1 effort 1
#7       7    67 Effort 1 effort 1
#8       8   100 Effort 1 effort 1
#9       9    92 Effort 1 effort 1
#10     10     0     Rest     rest
#11     11     0     Rest     rest
#12     12     0     Rest     rest
#13     13    36 Effort 2 effort 2
#14     14    89 Effort 2 effort 2
#15     15    36 Effort 2 effort 2
#16     16     0     Rest     rest
#17     17     0     Rest     rest
#18     18     0     Rest     rest
#19     19    89 Effort 3 effort 3
#20     20    90 Effort 3 effort 3
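The cumsum() trick may be easier to follow with the intermediate values written out. This is a sketch on a made-up short vector (`power` and the other names here are hypothetical, chosen for illustration), not additional code from the answer.

```r
# Hypothetical short vector, used only to show the intermediate values
power <- c(0, 0, 50, 67, 0, 36, 89)

is_rest <- power == 0

# The running count of rest rows only increases on rest rows, so it
# stays constant within each run of non-zero values
rest_count <- cumsum(is_rest)        # 1 2 2 2 3 3 3

# Keep the counts for effort rows only; each effort has its own value
effort_key <- rest_count[!is_rest]   # 2 2 3 3

# factor() maps the distinct keys to consecutive integer codes 1, 2, ...
effort_id <- as.integer(factor(effort_key))   # 1 1 2 2

marker <- rep("rest", length(power))
marker[!is_rest] <- paste("effort", effort_id)
marker
```

Because the running count never decreases, the factor levels are already in order of appearance, so the efforts are numbered in the order they occur.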

Answer 2: (score: 2)

A tidyverse option, using dplyr:

library(dplyr)

Test <- data.frame(Sample = c(1:20),
                   Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))

Test_df <- Test %>%
  mutate(
    Marker = case_when(
      Power > 0 ~ "Effort",
      Power == 0 ~ "Rest"),
    rleid = cumsum(Marker != lag(Marker, 1, default = "NA")),
    Marker = case_when(
      Marker == "Effort" ~ paste0(Marker, rleid %/% 2),
      TRUE ~ "Rest"),
    rleid = NULL
  )

Test_df
#>    Sample Power  Marker
#> 1       1     0    Rest
#> 2       2     0    Rest
#> 3       3     0    Rest
#> 4       4     0    Rest
#> 5       5     0    Rest
#> 6       6    50 Effort1
#> 7       7    67 Effort1
#> 8       8   100 Effort1
#> 9       9    92 Effort1
#> 10     10     0    Rest
#> 11     11     0    Rest
#> 12     12     0    Rest
#> 13     13    36 Effort2
#> 14     14    89 Effort2
#> 15     15    36 Effort2
#> 16     16     0    Rest
#> 17     17     0    Rest
#> 18     18     0    Rest
#> 19     19    89 Effort3
#> 20     20    90 Effort3

Another option, as a one-liner, uses data.table.
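The data.table code itself does not appear in this copy of the answer. As a rough sketch of what such a one-liner might look like (this is an assumption, not the answerer's code), data.table's rleid() can number the runs of Power > 0, and integer division by 2 turns the run number into an effort number. Adding `Power[1] > 0` is an extra adjustment introduced here so the numbering still starts at 1 if the data begins with an effort rather than a rest.

```r
library(data.table)

Test <- data.frame(Sample = 1:20,
                   Power = c(0,0,0,0,0,50,67,100,92,0,0,0,36,89,36,0,0,0,89,90))

# Sketch only: rleid() gives 1, 2, 3, ... for consecutive runs of Power > 0.
# Effort runs are every second run, so (run id + offset) %/% 2 numbers them.
setDT(Test)[, Marker := ifelse(Power > 0,
                               paste0("Effort", (rleid(Power > 0) + (Power[1] > 0)) %/% 2),
                               "Rest")]

Test
```

Note that setDT() converts Test to a data.table by reference, so Test itself gains the Marker column.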