Question

I have a data frame like the one below, with about 130k data points.

Eng_RPM Veh_Spd
340       56
450       65       
670       0
800       0
890       0
870       0
...       ..
800       0
790       0
940       0
...       ...
1490      67 
1540      78
1880      81

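I need another variable called Idling_Count that increments whenever Eng_RPM >= 400 and Veh_Spd == 0. The counter must only start 960 data points after the data point that first satisfies the condition, and the condition should not be applied to the first 960 data points at all, as shown below.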

Expected output

Eng_RPM Veh_Spd  Idling_Count
340       56       0
450       65       0
670       0        0
...       ...      0 (up to the first 960 values)
600       0        0 (idling starts here, but the counter should wait another 960 values before incrementing)
...       ...      0
800       0        1 (the 961st value after idling starts, i.e. Eng_RPM > 400 and Veh_Spd == 0)
890       0        2
870       0        3  
...       ..       ..
800       1        0 
790       2        0
940       3        0
450       0        0 (satisfies the condition, but the counter should not increment for another 960 values)
1490      0        4 (the 961st value after the data point above)
1540      0        5
1880      81       0
....      ...     ... (this cycle continues for the rest of the data points)

Answer 1

You can do this with a loop like the following.

Create sample data and a zero-filled column Indling_Cnt:

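# Sample data; more than 1920 rows are needed, because the loop below only
# starts counting at least 960 + 960 rows into the data
End_RMP <- round(runif(4000,340,1880),0)
Veh_Spd <- round(runif(4000,0,2),0)
dta <- data.frame(End_RMP,Veh_Spd)
dta$Indling_Cnt <- rep(0,4000)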

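To fill Indling_Cnt you can use a few if statements. This may not be the most efficient approach, but it should work; there are better, more sophisticated solutions, for example with the data.table package used in the other answer.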

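# First row after the first 960 rows that satisfies the condition, plus another
# 960 rows of waiting; computed once because it does not depend on i
n <- which(dta$End_RMP[-(1:960)] >= 400 & dta$Veh_Spd[-(1:960)] == 0)[1] + 960 + 960

if (!is.na(n)) {  # n is NA when no row after the first 960 satisfies the condition
  for (i in 2:nrow(dta)) {
    if (i >= n) {
      if (dta$End_RMP[i] >= 400 && dta$Veh_Spd[i] == 0) {
        dta$Indling_Cnt[i] <- dta$Indling_Cnt[i - 1] + 1  # idle: increment
      } else {
        dta$Indling_Cnt[i] <- dta$Indling_Cnt[i - 1]      # not idle: keep the count
      }
    }
  }
}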
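Note that the loop above computes a single start row `n` and then carries the count forward on non-idle rows, so its output differs from the expected output in the question, where the 960-point wait restarts with every new idle run and non-idle rows stay at 0. A minimal sketch of a loop that follows that reading is below. The function name `count_idling_loop` and its `wait` argument are hypothetical, and it assumes "idle" means RPM >= 400 and speed == 0, as stated in the question; it does not treat the first 960 rows of the data specially beyond the per-run wait.

```r
# Hypothetical helper: per-run idling counter written as a plain loop.
# Assumes "idle" means rpm >= 400 and spd == 0 (as stated in the question).
count_idling_loop <- function(rpm, spd, wait = 960) {
  idle <- rpm >= 400 & spd == 0
  out <- integer(length(idle))  # non-idle rows (and waiting rows) stay 0
  run_len <- 0L                 # position inside the current idle run
  total <- 0L                   # running count across all idle runs
  for (i in seq_along(idle)) {
    if (idle[i]) {
      run_len <- run_len + 1L
      if (run_len > wait) {     # only count after `wait` idle points in this run
        total <- total + 1L
        out[i] <- total
      }
    } else {
      run_len <- 0L             # a non-idle row ends the current run
    }
  }
  out
}

# Toy check with a short wait of 2 points instead of 960
rpm <- c(340, 450, 670, 800, 890, 870, 800, 790, 450, 1490, 1540, 1880)
spd <- c(56,  65,  0,   0,   0,   0,   1,   2,   0,   0,    0,    81)
count_idling_loop(rpm, spd, wait = 2)
# [1] 0 0 0 0 1 2 0 0 0 0 3 0
```

On the sample data above it would be called as `dta$Indling_Cnt <- count_idling_loop(dta$End_RMP, dta$Veh_Spd)`. Keeping the counters in scalar variables and writing each row once avoids re-reading the previous row of the data frame on every iteration.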

Answer 2

Here is how to do it with data.table (avoiding `for` loops, which are often slow in R):

library(data.table)
setDT(df)  # df is the question's data frame; converted to a data.table in place
# create a serial number for each observation
df[, serial := seq_len(nrow(df))]
# find runs of consecutive observations matching the condition
# (a gap in serial between matching rows starts a new run),
# then number the observations within each run
df[Eng_RPM > 400 & Veh_Spd == 0,  group_serial := seq_len(.N),
   by = cumsum((serial - shift(serial, type = "lag", fill = 1)) != 1)  ]
df[is.na(group_serial), group_serial := 0L]
# number the observations whose position within their run exceeds 960
df[group_serial > 960,  Idling_Count := seq_len(.N)]
df[is.na(Idling_Count),  Idling_Count := 0L]
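If data.table is not available, the same steps can be sketched in base R: `rle()` finds the runs of matching rows, `sequence()` numbers the rows within each run (the role of `group_serial`), and `cumsum()` numbers the rows past position 960 (the role of `Idling_Count`). This is an illustrative sketch, not part of the original answer; the function name `idling_count_rle` and its `wait` argument are hypothetical, and it keeps the answer's `> 400` condition.

```r
# Hypothetical base-R version of the data.table steps (no extra packages).
idling_count_rle <- function(rpm, spd, wait = 960) {
  idle <- rpm > 400 & spd == 0      # same condition as the data.table code
  runs <- rle(idle)                 # runs of consecutive TRUE / FALSE values
  pos <- sequence(runs$lengths)     # 1, 2, ... restarting at every run
  counted <- idle & pos > wait      # idle rows past the waiting period
  # running number over all counted rows, 0 everywhere else
  ifelse(counted, cumsum(counted), 0L)
}

# Toy check with a short wait of 2 points instead of 960
rpm <- c(340, 450, 670, 800, 890, 870, 800, 790, 450, 1490, 1540, 1880)
spd <- c(56,  65,  0,   0,   0,   0,   1,   2,   0,   0,    0,    81)
idling_count_rle(rpm, spd, wait = 2)
# [1] 0 0 0 0 1 2 0 0 0 0 3 0
```

For the question's data this would be `df$Idling_Count <- idling_count_rle(df$Eng_RPM, df$Veh_Spd)`. Every step is a vectorised operation over the whole column, so it should scale to the 130k rows without a loop.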

Apply a condition only after a certain number of values in a column
