Conditional subset of two columns - r

Date: 2014-01-22 15:50:54

Tags: arrays r time-series sampling subset

I have a couple of problems I'd like to solve. I want to subset the conc column by a threshold value and store the result, for example:

    newdata <- data[ which(data$conc > 8), ]

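However, I want to keep the associated datetime stamp with each value. Finally, in another array, I want to store the duration of each episode in which conc rises above 8.00 and then drops back below 8.00. For example, the reading at 21:30 would be recorded as 15 minutes, and another episode, between 00:15 and 03:00, would be stored as 165 minutes.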
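A minimal sketch of the first part, assuming the data is a data frame named `data` with a POSIXct `datetime` column: indexing rows with a logical condition keeps every selected column of those rows, so the timestamps stay paired with the concentrations. The vectors below re-create the readings from the table; generating the times with a regular 15-minute `seq()` in UTC is an assumption that happens to match this sample.

```r
# Re-create the question's readings (regular 15-minute sampling assumed)
conc <- c(7.29, 7.35, 35.23, 7.44, 13.30, 7.60, 7.65, 7.70, 7.83, 8.07,
          8.30, 22.44, 7.81, 10.67, 11.07, 8.29, 8.17, 8.29, 8.26, 8.93,
          9.74, 9.69, 9.15, 9.52, 9.10, 7.10)
datetime <- seq(as.POSIXct("2012-08-20 21:00", tz = "UTC"),
                by = "15 min", length.out = length(conc))
data <- data.frame(datetime = datetime, conc = conc)

# Row-wise subset: each selected conc keeps its own datetime
high <- data[data$conc > 8, c("datetime", "conc")]
print(high)
```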

    datetime            conc
    20/08/2012 21:00    7.29
    20/08/2012 21:15    7.35
    20/08/2012 21:30    35.23
    20/08/2012 21:45    7.44
    20/08/2012 22:00    13.30
    20/08/2012 22:15    7.60
    20/08/2012 22:30    7.65
    20/08/2012 22:45    7.70
    20/08/2012 23:00    7.83
    20/08/2012 23:15    8.07
    20/08/2012 23:30    8.30
    20/08/2012 23:45    22.44
    21/08/2012 00:00    7.81
    21/08/2012 00:15    10.67
    21/08/2012 00:30    11.07
    21/08/2012 00:45    8.29
    21/08/2012 01:00    8.17
    21/08/2012 01:15    8.29
    21/08/2012 01:30    8.26
    21/08/2012 01:45    8.93
    21/08/2012 02:00    9.74
    21/08/2012 02:15    9.69
    21/08/2012 02:30    9.15
    21/08/2012 02:45    9.52
    21/08/2012 03:00    9.10
    21/08/2012 03:15    7.10

1 answer:

Answer 0 (score: 0)

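One approach is to add two columns to the data: one flagging whether the concentration is above 8, and another holding the cumulative time spent above 8 before it drops back below.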

 #generating data
 data <- read.table(text="datetime conc
'20/08/2012 21:00' 7.29
'20/08/2012 21:15' 7.35
'20/08/2012 21:30' 35.23
'20/08/2012 21:45' 7.44
'20/08/2012 22:00' 13.30
'20/08/2012 22:15' 7.60
'20/08/2012 22:30' 7.65
'20/08/2012 22:45' 7.70
'20/08/2012 23:00' 7.83
'20/08/2012 23:15' 8.07
'20/08/2012 23:30' 8.30
'20/08/2012 23:45' 22.44
'21/08/2012 00:00' 7.81
'21/08/2012 00:15' 10.67
'21/08/2012 00:30' 11.07
'21/08/2012 00:45' 8.29
'21/08/2012 01:00' 8.17
'21/08/2012 01:15' 8.29
'21/08/2012 01:30' 8.26
'21/08/2012 01:45' 8.93
'21/08/2012 02:00' 9.74
'21/08/2012 02:15' 9.69
'21/08/2012 02:30' 9.15
'21/08/2012 02:45' 9.52
'21/08/2012 03:00' 9.10
'21/08/2012 03:15' 7.10", sep=" ", header=TRUE, stringsAsFactors=FALSE)

#converting to date

data$datetime <- as.POSIXct(data$datetime, format="%d/%m/%Y %H:%M")

#creating stamps (a reading of exactly 8 is labelled "less.than.8" here,
#otherwise it would get an NA stamp and break the loop below)
data$stamp <- ifelse(data$conc > 8, "greater.than.8", "less.than.8")

#calculating cumulative duration (in minutes) within each episode of conc > 8
data$cum.duration <- 0
for (i in seq_len(nrow(data))){
  if(data$stamp[i] == "greater.than.8" && i > 1){
    #minutes since the previous reading, with units fixed to "mins"
    step <- as.numeric(difftime(data$datetime[i], data$datetime[i-1], units="mins"))
    data$cum.duration[i] <- step + data$cum.duration[i-1]}
}

This produces the following table, which you can then use however you like:

              datetime  conc          stamp cum.duration
1  2012-08-20 21:00:00  7.29    less.than.8            0
2  2012-08-20 21:15:00  7.35    less.than.8            0
3  2012-08-20 21:30:00 35.23 greater.than.8           15
4  2012-08-20 21:45:00  7.44    less.than.8            0
5  2012-08-20 22:00:00 13.30 greater.than.8           15
6  2012-08-20 22:15:00  7.60    less.than.8            0
7  2012-08-20 22:30:00  7.65    less.than.8            0
8  2012-08-20 22:45:00  7.70    less.than.8            0
9  2012-08-20 23:00:00  7.83    less.than.8            0
10 2012-08-20 23:15:00  8.07 greater.than.8           15
11 2012-08-20 23:30:00  8.30 greater.than.8           30
12 2012-08-20 23:45:00 22.44 greater.than.8           45
13 2012-08-21 00:00:00  7.81    less.than.8            0
14 2012-08-21 00:15:00 10.67 greater.than.8           15
15 2012-08-21 00:30:00 11.07 greater.than.8           30
16 2012-08-21 00:45:00  8.29 greater.than.8           45
17 2012-08-21 01:00:00  8.17 greater.than.8           60
18 2012-08-21 01:15:00  8.29 greater.than.8           75
19 2012-08-21 01:30:00  8.26 greater.than.8           90
20 2012-08-21 01:45:00  8.93 greater.than.8          105
21 2012-08-21 02:00:00  9.74 greater.than.8          120
22 2012-08-21 02:15:00  9.69 greater.than.8          135
23 2012-08-21 02:30:00  9.15 greater.than.8          150
24 2012-08-21 02:45:00  9.52 greater.than.8          165
25 2012-08-21 03:00:00  9.10 greater.than.8          180
26 2012-08-21 03:15:00  7.10    less.than.8            0
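The loop above can also be written without explicit iteration. This is a sketch of a swapped-in vectorized technique, not the answerer's code: `cumsum(!above)` gives every reading at or below 8 a new group id, so each run of exceedances shares a group with the reading just before it, and `ave(..., FUN = cumsum)` keeps a running total of minutes inside each group. The data is re-created with an assumed regular 15-minute UTC time sequence.

```r
conc <- c(7.29, 7.35, 35.23, 7.44, 13.30, 7.60, 7.65, 7.70, 7.83, 8.07,
          8.30, 22.44, 7.81, 10.67, 11.07, 8.29, 8.17, 8.29, 8.26, 8.93,
          9.74, 9.69, 9.15, 9.52, 9.10, 7.10)
datetime <- seq(as.POSIXct("2012-08-20 21:00", tz = "UTC"),
                by = "15 min", length.out = length(conc))
data <- data.frame(datetime = datetime, conc = conc)

above <- data$conc > 8
# Minutes elapsed since the previous reading (0 for the first row)
step <- c(0, as.numeric(diff(data$datetime), units = "mins"))
# Group id increases at every reading <= 8, so each run of exceedances
# shares a group with the reading that precedes it
episode <- cumsum(!above)
# Running total of minutes within each group; readings <= 8 contribute 0
data$cum.duration <- ave(ifelse(above, step, 0), episode, FUN = cumsum)
print(data)
```

Unlike the loop, this version does not fail when the very first reading is above 8; that reading simply gets a cumulative duration of 0.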

To select only the row that ends each episode, you can write:

 lines <- which(data$conc>8)
 # a row ends an episode when the next exceedance is not the adjacent row;
 # the final exceedance always ends the last episode
 lines <- lines[c(diff(lines) > 1, TRUE)]
 data[lines,]

which gives you:

          datetime  conc          stamp cum.duration
3  2012-08-20 21:30:00 35.23 greater.than.8           15
5  2012-08-20 22:00:00 13.30 greater.than.8           15
12 2012-08-20 23:45:00 22.44 greater.than.8           45
25 2012-08-21 03:00:00  9.10 greater.than.8          180
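If you want one row per episode directly (the "other array" from the question), a sketch using `rle()` on the logical vector `conc > 8` can report the start, end, and number of readings of every run of exceedances. This is an alternative technique, not part of the original answer, and the column names `span.min` and `duration.min` are illustrative. `span.min` is the time from the first to the last reading in the episode, while `duration.min` multiplies the reading count by 15 minutes, which assumes regular 15-minute sampling.

```r
conc <- c(7.29, 7.35, 35.23, 7.44, 13.30, 7.60, 7.65, 7.70, 7.83, 8.07,
          8.30, 22.44, 7.81, 10.67, 11.07, 8.29, 8.17, 8.29, 8.26, 8.93,
          9.74, 9.69, 9.15, 9.52, 9.10, 7.10)
datetime <- seq(as.POSIXct("2012-08-20 21:00", tz = "UTC"),
                by = "15 min", length.out = length(conc))
data <- data.frame(datetime = datetime, conc = conc)

runs <- rle(data$conc > 8)          # runs of consecutive TRUE/FALSE values
ends <- cumsum(runs$lengths)        # last row index of every run
starts <- ends - runs$lengths + 1   # first row index of every run
keep <- runs$values                 # only the runs where conc > 8

episodes <- data.frame(
  start = data$datetime[starts[keep]],
  end = data$datetime[ends[keep]],
  n.readings = runs$lengths[keep]
)
# First-to-last reading span (0 for a single reading)
episodes$span.min <- as.numeric(difftime(episodes$end, episodes$start, units = "mins"))
# Reading count x 15 min, the convention used by cum.duration (assumes regular sampling)
episodes$duration.min <- episodes$n.readings * 15
print(episodes)
```

On the sample data this yields four episodes. For the 00:15 to 03:00 episode, `span.min` is 165, as in the question, while `duration.min` is 180, matching the `cum.duration` table; which definition you want depends on whether a single reading above 8 should count as 0 or 15 minutes.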