for-loop through ID-List & counting Values

Time: 2018-03-25 19:31:37

Tags: r for-loop if-statement transform

I hope someone can help me with my problem. I know that using two for-loops is not very efficient, but that was my first solution. I have a data frame (AllPat) of eye patients with a patient id, a date, and a visit type ('o' for operations, 'c' for checkups):

#Pat    Date        Visit    
#1,l    2015-03-30    c        
#1,l    2015-06-03    o        
#1,l    2015-07-01    o        
#1,l    2015-07-20    c    
#1,l    2016-03-16    o        
#1,l    2016-04-13    o        
#1,l    2016-05-09    c           
#2,l    2014-12-23    c 
#2,l    2015-01-21    o        
#2,l    2015-03-16    c    
#2,l    2015-11-23    o        

For each patient id, I want to number the operations within each block, restarting the count after every checkup:

#Pat    Date        Visit    Block
#1,l    2015-03-30    c        
#1,l    2015-06-03    o        1
#1,l    2015-07-01    o        2
#1,l    2015-07-20    c    
#1,l    2016-03-16    o        1
#1,l    2016-04-13    o        2
#1,l    2016-05-09    c           
#2,l    2014-12-23    c 
#2,l    2015-01-21    o        1
#2,l    2015-03-16    c    
#2,l    2015-11-23    o        1

This is my current code:

for(i in unique(AllPat$Pat)){
  op <- 0
  for(j in AllPat$Pat){
    if(i == j) {
      if(AllPat$Visit[AllPat$Pat == j] == "o") {
        AllPat$Block[AllPat$Pat == j] <- op
        op <- op+1
      }
      else op<-0
    }
  }
}

My problem is that the values in $Block only become visible if I sort the data frame by hand in the viewer. Maybe someone has a better solution and can help me.


UPDATE: my current data frame with the suggested function rleid:

Patient Date    Visit   DiffDate    Block
3,r 16.02.2016  m       0
3,r 16.02.2016  m   0   0
3,r 16.02.2016  m   0   0
3,r 16.02.2016  m   0   0
3,r 20.04.2016  o   64  1
3,r 18.05.2016  o   28  1 <<- should be 2
3,r 15.06.2016  o   28  1 <<- should be 3
3,r 04.07.2016  m   19  0
3,r 27.07.2016  o   23  1
3,r 24.08.2016  o   28  2
3,r 18.10.2016  o   55  3

Maybe I should change my difftime function? The current code for counting the blocks is:

n <- nrow(AllPat)
AllPat<- transform(AllPat, Block = ave(1:n, rleid(Patient, Visit, (DiffDate<= 60)), FUN = seq_along) * (Visit== "o"))

and the difference between the dates:

setDT(AllPat)[, DiffDate:= difftime(AllPat$Date, shift(AllPat$Date), units = "days"), by = c("Patient")]

UPDATE

4,l 2015-05-18  m   NA  0
4,l 2015-10-20  o   155 1 
4,l 2016-05-31  o   224 2 <<-1
4,l 2016-07-26  o   56  1

2 answers:

Answer 0 (score: 1)

rleid in the data.table package can help here. We have used 0 for the checkup blocks.

library(data.table)
AllPatDT <- data.table(AllPat)
AllPatDT[, Block := ave(.I, rleid(X.Pat, Visit), FUN = seq_along) * (Visit == "o")]

giving:

> AllPatDT
    X.Pat       Date Visit Block
 1:  #1,l 2015-03-30     c     0
 2:  #1,l 2015-06-03     o     1
 3:  #1,l 2015-07-01     o     2
 4:  #1,l 2015-07-20     c     0
 5:  #1,l 2016-03-16     o     1
 6:  #1,l 2016-04-13     o     2
 7:  #1,l 2016-05-09     c     0
 8:  #2,l 2014-12-23     c     0
 9:  #2,l 2015-01-21     o     1
10:  #2,l 2015-03-16     c     0
11:  #2,l 2015-11-23     o     1
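
To see why grouping by rleid(X.Pat, Visit) gives this result, it may help to rebuild the run ids by hand. The following is only a sketch in base R: run_id is a hypothetical helper name, not part of data.table, and the six rows are a shortened copy of the sample data. Every time the patient or the visit type changes, a new run starts; numbering the rows within each run and multiplying by (Visit == "o") zeroes out the checkups.

```r
# Hypothetical base-R stand-in for data.table::rleid():
# gives one id per run of identical consecutive keys.
run_id <- function(...) {
  key <- paste(..., sep = "\r")
  cumsum(c(TRUE, key[-1] != key[-length(key)]))
}

# Shortened copy of the sample data from the question
Pat   <- c("1,l", "1,l", "1,l", "1,l", "2,l", "2,l")
Visit <- c("c",   "o",   "o",   "c",   "c",   "o")

runs  <- run_id(Pat, Visit)                            # 1 2 2 3 4 5
pos   <- ave(seq_along(runs), runs, FUN = seq_along)   # 1 1 2 1 1 1
Block <- pos * (Visit == "o")                          # 0 1 2 0 0 1
```
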

If you prefer a plain data.frame, then using only rleid from the data.table package we have:

library(data.table)

n <- nrow(AllPat)
transform(AllPat, Block = ave(1:n, rleid(X.Pat, Visit), FUN = seq_along) * (Visit == "o"))

Note

We have used the following as AllPat:

Lines <- "#Pat    Date        Visit    
#1,l    2015-03-30    c        
#1,l    2015-06-03    o        
#1,l    2015-07-01    o        
#1,l    2015-07-20    c    
#1,l    2016-03-16    o        
#1,l    2016-04-13    o        
#1,l    2016-05-09    c           
#2,l    2014-12-23    c 
#2,l    2015-01-21    o        
#2,l    2015-03-16    c    
#2,l    2015-11-23    o"
AllPat <- read.table(text = Lines, header = TRUE, comment.char = "", as.is = TRUE)
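
Regarding the update in the question: one possible reading is that an operation more than 60 days after the previous visit should restart the count, while operations that follow within 60 days keep counting up. Putting (DiffDate <= 60) directly into the run key does not do that, because the key then also changes on the row after a long gap. A sketch under that assumption, in base R only, is to turn the gap condition into a running counter first. The rows below are modelled on the update and are not the real data; the names restart, key and run are made up for illustration.

```r
# Sample rows modelled on the question's update (not the real data)
AllPat <- data.frame(
  Patient = c(rep("3,r", 5), rep("4,l", 4)),
  Date    = as.Date(c("2016-02-16", "2016-04-20", "2016-05-18",
                      "2016-06-15", "2016-07-04",
                      "2015-05-18", "2015-10-20", "2016-05-31",
                      "2016-07-26")),
  Visit   = c("m", "o", "o", "o", "m", "m", "o", "o", "o"),
  stringsAsFactors = FALSE
)

# Days since the previous visit of the same patient (NA for the first visit)
AllPat$DiffDate <- ave(as.numeric(AllPat$Date), AllPat$Patient,
                       FUN = function(d) c(NA, diff(d)))

# Assumption: an operation more than 60 days after the previous visit
# starts a new block
restart <- AllPat$Visit == "o" & !is.na(AllPat$DiffDate) & AllPat$DiffDate > 60

# Run id that changes whenever Patient, Visit or the restart counter changes
key <- paste(AllPat$Patient, AllPat$Visit, cumsum(restart), sep = "\r")
run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Position within each run, zeroed for non-operations
AllPat$Block <- ave(seq_along(run), run, FUN = seq_along) * (AllPat$Visit == "o")
AllPat$Block
# 0 1 2 3 0 0 1 1 2
```

Note that under this assumption the last row of patient 4,l (56 days after the previous operation) continues the block and gets 2. If that row should also be 1, the restart rule would need to be defined differently.
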

Answer 1 (score: 0)

I searched for "[r] sequence within groups" and found an answer that I was able to adapt using a trick for making groups (which, in all honesty, I probably learned from G. Grothendieck). This is a link to an answer from Martin Morgan (a certified R guru):

generate sequence (and starting over in case of a recurrence) and add new column with highest number per sequence, within group, in R

I added that to my trick that forms groups at points where a condition occurs:

> dat$seq <- cumsum(dat$Visit=="c")
> dat
   Pat       Date Visit seq
1  1,l 2015-03-30     c   1
2  1,l 2015-06-03     o   1
3  1,l 2015-07-01     o   1
4  1,l 2015-07-20     c   2
5  1,l 2016-03-16     o   2
6  1,l 2016-04-13     o   2
7  1,l 2016-05-09     c   3
8  2,l 2014-12-23     c   4
9  2,l 2015-01-21     o   4
10 2,l 2015-03-16     c   5
11 2,l 2015-11-23     o   5
> runs <- rle(paste(dat$Pat, dat$seq, sep = "\r"))
> dat$Seq <- unlist(lapply(runs$lengths, seq_len))
> dat
   Pat       Date Visit seq Seq
1  1,l 2015-03-30     c   1   1
2  1,l 2015-06-03     o   1   2
3  1,l 2015-07-01     o   1   3
4  1,l 2015-07-20     c   2   1
5  1,l 2016-03-16     o   2   2
6  1,l 2016-04-13     o   2   3
7  1,l 2016-05-09     c   3   1
8  2,l 2014-12-23     c   4   1
9  2,l 2015-01-21     o   4   2
10 2,l 2015-03-16     c   5   1
11 2,l 2015-11-23     o   5   2
> dat$Seq <- dat$Seq - 1
> dat$Seq[dat$Seq == 0] <- " "
> dat
   Pat       Date Visit seq Seq
1  1,l 2015-03-30     c   1    
2  1,l 2015-06-03     o   1   1
3  1,l 2015-07-01     o   1   2
4  1,l 2015-07-20     c   2    
5  1,l 2016-03-16     o   2   1
6  1,l 2016-04-13     o   2   2
7  1,l 2016-05-09     c   3    
8  2,l 2014-12-23     c   4    
9  2,l 2015-01-21     o   4   1
10 2,l 2015-03-16     c   5    
11 2,l 2015-11-23     o   5   1
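
The same idea can probably be condensed into a single ave() call. This is only a sketch, assuming the rows are sorted by patient and date and that every visit is either "c" or "o". The segment id seg is an illustrative name, and the result is stored in a numeric column Block with 0 for checkups rather than blanks.

```r
# Sample data from the question
dat <- data.frame(
  Pat   = c(rep("1,l", 7), rep("2,l", 4)),
  Visit = c("c", "o", "o", "c", "o", "o", "c", "c", "o", "c", "o"),
  stringsAsFactors = FALSE
)

# A new segment starts at every checkup
seg <- cumsum(dat$Visit == "c")

# Within each patient and segment, count the operations seen so far;
# the checkup row itself contributes 0
dat$Block <- ave(as.integer(dat$Visit == "o"), dat$Pat, seg, FUN = cumsum)
dat$Block
# 0 1 2 0 1 2 0 0 1 0 1
```
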