Merging two date columns by id and creating new variables

Time: 2016-07-11 21:38:19

Tags: r date merge

Hello, I am struggling to build a new data frame from an existing one. Each row has two event columns (type1 and type2), and each event column has its own date column (date1 and date2). Here is my data.

id <- c(1,1,2,2,3,3)
type1 <- c("EB","EB","EB","IK","IK","EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")

data1 <- data.frame(id, type1, date1, type2, date2)

 id type1      date1   type2      date2
 1    EB  2011/08/31  missed  2011/03/17
 1    EB  2011/08/31  missed  2011/03/18
 2    EB  2012/01/15   kept   2011/03/30
 2    IK  2012/01/20  missed  2012/04/25
 3    IK  2012/03/10   kept   2012/05/01
 3    EB  2012/03/24  missed  2012/05/10

First, I would like to merge the two date columns into a single date column, ordered within each id. Second, I need a column called "event.type" that stores the missed/kept/EB/IK categories. Third, I need an "event.number" column that numbers the events in order within each id. Lastly, I need a column called "missed.kept.counter" that counts the kept/missed events for each id.

The data should look like below.

id <- c(1,1,1,2,2,2,2,3,3,3,3)
date <- c("2011/03/17", "2011/03/18", "2011/08/31", "2011/03/30", "2012/01/15", "2012/01/20","2012/04/25","2012/03/10","2012/03/24","2012/05/01","2012/05/10")
event.type <- c("missed", "missed", "EB", "kept", "EB", "IK", "missed", "IK", "EB", "kept", "missed")
event.number <- c(1,2,3,1,2,3,4,1,2,3,4) 
missed.kept.counter <- c(1,2,0,1,0,0,1,0,0,1,2)

data2 <- data.frame(id,date,event.type,event.number,missed.kept.counter)

> data2
id       date   event.type    event.number      missed.kept.counter
1   2011/03/17     missed            1                   1
1   2011/03/18     missed            2                   2
1   2011/08/31         EB            3                   0
2   2011/03/30       kept            1                   1
2   2012/01/15         EB            2                   0
2   2012/01/20         IK            3                   0
2   2012/04/25     missed            4                   1
3   2012/03/10         IK            1                   0
3   2012/03/24         EB            2                   0
3   2012/05/01       kept            3                   1
3   2012/05/10     missed            4                   2

I would appreciate any help with this problem.

Thanks for your help in advance.

Best.

3 answers:

Answer 0 (score: 2)

Try data.table. Starting with your data combined row-wise instead of column-wise gets you close to what you want:

library( data.table )
data1 <- data.table( id = rep( id, 2 ), 
                     type = c( type1, type2 ), 
                     date = c( date1, date2 ) )

Then drop the repeated row (id 1 has the same EB event on 2011/08/31 twice) and sort by id and date:

data1 <- unique( data1 )
setorder( data1, id, date )
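Sorting the character column works here because the dates are zero-padded year/month/day strings, so text order and calendar order agree. If the dates came in another layout, a sketch like the one below, which parses them with base R's as.Date before ordering, would be safer. The sample values and the day/month/year layout are made up for illustration and are not from the question.

```r
# Hypothetical dates in day/month/year layout (not the question's data)
date_chr <- c("31/08/2011", "17/03/2012", "18/03/2011")

# Text order compares the day field first, so it does not follow the calendar
text_sorted <- sort(date_chr)

# Parse into Date objects, then order the strings by the parsed values
date_parsed <- as.Date(date_chr, format = "%d/%m/%Y")
calendar_sorted <- date_chr[order(date_parsed)]

print(text_sorted)
print(calendar_sorted)
```

With the question's "%Y/%m/%d" strings the two orders coincide, which is why the plain setorder call is enough there.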

To get the event.number, number the rows within each id. I've edited this to be tidier, thanks to seeing @bgoldst's solution :)

data1[ , event.number := seq_len( .N ), by = id ]

Your last step, which seems to be a cumulative count of both "missed" and "kept" within each id:

data1[ type == "missed" | type == "kept",
       missed.kept.number := cumsum( type == "missed" | type == "kept" ), by = id ]

This will give you the numbers you want, with NAs elsewhere in the missed.kept.number column. If you specifically want zeros, add this line before the "last step" above:

data1[ , missed.kept.number := 0L ] 
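Put together in the order the steps need to run (zero fill first, then the filtered update), the whole thing might look like the sketch below. It assumes the data.table package is installed. The unique() call is an addition that drops the repeated EB row for id 1, and the object name events is only illustrative.

```r
library(data.table)

# Input vectors from the question
id    <- c(1, 1, 2, 2, 3, 3)
type1 <- c("EB", "EB", "EB", "IK", "IK", "EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")

# Stack both event/date pairs into long form and drop exact duplicate rows
events <- unique(data.table(id   = rep(id, 2),
                            type = c(type1, type2),
                            date = c(date1, date2)))

# Order within id by date (year/month/day strings sort correctly as text)
setorder(events, id, date)

# Number the events within each id
events[, event.number := seq_len(.N), by = id]

# Fill with zeros first, then overwrite only the missed/kept rows
events[, missed.kept.number := 0L]
events[type %in% c("missed", "kept"),
       missed.kept.number := cumsum(type %in% c("missed", "kept")), by = id]

print(events)
```

Running the zero fill after the filtered update would overwrite the counts, so the order of the last two statements matters.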

Answer 1 (score: 1)

Here's a way to do the first 3 tasks. I'm not sure what you want in the last task.

#Create data
id <- c(1,1,2,2,3,3)
type1 <- c("EB","EB","EB","IK","IK","EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")

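#Create data frames (keep strings as characters so the dates sort as text, not by factor level)
data1 <- data.frame(id, date=date1, event.type=type1, stringsAsFactors=FALSE)
data2 <- data.frame(id, date=date2, event.type=type2, stringsAsFactors=FALSE)

#Merge and order data
df <- merge(data1, data2, all=TRUE)
df <- df[!duplicated(df),]
df <- df[order(df$id, df$date),]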

#Create event.number column
library(dplyr)
df$event.number <- (df %>% group_by(id) %>% mutate(counter = row_number()))$counter
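For the last task, one reading is a running count of "missed" and "kept" events within each id, with 0 on every other row. A minimal base R sketch under that assumption follows. It rebuilds a small stand-in for the sorted df so that it runs on its own, and the helper names is_mk and running are illustrative.

```r
# Stand-in for the merged, sorted data frame (ids and event types only)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  event.type = c("missed", "missed", "EB", "kept", "EB", "IK",
                 "missed", "IK", "EB", "kept", "missed"),
  stringsAsFactors = FALSE
)

# Flag the rows that should be counted
is_mk <- df$event.type %in% c("missed", "kept")

# Running total of flagged rows, restarted for each id
running <- ave(as.integer(is_mk), df$id, FUN = cumsum)

# Keep the running total on flagged rows and put 0 everywhere else
df$missed.kept.counter <- ifelse(is_mk, running, 0L)

print(df)
```

ave() applies cumsum separately to each id group and returns the values in the original row order, so no explicit split and recombine step is needed.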

Answer 2 (score: 1)

library(data.table);

## coerce frame to data.table and convert factors to character vectors
setDT(data1);
data1[j=names(data1)[-1L]:=lapply(.SD[,-1L,with=FALSE],as.character)];

## transform data1 into data2, governed by ordered unique dates
data2 <- data1[by=id,j={
    d <- c(date1,date2);
    u <- which(!duplicated(d));
    u <- u[order(d[u])];
    .(date=d[u],event.type=c(type1,type2)[u]);
}];

## derive additional columns
data2[by=id,j=event.number:=seq_len(.N)];
data2[by=id,j=missed.kept.counter:={
    cntl <- event.type%in%c('missed','kept');
    ifelse(cntl,cumsum(cntl),0L);
}];

## result
data2;
##     id       date event.type event.number missed.kept.counter
##  1:  1 2011/03/17     missed            1                   1
##  2:  1 2011/03/18     missed            2                   2
##  3:  1 2011/08/31         EB            3                   0
##  4:  2 2011/03/30       kept            1                   1
##  5:  2 2012/01/15         EB            2                   0
##  6:  2 2012/01/20         IK            3                   0
##  7:  2 2012/04/25     missed            4                   2
##  8:  3 2012/03/10         IK            1                   0
##  9:  3 2012/03/24         EB            2                   0
## 10:  3 2012/05/01       kept            3                   1
## 11:  3 2012/05/10     missed            4                   2
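Note that the desired output in the question lists 1 for id 2 on 2012/04/25, whereas the cumulative count above gives 2. One possible reading of the question's numbers is that the counter restarts whenever an EB or IK event interrupts a run of missed/kept events. The base R sketch below follows that assumed reading. The variable names is_mk, new_run and run are illustrative, and the vectors are the ordered id and event.type values from the result above.

```r
# Ordered ids and event types, as in the result table
id <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
event.type <- c("missed", "missed", "EB", "kept", "EB", "IK",
                "missed", "IK", "EB", "kept", "missed")

# 1 for missed/kept rows, 0 for EB/IK rows
is_mk <- as.integer(event.type %in% c("missed", "kept"))

# A new run starts on the first row, when the id changes,
# or when the row type switches between missed/kept and EB/IK
new_run <- c(TRUE, diff(id) != 0 | diff(is_mk) != 0)
run <- cumsum(new_run)

# Count within each run: missed/kept runs give 1, 2, ...; EB/IK runs stay at 0
missed.kept.counter <- ave(is_mk, run, FUN = cumsum)

print(data.frame(id, event.type, missed.kept.counter))
```

For this data the sketch yields 1, 2, 0, 1, 0, 0, 1, 0, 0, 1, 2, which matches the missed.kept.counter vector given in the question.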