Hello, I am struggling to create a new data frame from an existing one. There are two types of events, and each event column has its own date column. Here is my data:
id <- c(1,1,2,2,3,3)
type1 <- c("EB","EB","EB","IK","IK","EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")
data1 <- data.frame(id, type1, date1, type2, date2)
> data1
id type1 date1 type2 date2
1 EB 2011/08/31 missed 2011/03/17
1 EB 2011/08/31 missed 2011/03/18
2 EB 2012/01/15 kept 2011/03/30
2 IK 2012/01/20 missed 2012/04/25
3 IK 2012/03/10 kept 2012/05/01
3 EB 2012/03/24 missed 2012/05/10
First, I would like to merge these two date columns into one ordered date column for each id. Second, I need a column called "event.type" that stores the missed/kept/EB/IK categories. Third, I need an "event.number" column that gives the order number of each event within each id. Lastly, I need a column called "missed.kept.counter" that counts the kept/missed events for each id (0 on the EB/IK rows).
The result should look like this:
id <- c(1,1,1,2,2,2,2,3,3,3,3)
date <- c("2011/03/17", "2011/03/18", "2011/08/31", "2011/03/30", "2012/01/15", "2012/01/20","2012/04/25","2012/03/10","2012/03/24","2012/05/01","2012/05/10")
event.type <- c("missed", "missed", "EB", "kept", "EB", "IK", "missed", "IK", "EB", "kept", "missed")
event.number <- c(1,2,3,1,2,3,4,1,2,3,4)
missed.kept.counter <- c(1,2,0,1,0,0,1,0,0,1,2)
data2 <- data.frame(id,date,event.type,event.number,missed.kept.counter)
> data2
id date event.type event.number missed.kept.counter
1 2011/03/17 missed 1 1
1 2011/03/18 missed 2 2
1 2011/08/31 EB 3 0
2 2011/03/30 kept 1 1
2 2012/01/15 EB 2 0
2 2012/01/20 IK 3 0
2 2012/04/25 missed 4 1
3 2012/03/10 IK 1 0
3 2012/03/24 EB 2 0
3 2012/05/01 kept 3 1
3 2012/05/10 missed 4 2
I would appreciate any help with this problem.
Thanks for your help in advance.
Best.
Answer 0 (score: 2)
Try data.table. Combining your data row-wise instead of column-wise already gets you close to what you want:
library( data.table )
data1 <- data.table( id = rep( id, 2 ),
type = c( type1, type2 ),
date = c( date1, date2 ) )
Then sort it by id and then by date:
setorder( data1, id, date )
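setorder() above sorts date as a plain character column. That gives chronological order only because every date is written as zero-padded YYYY/MM/DD, so string order and calendar order coincide. A minimal base R check of that assumption, using a few of the question's dates; if the format could vary, parsing with as.Date() first is the safer route:

```r
# A few dates from the question (character, zero-padded YYYY/MM/DD)
dates <- c("2011/08/31", "2012/01/15", "2011/03/17", "2012/05/10", "2011/03/30")

# Parse to Date objects; "%Y/%m/%d" matches the slash-separated format
parsed <- as.Date(dates, format = "%Y/%m/%d")

# For zero-padded year/month/day strings, lexical and chronological order agree
char_order <- order(dates)
date_order <- order(parsed)
print(identical(char_order, date_order))
```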
To get the event.number column (I've edited this to be tidier, thanks to seeing @bgoldst's solution):
data1[ , event.number := seq_len( .N ), by = id ]
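Inside a by = id group, .N is the number of rows in that group, so seq_len(.N) numbers the rows 1, 2, ..., .N per id. For readers without data.table, a base R sketch of the same idea uses ave() with seq_along(), assuming the rows are already sorted by id and date (ids is a hypothetical name for the sorted id column):

```r
# Sorted id column, as in the question's expected output
ids <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)

# ave() applies FUN to each id group and writes the results back in place;
# seq_along() returns 1..n for a group of n rows
event.number <- ave(seq_along(ids), ids, FUN = seq_along)
print(event.number)
```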
Your last step, which seems to be a cumulative count of both "missed" and "kept" within each id:
data1[ type == "missed" | type == "kept"
, missed.kept.number := cumsum( type == "missed" | type == "kept" ), by = id ]
This gives you the numbers you want, with NAs elsewhere in the missed.kept.number column. If you specifically want zeros, add this line before the "last step" above:
data1[ , missed.kept.number := 0L ]
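The i filter plus cumsum() above counts only the "missed"/"kept" rows, and initialising the column to 0L fills in the rest. The same per-id logic can be sketched in base R with a logical flag: cumsum() of the flag within each id gives a running count, and multiplying by the flag zeroes out the EB/IK rows. This assumes the rows are already sorted by id and date; ids and is_mk are hypothetical names:

```r
# Sorted ids and event types, taken from the question's expected output
ids <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
event.type <- c("missed", "missed", "EB", "kept", "EB", "IK", "missed",
                "IK", "EB", "kept", "missed")

# TRUE on the rows that should be counted
is_mk <- event.type %in% c("missed", "kept")

# Running count of missed/kept within each id, then 0 on EB/IK rows
running <- ave(as.integer(is_mk), ids, FUN = cumsum)
missed.kept.counter <- running * is_mk
print(missed.kept.counter)
```

With this running-count reading, id 2's final "missed" row gets 2, matching the printed output of the last answer, whereas the question's expected table shows 1 there.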
Answer 1 (score: 1)
Here's a way to do the first 3 tasks. I'm not sure what you want in the last task.
#Create data
id <- c(1,1,2,2,3,3)
type1 <- c("EB","EB","EB","IK","IK","EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")
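#Create data frames (keep date and event.type as character so they sort correctly)
data1 <- data.frame(id, date=date1, event.type=type1, stringsAsFactors=FALSE)
data2 <- data.frame(id, date=date2, event.type=type2, stringsAsFactors=FALSE)
#Merge and order data (full outer join on all shared columns, then drop duplicate rows)
df <- merge(data1, data2, all=TRUE)
df <- df[!duplicated(df),]
df <- df[order(df$id, df$date),]
#Create event.number column (row number within each id)
library(dplyr)
df$event.number <- (df %>% group_by(id) %>% mutate(counter = row_number()))$counter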
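Because every column is a merge key here, merge(all = TRUE) followed by dropping duplicate rows amounts to a union of the two frames' rows. A self-contained base R sketch comparing it with the simpler rbind() + unique() route; events1, events2, via_merge and via_rbind are hypothetical names:

```r
# Vectors from the question
id <- c(1, 1, 2, 2, 3, 3)
type1 <- c("EB", "EB", "EB", "IK", "IK", "EB")
date1 <- c("2011/08/31", "2011/08/31", "2012/01/15", "2012/01/20", "2012/03/10", "2012/03/24")
type2 <- c("missed", "missed", "kept", "missed", "kept", "missed")
date2 <- c("2011/03/17", "2011/03/18", "2011/03/30", "2012/04/25", "2012/05/01", "2012/05/10")

# Two long-format pieces with identical column names
events1 <- data.frame(id, date = date1, event.type = type1, stringsAsFactors = FALSE)
events2 <- data.frame(id, date = date2, event.type = type2, stringsAsFactors = FALSE)

# Route 1: full outer merge on all shared columns, drop duplicates, order
via_merge <- merge(events1, events2, all = TRUE)
via_merge <- via_merge[!duplicated(via_merge), ]
via_merge <- via_merge[order(via_merge$id, via_merge$date), ]

# Route 2: stack the rows, keep unique ones, order
via_rbind <- unique(rbind(events1, events2))
via_rbind <- via_rbind[order(via_rbind$id, via_rbind$date), ]

# Compare contents only; row names differ between the two routes
print(isTRUE(all.equal(via_merge, via_rbind, check.attributes = FALSE)))
print(nrow(via_rbind))
```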
Answer 2 (score: 1)
library(data.table);
## coerce frame to data.table and convert factors to character vectors
setDT(data1);
data1[j=names(data1)[-1L]:=lapply(.SD[,-1L,with=F],as.character)];
## transform data1 into data2, governed by ordered unique dates
data2 <- data1[by=id,j={
d <- c(date1,date2);
u <- which(!duplicated(d));
u <- u[order(d[u])];
.(date=d[u],event.type=c(type1,type2)[u]);
}];
## derive additional columns
data2[by=id,j=event.number:=seq_len(.N)];
data2[by=id,j=missed.kept.counter:={
cntl <- event.type%in%c('missed','kept');
ifelse(cntl,cumsum(cntl),0L);
}];
## result
data2;
## id date event.type event.number missed.kept.counter
## 1: 1 2011/03/17 missed 1 1
## 2: 1 2011/03/18 missed 2 2
## 3: 1 2011/08/31 EB 3 0
## 4: 2 2011/03/30 kept 1 1
## 5: 2 2012/01/15 EB 2 0
## 6: 2 2012/01/20 IK 3 0
## 7: 2 2012/04/25 missed 4 2
## 8: 3 2012/03/10 IK 1 0
## 9: 3 2012/03/24 EB 2 0
## 10: 3 2012/05/01 kept 3 1
## 11: 3 2012/05/10 missed 4 2
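The j block above keeps the first occurrence of each date within an id (!duplicated) and then reorders those positions chronologically. A small worked sketch of that index arithmetic for id 1, using the question's values (types is a hypothetical name). Note that the deduplication is on date alone, so two different event types on the same date for the same id would collapse to one row:

```r
# id 1 from the question: the date1 values first, then the date2 values
d <- c("2011/08/31", "2011/08/31", "2011/03/17", "2011/03/18")
types <- c("EB", "EB", "missed", "missed")

# Positions of first occurrences: 1, 3, 4 (position 2 repeats position 1)
u <- which(!duplicated(d))
# Reorder those positions by date: 3, 4, 1
u <- u[order(d[u])]

result <- data.frame(date = d[u], event.type = types[u], stringsAsFactors = FALSE)
print(result)
```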