I have an R data frame that looks like this:
IQueryable<Job> jobs = (from j in _db.Jobs
join jt in _db.JobTranslators on j.Id equals jt.JobId into jts
from jtResult in jts.DefaultIfEmpty()
join jr in _db.JobRevisors on jtResult.Id equals jr.JobId into jrs
from jrResult in jrs.DefaultIfEmpty()
join u in _db.Users on jtResult.UserId equals u.Id into jtU
from jtUResult in jtU.DefaultIfEmpty()
where jtUResult.Id == userId
orderby j.Id
select j).Concat(
from j in _db.Jobs
join jt in _db.JobTranslators on j.Id equals jt.JobId into jts
from jtResult in jts.DefaultIfEmpty()
join jr in _db.JobRevisors on jtResult.Id equals jr.JobId into jrs
from jrResult in jrs.DefaultIfEmpty()
join u in _db.Users on jrResult.UserId equals u.Id into jrU
from jrUResult in jrU.DefaultIfEmpty()
where jrUResult.Id == userId
orderby j.Id
select j
).Distinct()
The 'end' marker appears only once per id and is the endpoint of that id's particular sequence. What I want is a data frame like the following:
id | seq_check | action | ct
123 | end | action_a | 1
123 | start | action_b | 4
123 | start | action_c | 1
456 | end | action_d | 1
456 | start | action_e | 16
456 | start | action_f | 4
456 | start | action_g | 5
456 | start | action_h | 2
456 | start | action_i | 1
Does anyone know how to do this in R? Many thanks!
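For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the object name `df` is assumed to match the first answer (the second answer calls it `df1`), and the column types are guessed from the printed values.

```r
# Rebuild the table shown above.
# Assumptions: the name `df`, numeric id/ct, and character columns.
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),  # action_a ... action_i
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)
```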
Answer 0 (score: 3)
You can also use dplyr and tidyr:
library(dplyr); library(tidyr);
spread(df, seq_check, action) %>% fill(end) %>%
mutate(seq_action = paste(end, start, sep = " <- ")) %>%
select(id, seq_action, ct)
id seq_action ct
1 123 action_a <- action_c 1
2 123 action_a <- action_b 4
3 456 action_d <- action_i 1
4 456 action_d <- action_h 2
5 456 action_d <- action_f 4
6 456 action_d <- action_g 5
7 456 action_d <- action_e 16
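In current tidyr releases `spread()` is superseded by `pivot_wider()`, so the same idea might be written as below. This is a sketch, not the answer's original code: the data frame is rebuilt from the question's table, the `group_by(id)` before `fill()` is an added precaution so that an 'end' value is never carried into another id, and the row order may differ from the output above.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question's table (types are assumed).
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

result <- df %>%
  # One row per (id, ct), with the actions split into `end` and `start` columns.
  pivot_wider(names_from = seq_check, values_from = action) %>%
  # Carry each id's single 'end' action down to the rows that lack one.
  group_by(id) %>%
  fill(end) %>%
  ungroup() %>%
  mutate(seq_action = paste(end, start, sep = " <- ")) %>%
  select(id, seq_action, ct)
```

Like the `spread()` version, this relies on the 'end' row sharing its (id, ct) pair with one of the 'start' rows and on that pair coming first within each id, which holds for the sample data.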
Answer 1 (score: 2)
We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Then, grouped by 'id', paste the 'action' whose 'seq_check' is 'end' with each 'action' whose 'seq_check' is 'start', and keep the 'ct' values from the rows where 'seq_check' is 'start':
library(data.table)
setDT(df1)[,.(seq_action=paste(action[seq_check=="end"],action[seq_check=="start"],
sep=" <- "), ct = ct[seq_check=="start"]) , by = id]
# id seq_action ct
#1: 123 action_a <- action_b 4
#2: 123 action_a <- action_c 1
#3: 456 action_d <- action_e 16
#4: 456 action_d <- action_f 4
#5: 456 action_d <- action_g 5
#6: 456 action_d <- action_h 2
#7: 456 action_d <- action_i 1
Note: only one package is used here.
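The same per-id pairing can also be sketched in base R with no packages at all. The helper name `pair_actions` is hypothetical, the data frame is rebuilt from the question's table, and the sketch assumes every id has exactly one 'end' row and at least one 'start' row.

```r
# Sample data rebuilt from the question's table (types are assumed).
df1 <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair the single 'end' action of one id
# with each of that id's 'start' actions.
pair_actions <- function(g) {
  is_start <- g$seq_check == "start"
  is_end   <- g$seq_check == "end"
  data.frame(
    id         = g$id[is_start],
    # The single 'end' action is recycled across all 'start' actions.
    seq_action = paste(g$action[is_end], g$action[is_start], sep = " <- "),
    ct         = g$ct[is_start],
    stringsAsFactors = FALSE
  )
}

# Split by id, build each group's rows, then stack the pieces.
result <- do.call(rbind, lapply(split(df1, df1$id), pair_actions))
rownames(result) <- NULL
```

This mirrors what the grouped `paste` in the data.table call does: within each id, a length-one vector is recycled against the vector of 'start' actions.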
Alternatively, use na.locf (from zoo) together with dcast:
library(zoo)
dcast(setDT(df1), id+ct~seq_check, value.var = "action")[, .(id,
seq_action=paste(na.locf(end), start, sep=" <- "), ct)]
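For readers unfamiliar with `na.locf`: it carries the last non-missing value forward over the NAs that follow it, which is how each id's single 'end' action reaches the remaining rows. A minimal base R sketch of that idea follows. The helper `locf_sketch` is hypothetical and is not how zoo implements it; it assumes the first element is not NA, and the `end_col` vector assumes the reshaped rows are ordered by id and then ct.

```r
# Hypothetical helper illustrating "last observation carried forward".
# Assumes the first element of x is not NA.
locf_sketch <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) gives, for each position, the index of the most recent
  # non-NA value among the non-NA values.
  x[seen][cumsum(seen)]
}

# Shape of the `end` column after reshaping the sample data:
# one value at the first row of each id, NA elsewhere.
end_col <- c("action_a", NA, "action_d", NA, NA, NA, NA)
filled  <- locf_sketch(end_col)
```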