Conditional frequency count, pasting specific columns

Time: 2016-06-17 15:33:07

Tags: r dataframe conditional-formatting

I have an R data frame like the following:

id | seq_check | action | ct
123 | end | action_a | 1
123 | start | action_b | 4
123 | start | action_c | 1
456 | end | action_d | 1
456 | start | action_e | 16
456 | start | action_f | 4
456 | start | action_g | 5
456 | start | action_h | 2
456 | start | action_i | 1

The 'end' marker appears only once for each id and is the endpoint of that id's sequence. What I want is a data frame like the following:

id | seq_action | ct
123 | action_a <- action_b | 4
123 | action_a <- action_c | 1
456 | action_d <- action_e | 16
456 | action_d <- action_f | 4
456 | action_d <- action_g | 5
456 | action_d <- action_h | 2
456 | action_d <- action_i | 1

Does anyone know how to do this in R? Thanks very much!
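To try the answers below, the table with columns id, seq_check, action and ct shown in the question can be rebuilt as a data frame. This is only a sketch: the name `df` is an assumption (one answer calls it `df`, the other `df1`), and the column types are guessed from the printed values.

```r
# Rebuild the nine rows shown in the question's table (base R only).
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),  # action_a ... action_i
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

# The question states that 'end' appears exactly once per id; verify that.
end_counts <- tapply(df$seq_check == "end", df$id, sum)
stopifnot(all(end_counts == 1))
```

Use `df1 <- df` before running the data.table answer, which refers to the data as `df1`.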

2 Answers:

Answer 0 (score: 3)

You can also use dplyr and tidyr:

library(dplyr); library(tidyr);

spread(df, seq_check, action) %>% fill(end) %>% 
      mutate(seq_action = paste(end, start, sep = " <- ")) %>% 
      select(id, seq_action, ct)

   id           seq_action ct
1 123 action_a <- action_c  1
2 123 action_a <- action_b  4
3 456 action_d <- action_i  1
4 456 action_d <- action_h  2
5 456 action_d <- action_f  4
6 456 action_d <- action_g  5
7 456 action_d <- action_e 16
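In current tidyr releases `spread()` is superseded by `pivot_wider()`. The following is a sketch of the same idea with the newer function, not part of the original answer. It assumes tidyr 1.0 or later, rebuilds the example data under the assumed name `df`, and fills `end` within each id so the result does not depend on row order.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the question's table (name `df` is an assumption).
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

res <- df %>%
  # One column per seq_check value: `end` and `start`.
  pivot_wider(names_from = seq_check, values_from = action) %>%
  # Copy the single 'end' action of each id onto all of that id's rows.
  group_by(id) %>%
  fill(end, .direction = "downup") %>%
  ungroup() %>%
  mutate(seq_action = paste(end, start, sep = " <- ")) %>%
  select(id, seq_action, ct)

print(res)
```

Unlike `spread()`, `pivot_wider()` keeps rows in order of first appearance rather than sorting them, so the row order may differ from the output shown above.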

Answer 1 (score: 2)

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'id', we paste the 'action' that corresponds to 'end' in 'seq_check' with the 'action' that corresponds to 'start' in 'seq_check', and subset the 'ct' where 'seq_check' is 'start'.

library(data.table)
setDT(df1)[,.(seq_action=paste(action[seq_check=="end"],action[seq_check=="start"],
              sep=" <- "), ct = ct[seq_check=="start"]) , by =  id]
#    id           seq_action ct
#1: 123 action_a <- action_b  4
#2: 123 action_a <- action_c  1
#3: 456 action_d <- action_e 16
#4: 456 action_d <- action_f  4
#5: 456 action_d <- action_g  5
#6: 456 action_d <- action_h  2
#7: 456 action_d <- action_i  1

NOTE: Only a single package is used.

Or use na.locf together with dcast:

library(zoo)
dcast(setDT(df1), id+ct~seq_check, value.var = "action")[, .(id, 
              seq_action=paste(na.locf(end), start, sep=" <- "), ct)]
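
For comparison, the per-id pairing that both answers perform can also be sketched in base R with no packages. This is not from either answer. It assumes the example data under the name `df` and relies on there being exactly one 'end' row per id, as the question states; `ends`, `starts`, `paired` and `result` are illustrative names.

```r
# Example data rebuilt from the question's table (name `df` is an assumption).
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start",
                "end", "start", "start", "start", "start", "start"),
  action    = paste0("action_", letters[1:9]),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Split the rows by marker; only the 'start' rows carry the ct we keep.
ends   <- df[df$seq_check == "end",   c("id", "action")]
starts <- df[df$seq_check == "start", c("id", "action", "ct")]

# Join every 'start' row to the single 'end' row of the same id.
paired <- merge(starts, ends, by = "id", suffixes = c("_start", "_end"))

result <- data.frame(
  id         = paired$id,
  seq_action = paste(paired$action_end, paired$action_start, sep = " <- "),
  ct         = paired$ct,
  stringsAsFactors = FALSE
)

print(result)
```

If an id had more than one 'end' row, `merge()` would produce one output row per start/end combination, so the one-'end'-per-id assumption matters here.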