Question

I have an R data frame as follows:


The 'end' marker appears only once per id and is the endpoint of a specific sequence for that id. What I want is a data frame like this:

id | seq_check | action | ct
123 | end | action_a | 1  
123 | start | action_b | 4  
123 | start | action_c | 1  
456 | end | action_d | 1  
456 | start | action_e | 16  
456 | start | action_f | 4  
456 | start | action_g | 5  
456 | start | action_h | 2  
456 | start | action_i | 1

Does anyone know how to do this in R? Thanks a lot!
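The answers below work on a data frame with the columns shown in the table above (id, seq_check, action, ct); Answer 1 calls it df and Answer 2 calls it df1. A minimal way to enter those rows in R, assuming the values are exactly the ones listed in the table:

```r
# Sample rows copied from the table in the question
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start", "end", "start",
                "start", "start", "start", "start"),
  action    = c("action_a", "action_b", "action_c", "action_d", "action_e",
                "action_f", "action_g", "action_h", "action_i"),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE  # keep 'seq_check' and 'action' as character
)

# Answer 2 refers to the same data as df1
df1 <- df
```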

Answer 1

You can also use dplyr and tidyr:

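library(dplyr); library(tidyr);

# Widen 'seq_check' into 'end'/'start' columns (one row per id and ct), carry
# each id's 'end' action down with fill(), then paste end and start together.
# This works on the sample because each 'end' row has ct 1, so it lands on
# the first row of its id after spread().
spread(df, seq_check, action) %>% fill(end) %>% 
      mutate(seq_action = paste(end, start, sep = " <- ")) %>% 
      select(id, seq_action, ct)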

   id           seq_action ct
1 123 action_a <- action_c  1
2 123 action_a <- action_b  4
3 456 action_d <- action_i  1
4 456 action_d <- action_h  2
5 456 action_d <- action_f  4
6 456 action_d <- action_g  5
7 456 action_d <- action_e 16
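A grouped dplyr variant that does not depend on row order, sketched under the question's assumption of exactly one 'end' row per id. It is an alternative to the spread()/fill() pipeline above, not part of the original answer; result is an illustrative name, and the sample rows are rebuilt so the block runs on its own:

```r
library(dplyr)

df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start", "end", "start",
                "start", "start", "start", "start"),
  action    = c("action_a", "action_b", "action_c", "action_d", "action_e",
                "action_f", "action_g", "action_h", "action_i"),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id) %>%
  # Within each id, take the single 'end' action and pair it with every row
  mutate(seq_action = paste(action[seq_check == "end"], action, sep = " <- ")) %>%
  # Keep only the 'start' rows, which carry the counts we want
  filter(seq_check == "start") %>%
  ungroup() %>%
  select(id, seq_action, ct)

print(result)
```

Because the 'end' action is looked up inside each group, the rows keep their original order and no fill step is needed.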

Answer 2

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id', paste the 'action' that corresponds to "end" in 'seq_check' with the 'action' values for "start" in 'seq_check', and subset 'ct' where 'seq_check' is "start".

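library(data.table)
# Within each id, paste the single 'end' action before every 'start' action,
# and keep the 'ct' values of the 'start' rows
setDT(df1)[,.(seq_action=paste(action[seq_check=="end"],action[seq_check=="start"],
              sep=" <- "), ct = ct[seq_check=="start"]) , by =  id]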
#    id           seq_action ct
#1: 123 action_a <- action_b  4
#2: 123 action_a <- action_c  1
#3: 456 action_d <- action_e 16
#4: 456 action_d <- action_f  4
#5: 456 action_d <- action_g  5
#6: 456 action_d <- action_h  2
#7: 456 action_d <- action_i  1

Note: only one package is used.
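For comparison, the same lookup can be done in base R with no packages at all. A minimal sketch, assuming exactly one 'end' row per id as the question states; ends, starts and result are illustrative names, and the sample rows are re-entered so the block runs on its own:

```r
df <- data.frame(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start", "end", "start",
                "start", "start", "start", "start"),
  action    = c("action_a", "action_b", "action_c", "action_d", "action_e",
                "action_f", "action_g", "action_h", "action_i"),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1),
  stringsAsFactors = FALSE
)

# One row per id holding its 'end' action
ends   <- df[df$seq_check == "end", c("id", "action")]
starts <- df[df$seq_check == "start", ]

# match() finds, for each 'start' row, the position of its id among the 'end' rows
starts$seq_action <- paste(ends$action[match(starts$id, ends$id)],
                           starts$action, sep = " <- ")

result <- starts[, c("id", "seq_action", "ct")]
rownames(result) <- NULL  # renumber rows 1..n after subsetting
print(result)
```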

Or use na.locf with dcast:

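library(data.table)  # dcast(), setDT()
library(zoo)         # na.locf()
# Cast to one row per (id, ct) with 'end'/'start' columns, then carry the last
# non-NA 'end' forward (across the whole table, so row order matters)
dcast(setDT(df1), id+ct~seq_check, value.var = "action")[, .(id, 
              seq_action=paste(na.locf(end), start, sep=" <- "), ct)]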
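If an id's 'end' row is not its first row after dcast(), na.locf() over the whole table can carry the previous id's 'end' action into it. A sketch of a per-id fill that avoids this, assuming exactly one 'end' per id; wide and result are illustrative names, and the sample data is rebuilt so the block stands alone:

```r
library(data.table)

df1 <- data.table(
  id        = c(123, 123, 123, 456, 456, 456, 456, 456, 456),
  seq_check = c("end", "start", "start", "end", "start",
                "start", "start", "start", "start"),
  action    = c("action_a", "action_b", "action_c", "action_d", "action_e",
                "action_f", "action_g", "action_h", "action_i"),
  ct        = c(1, 4, 1, 1, 16, 4, 5, 2, 1)
)

# Wide form: one row per (id, ct), with 'end' and 'start' action columns
wide <- dcast(df1, id + ct ~ seq_check, value.var = "action")

# Give every row of an id that id's single 'end' action, whatever the row order
wide[, end := end[!is.na(end)][1L], by = id]

# Drop rows that have no 'start' action, then build the pasted column
result <- wide[!is.na(start), .(id, seq_action = paste(end, start, sep = " <- "), ct)]
print(result)
```

This trades zoo's na.locf() for a plain group-wise assignment, so the result no longer depends on where the 'end' row sorts within its id.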

Conditional frequency count, pasting specific columns
