Getting the difference between two non-numeric matrices or data frames in R

Asked: 2017-10-27 00:22:22

Tags: r matrix difference

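Here is my conundrum. I am trying to track the status of the patients in my study on a daily basis. I have already written code that does this, and its output looks like this: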

        P1    Waitlisted    
        P80   Lab Appointment
        P19   Lab Appointment
        P26   Waitlisted

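I am trying to figure out how to diff the report I run today against the one I ran yesterday, so that I can quickly track any new patients that appear on the list and any patients that have been removed. So if, the next day, my data frame is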

        P20     Waitlisted
        P1      Waitlisted    
        P80     Lab Appointment
        P19     Lab Appointment
        P5      Lab Appointment
        P26     Waitlisted

I would get the output:

        P20     Waitlisted
        P5      Lab Appointment

Or, if the next day's result were instead

        P1    Waitlisted    
        P80   Lab Appointment
        P80   Waitlisted
        P19   Lab Appointment
        P26   Waitlisted

the output would be:

        P80   Waitlisted

I would also like to be able to tell when a patient has been removed from my list since the previous day. So if I get output like

        P1    Waitlisted    
        P80   Lab Appointment
        P26   Waitlisted

I would like a way to know that P19 Lab Appointment is no longer on my list today.

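I tried the following code, but all I get is a logical vector, with no way of telling which rows the TRUE and FALSE values refer to.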

    >apply(apply(df1,2,`==`,df2),1,any)
    [1] FALSE  TRUE FALSE FALSE FALSE    NA    NA    NA  TRUE  TRUE FALSE 
    FALSE FALSE FALSE  TRUE    NA  TRUE FALSE FALSE
    [20] FALSE    NA    NA FALSE FALSE  TRUE FALSE FALSE FALSE    NA  TRUE 
    FALSE FALSE  TRUE    NA    NA  TRUE  TRUE  TRUE
    [39]  TRUE  TRUE    NA    NA    NA  TRUE

2 answers:

Answer 0: (score: 2)

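You can use an anti-join to get the differences between days: it returns the rows of one table that have no match in the other. With data.table you could do: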

    library(data.table)
    setDT(df1); setDT(df2)
    removed_patient_status <- df1[!df2, on = c("status", "patient")]
    new_patient_status <- df2[!df1, on = c("status", "patient")]

    removed_patient_status
    #Empty data.table (0 rows) of 2 cols: patient,status

    new_patient_status
    #   patient          status
    #1:     P20      Waitlisted
    #2:      P5 Lab Appointment

Or with dplyr:

    library(dplyr)
    removed_patient_status <- anti_join(df1, df2, by = c("status", "patient"))
    new_patient_status <- anti_join(df2, df1, by = c("status", "patient"))

Data:

    df1 <- data.frame(patient = c("P1", "P80", "P19", "P26"), status = c("Waitlisted", "Lab Appointment", "Lab Appointment", "Waitlisted"), stringsAsFactors = FALSE)
    df2 <- data.frame(patient = c("P20", "P1", "P80", "P19", "P5", "P26"), status = c("Waitlisted", "Waitlisted", "Lab Appointment", "Lab Appointment", "Lab Appointment", "Waitlisted"), stringsAsFactors = FALSE)
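
The two anti-joins can also be folded into a single change log. The following is a minimal base-R sketch, not part of the original answer: it performs a full outer join with `merge()` and labels each row as added or removed. The data frames `yesterday` and `today`, the marker columns `in_yesterday` and `in_today`, and the `change` column are names assumed for illustration, and the sample data combines two scenarios from the question (P20 appears, P19 disappears).

```r
# Illustrative data (assumed): yesterday's and today's reports.
yesterday <- data.frame(
  patient = c("P1", "P80", "P19", "P26"),
  status  = c("Waitlisted", "Lab Appointment", "Lab Appointment", "Waitlisted"),
  stringsAsFactors = FALSE
)
today <- data.frame(
  patient = c("P20", "P1", "P80", "P26"),
  status  = c("Waitlisted", "Waitlisted", "Lab Appointment", "Waitlisted"),
  stringsAsFactors = FALSE
)

# Marker columns record which report each row came from.
yesterday$in_yesterday <- TRUE
today$in_today <- TRUE

# Full outer join on both columns; a row missing from one report
# gets NA in that report's marker column.
both <- merge(yesterday, today, by = c("patient", "status"), all = TRUE)

both$change <- ifelse(is.na(both$in_yesterday), "added",
               ifelse(is.na(both$in_today), "removed", "unchanged"))

# Keep only the rows that differ between the two days.
changes <- both[both$change != "unchanged", c("patient", "status", "change")]
print(changes)
```

Because the join uses both columns as the key, a patient whose status changes would show up twice under this approach: once as a removed row with the old status and once as an added row with the new one.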

Answer 1: (score: 0)

Regarding your first question:

    df1 <- data.frame(P = c("P1","P80","P19","P26"), Status=c("Waitlisted","Lab Appointment", "Lab Appointment", "Waitlisted"))

    df2 <- data.frame(P = c("P20","P1","P80","P19","P5","P26"), Status=c("Waitlisted","Waitlisted","Lab Appointment","Lab Appointment","Lab Appointment", "Waitlisted"))

    # Rows of df2 whose patient/status pair does not occur in df1,
    # i.e. the entries that are new on the second day
    df2[!(paste(df2$P, df2$Status) %in% paste(df1$P, df1$Status)),]
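
The same `paste()` / `%in%` comparison, run in the other direction, should give the patients that dropped off the list. Below is a minimal sketch under some assumptions: the helper `rows_not_in` is a hypothetical name, not something from the answer; the data reproduces the question's example in which P19 is missing the next day; and the `"|"` separator is an arbitrary choice meant to make accidental key collisions less likely.

```r
# Hypothetical helper: rows of `x` whose (P, Status) pair does not occur in `y`.
rows_not_in <- function(x, y, sep = "|") {
  key_x <- paste(x$P, x$Status, sep = sep)
  key_y <- paste(y$P, y$Status, sep = sep)
  x[!(key_x %in% key_y), , drop = FALSE]
}

# Illustrative data (assumed names): P19 is on yesterday's list but not today's.
yesterday <- data.frame(
  P      = c("P1", "P80", "P19", "P26"),
  Status = c("Waitlisted", "Lab Appointment", "Lab Appointment", "Waitlisted"),
  stringsAsFactors = FALSE
)
today <- data.frame(
  P      = c("P1", "P80", "P26"),
  Status = c("Waitlisted", "Lab Appointment", "Waitlisted"),
  stringsAsFactors = FALSE
)

removed <- rows_not_in(yesterday, today)  # on yesterday's list, gone today
added   <- rows_not_in(today, yesterday)  # on today's list, absent yesterday

print(removed)
print(added)
```

Here `removed` holds the single row for P19 and `added` is empty. Swapping the argument order is all that distinguishes the two directions, which is why the comparison is wrapped in one function.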