Different orderings of nested ifelse conditions give different results

Time: 2016-09-11 10:27:26

Tags: r if-statement

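I am trying to create a new column in a data frame whose value depends on conditions in several other columns of the same data frame. My research involves quantifying the severity of occlusion in the coronary arteries (the arteries of the heart).

The example data frame x is: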

structure(list(Study_number = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 
3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 
9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 
13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 
17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 
21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 
25, 25, 26, 26, 26, 26, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 
29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 
34, 34, 34, 34, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 
38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 
42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 
46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 50, 50, 50, 
50, 51, 51, 51, 51, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 
55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 
59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 63, 63, 63, 
63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66), Vessel = c(1, 2, 
3, 4, 1, 2, 3, 4, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 1, 2, 3, 
4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 1, 
2, 3, 4, 2, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
4, 1, 2, 3, 4, 1, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3, 4, 1, 2, 
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
4, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3), Segment = c(3, 
9, 7, 8, 2, 9, 7, 8, 9, 7, 8, 3, 9, 6, 11, 3, 9, 6, 8, 2, 9, 
9, 15, 2, 9, 7, 8, 2, 9, 6, 8, 2, 9, 2, 9, 7, 8, 3, 9, 9, 11, 
1, 9, 7, 8, 2, 9, 6, 8, 2, 9, 7, 11, 1, 9, 6, 12, 2, 9, 7, 11, 
2, 9, 6, 15, 2, 9, 6, 8, 2, 9, 7, 8, 3, 9, 7, 11, 2, 9, 6, 11, 
2, 9, 7, 8, 1, 9, 6, 11, 2, 9, 8, 11, 2, 9, 7, 8, 2, 9, 7, 11, 
9, 7, 11, 2, 9, 6, 11, 3, 9, 7, 11, 2, 9, 6, 11, 2, 9, 7, 8, 
1, 9, 6, 11, 4, 9, 7, 3, 9, 7, 8, 9, 2, 9, 7, 8, 2, 9, 7, 11, 
1, 9, 7, 14, 2, 9, 7, 11, 2, 9, 6, 12, 2, 9, 6, 11, 2, 9, 7, 
8, 2, 9, 9, 8, 2, 9, 7, 12, 2, 9, 7, 11, 1, 9, 7, 8, 2, 9, 7, 
15, 2, 9, 6, 11, 2, 9, 6, 8, 3, 9, 10, 14, 2, 9, 6, 11, 1, 6, 
11, 1, 9, 6, 8, 1, 9, 7, 11, 2, 8, 12, 2, 9, 7, 8, 1, 9, 7, 11, 
0, 9, 6, 12, 1, 9, 7, 8, 0, 9, 6, 11, 0, 9, 7, 8, 9, 7, 3, 9, 
7, 8, 2, 9, 7, 11, 21, 9, 6, 11, 9, 7), Severity = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Study_number", 
"Vessel", "Segment", "Severity"), row.names = c(NA, -250L), class = c("tbl_df", 
"tbl", "data.frame"))

The actual data frame looks like this:

    Study_number Vessel Segment Severity
          <dbl>  <dbl>   <dbl>    <dbl>
1             1      1       3        0
2             1      2       9        0
3             1      3       7        0
4             1      4       8        0
5             2      1       2        0
6             2      2       9        0
7             2      3       7        0
8             2      4       8        0
9             3      2       9        0
10            3      3       7        1
  • Study_number = participant ID
  • Vessel = vessel ID (1 to 4)
  • Segment = segment ID within that vessel
  • Severity = whether the disease in that vessel is severe (0 = no, 1 = yes)

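Each participant usually has 4 vessels (1-4), although some participants have only 3. What I want is a new column called 'Overall_severe_disease' that is set when any of the following conditions holds: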

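  1. Vessel 2 has severe disease (i.e., Vessel == 2 and Severity == 1 in the same row); OR

  2. Vessel 3 has severe disease in segment 6 or 7 (i.e., Vessel == 3 and Segment == 6 or 7 and Severity == 1 in that row) and at least one other vessel has severe disease (i.e., the sum of the Severity column == 2); OR

  3. Three or more vessels have severe disease (i.e., the sum of Severity per participant >= 3).

This is how I tried to solve the problem. First, I created a Vessel_Severity column by pasting the Vessel and Severity columns together.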

    x$Vessel_Severity <- paste(x$Vessel, x$Severity, sep = '-')
    

The new data frame looks like this:

      Study_number Vessel Segment Severity Vessel_Severity
             <dbl>  <dbl>   <dbl>    <dbl>           <chr>
    1            1      1       3        0             1-0
    2            1      2       9        0             2-0
    3            1      3       7        0             3-0
    4            1      4       8        0             4-0
    5            2      1       2        0             1-0
    6            2      2       9        0             2-0
    

Then I used the following ddply call from the plyr package to apply the nested ifelse conditions to each participant.

    library(plyr)
    x <- ddply(x, 'Study_number', transform,
    Overall_severe_disease = ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1,
           ifelse(Vessel_Severity == '2-1', 1,
           ifelse(sum(Severity) >= 3, 1, 0))))
    

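After that, I used the following call to assign 'Yes' or 'No' to the 'Overall_severe_disease' column (if any row for a participant contains at least one 1, the participant as a whole is marked 'Yes').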

    x <- ddply(x, 'Study_number', transform, Overall_severe_disease = ifelse(sum(Overall_severe_disease) >= 1, 'Yes', 'No'))
    

This approach works, and it gives me 9 unique participants with 'Overall_severe_disease' equal to 'Yes':

    length(unique(x$Study_number[x$Overall_severe_disease=='Yes']))
    

    #9

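But if I change the order of the ifelse calls and move the last condition, ifelse(sum(Severity) >= 3, ...), to the start of my nested ifelse statement, ddply no longer applies the remaining conditions, and I get an underestimate (5 unique participants instead of 9):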

    x <- ddply(x, 'Study_number', transform,
               Overall_severe_disease = ifelse(sum(Severity) >= 3, 1,
                                        ifelse(Vessel_Severity == '2-1', 1,
                                       ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1 , 0))))
    
    x <- ddply(x, 'Study_number', transform, Overall_severe_disease = ifelse(sum(Overall_severe_disease) >= 1, 'Yes', 'No'))
    
    length(unique(x$Study_number[x$Overall_severe_disease=='Yes']))
    

    #5

I am puzzled by this behavior. I would appreciate some advice and clarification.

1 Answer:

Answer 0 (score: 0)

In your example, you should replace

    x$Vessel_Severity -> paste(x$Vessel, x$Severity, sep = '-')

with

    x$Vessel_Severity <- paste(x$Vessel, x$Severity, sep = '-')

I tried to reproduce your example. Don't you get 9 and 5 as the answers?

    # first example
    library(plyr)  # provides ddply(); repeated here so the snippet runs on its own
    x$Overall_severe_disease<-0
    x <- ddply(x, 'Study_number', transform,
            Overall_severe_disease = ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1, 0))
    sum(x$Overall_severe_disease) #4

    x <- ddply(x, 'Study_number', transform,
    Overall_severe_disease = ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1, ifelse(Vessel_Severity == '2-1', 1,0)))
    sum(x$Overall_severe_disease) #4

    x <- ddply(x, 'Study_number', transform,
            Overall_severe_disease = ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1,
                    ifelse(Vessel_Severity == '2-1', 1,
                            ifelse(sum(Severity) >= 3, 1, 0))))
    sum(x$Overall_severe_disease) #24
    res<-tapply(x$Overall_severe_disease,x$Study_number,sum)
    length(res[res>0])#9
    x <- ddply(x, 'Study_number', transform, Overall_severe_disease = ifelse(sum(Overall_severe_disease) >= 1, 'Yes', 'No'))
    length(unique(x$Study_number[x$Overall_severe_disease=='Yes'])) #9

    # second example

    x <- ddply(x, 'Study_number', transform,
        Overall_severe_disease = ifelse(sum(Severity) >= 3, 1, 0))
    sum(x$Overall_severe_disease) #20 

    x <- ddply(x, 'Study_number', transform,
            Overall_severe_disease = ifelse(sum(Severity) >= 3, 1, ifelse(Vessel_Severity == '2-1', 1,0)))
    sum(x$Overall_severe_disease) #20 

    x <- ddply(x, 'Study_number', transform,
            Overall_severe_disease = ifelse(sum(Severity) >= 3, 1,
                    ifelse(Vessel_Severity == '2-1', 1,
                            ifelse(Vessel_Severity == '3-1' &  Segment %in% c(6,7) & sum(Severity) == 2 , 1 , 0))))
    sum(x$Overall_severe_disease) #20 
    res<-tapply(x$Overall_severe_disease,x$Study_number,sum)
    length(res[res>0])#5
    x <- ddply(x, 'Study_number', transform, Overall_severe_disease = ifelse(sum(Overall_severe_disease) >= 1, 'Yes', 'No'))
    length(unique(x$Study_number[x$Overall_severe_disease=='Yes'])) #5

So in the second example, the 4 rows flagged by the condition ifelse(Vessel_Severity == '3-1' & Segment %in% c(6,7) & sum(Severity) == 2, 1, 0) are lost. It's a good question.
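
A likely reason those rows are lost: ifelse(test, yes, no) returns a value with the same shape as test. Inside ddply(..., transform, ...), sum(Severity) >= 3 is a single logical value per participant, so when it is the outermost test the whole nested expression collapses to one value taken from the first row of the group, and that value is then recycled to every row. The row-wise checks on vessels 2 and 3 are therefore only ever seen for the first row. The sketch below shows this on made-up data; the data frame toy, its values, and the helper overall_severe are illustrative assumptions, not part of the question.

```r
# Toy data for two participants (hypothetical values, for illustration only).
toy <- data.frame(
  Study_number = c(1, 1, 1, 1, 2, 2, 2, 2),
  Vessel       = c(1, 2, 3, 4, 1, 2, 3, 4),
  Segment      = c(2, 9, 7, 8, 2, 9, 6, 11),
  Severity     = c(1, 0, 1, 0, 0, 0, 0, 0)
)
toy$Vessel_Severity <- paste(toy$Vessel, toy$Severity, sep = "-")

# Participant 1: vessel 3 (segment 7) plus one other vessel are severe.
g <- toy[toy$Study_number == 1, ]

# Row-wise test outermost: the result has one value per row.
row_first <- with(g,
  ifelse(Vessel_Severity == "3-1" & Segment %in% c(6, 7) & sum(Severity) == 2, 1,
         ifelse(Vessel_Severity == "2-1", 1,
                ifelse(sum(Severity) >= 3, 1, 0))))

# Scalar test outermost: the result is shaped like the length-1 test,
# so only the first element of the inner ifelse() survives.
scalar_first <- with(g,
  ifelse(sum(Severity) >= 3, 1,
         ifelse(Vessel_Severity == "2-1", 1,
                ifelse(Vessel_Severity == "3-1" & Segment %in% c(6, 7) &
                         sum(Severity) == 2, 1, 0))))

length(row_first)     # one value per row of g
length(scalar_first)  # a single value

# An order-independent alternative (hypothetical helper): reduce each
# row-wise condition with any() so every term is one TRUE/FALSE per participant.
overall_severe <- function(d) {
  total <- sum(d$Severity)
  any(d$Vessel == 2 & d$Severity == 1) ||
    (any(d$Vessel == 3 & d$Segment %in% c(6, 7) & d$Severity == 1) && total == 2) ||
    total >= 3
}

# One flag per participant, named by Study_number.
flags <- sapply(split(toy, toy$Study_number), overall_severe)
```

Because every term in overall_severe is already a single value per participant, the order of the three conditions no longer matters, and the separate 'Yes'/'No' pass could be replaced by ifelse(flags, 'Yes', 'No') on the per-participant vector.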