Why is NA set in the column instead of 0?

Asked: 2015-04-20 16:49:05

Tags: r

Taking the dataset from here: how to insert a new column in a dataset with values if it satisfies a statement

df1 <- read.table(header=TRUE, text = "
    Chr start       end     num.mark    seg.mean    id
    1   68580000    68640000    8430    0.7       gain
    1   115900000   116260000   8430    0.0039    loss
    1   173500000   173680000   5      -1.7738    loss
    1   173500000   173680000   12       0.011    loss
    1   173840000   174010000   6      -1.6121    loss")

Why does the following within statement produce NAs in the Occurance column?

within(df1, {Occurance <- 0 
             Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
             Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})

Result:

  Chr     start       end num.mark seg.mean   id Occurance
1   1  68580000  68640000     8430   0.7000 gain         1
2   1 115900000 116260000     8430   0.0039 loss        NA
3   1 173500000 173680000        5  -1.7738 loss        -1
4   1 173500000 173680000       12   0.0110 loss        NA
5   1 173840000 174010000        6  -1.6121 loss        -1

If I do it in two steps:

df2 <- within(df1, Occurance <- 0)
within(df2, {Occurance[seg.mean >= 0.5 & id == "gain"] <- 1;
             Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})

I do get the result I was hoping for:

  Chr     start       end num.mark seg.mean   id Occurance
1   1  68580000  68640000     8430   0.7000 gain         1
2   1 115900000 116260000     8430   0.0039 loss         0
3   1 173500000 173680000        5  -1.7738 loss        -1
4   1 173500000 173680000       12   0.0110 loss         0
5   1 173840000 174010000        6  -1.6121 loss        -1

1 answer:

Answer 0: (score: 6)

This has to do with how vectors are initialized and extended in R. For example:

a <- 0
a[1:10>5] <- 2
# [1]  0 NA NA NA NA  2  2  2  2  2

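When a is first created, it has length 1. When you assign to indices that do not exist yet, R creates them and fills any positions you did not assign with NA. That is basically what is happening in your example: inside the braces, Occurance is an ordinary length-1 vector, and the logical index built from seg.mean and id has one element per row, so the first indexed assignment stretches Occurance to that length and leaves NA in the rows that match neither condition. R does not merge your new column into the data.frame until your code block has finished.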
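
As a rough sketch, the same thing can be reproduced outside within() with plain vectors. The values of seg.mean and id below are copied from the question's data; putting them in standalone vectors is only an assumption made for illustration.

```r
# Values copied from the question's data; standalone vectors are used
# here only to illustrate what happens inside the within() block.
seg.mean <- c(0.7, 0.0039, -1.7738, 0.011, -1.6121)
id <- c("gain", "loss", "loss", "loss", "loss")

Occurance <- 0          # a length-1 vector, not yet a column
length(Occurance)

# The logical index has length 5, so the assignment stretches Occurance
# to length 5; positions that are not assigned are filled with NA.
Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1
Occurance
```

Rows 2 and 4 match neither condition, which is where the NAs in the question's output come from.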

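Your two-step approach works because when the first within() finishes, the single-element vector 0 is recycled to the full length of the data.frame. In the second within(), Occurance already has one element per row, so the indexed assignments only overwrite the matching rows.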
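
A small sketch of that recycling, using a made-up three-row data.frame (the name df and its values are hypothetical, not the question's data):

```r
# Hypothetical three-row data.frame, for illustration only.
df <- data.frame(seg.mean = c(0.7, 0.0039, -1.7738))

# Step 1: the scalar 0 is recycled to one value per row when within()
# writes the new variable back into the data.frame.
step1 <- within(df, Occurance <- 0)
length(step1$Occurance)

# Step 2: Occurance is now a full-length column, so indexed assignment
# changes only the matching rows and creates no NA.
step2 <- within(step1, Occurance[seg.mean <= -0.5] <- -1)
step2$Occurance
```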

Why not use a more vectorized approach?

within(df1, {Occurance <- 
     ifelse(seg.mean >= 0.5 & id == "gain", 1, 
     ifelse(seg.mean <= -0.5 & id == "loss", -1, 0))
})

Or you can initialize Occurance to the correct length:

within(df1, {Occurance <- rep(0, length(seg.mean))
    Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
    Occurance[seg.mean <= -0.5 & id == "loss"] <- -1
})