Taking the dataset from this question: how to insert a new column in a dataset with values if it satisfies a statement
df1 <- read.table(header=TRUE, text = "
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss")
Why does the following within statement produce NA values in the Occurance column?
within(df1, {Occurance <- 0
Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})
Result:
Chr start end num.mark seg.mean id Occurance
1 1 68580000 68640000 8430 0.7000 gain 1
2 1 115900000 116260000 8430 0.0039 loss NA
3 1 173500000 173680000 5 -1.7738 loss -1
4 1 173500000 173680000 12 0.0110 loss NA
5 1 173840000 174010000 6 -1.6121 loss -1
If I do it in two steps:
df2 <- within(df1, Occurance <- 0)
within(df2, {Occurance[seg.mean >= 0.5 & id == "gain"] <- 1;
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})
I do get the desired result:
Chr start end num.mark seg.mean id Occurance
1 1 68580000 68640000 8430 0.7000 gain 1
2 1 115900000 116260000 8430 0.0039 loss 0
3 1 173500000 173680000 5 -1.7738 loss -1
4 1 173500000 173680000 12 0.0110 loss 0
5 1 173840000 174010000 6 -1.6121 loss -1
Answer 0 (score: 6)
This has to do with how vectors are initialized and extended in R. For example:
a <- 0
a[1:10>5] <- 2
# [1] 0 NA NA NA NA 2 2 2 2 2
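When a is first created, it has length 1. When you assign to indices that do not exist yet, R creates them and fills the gaps with NA. That is essentially what happens in your example: inside the block, Occurance is an ordinary length-1 vector, and R does not merge your new column into the data.frame until the code block has finished.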
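The same effect can be reproduced outside of within() with plain vectors. The following is a minimal sketch that copies the seg.mean and id values from the question into standalone vectors (the standalone vectors are an illustration, not part of the original code):

```r
# Values copied from the question's data frame
seg.mean <- c(0.7, 0.0039, -1.7738, 0.011, -1.6121)
id <- c("gain", "loss", "loss", "loss", "loss")

Occurance <- 0            # a length-1 vector, not a column of length 5
length(Occurance)         # 1

# The logical index has length 5, so R stretches Occurance to length 5
# and fills the positions that were never assigned with NA.
Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1

Occurance                 # 1 NA -1 NA -1
```

The initial 0 at position 1 is overwritten by the first assignment, and positions 2 and 4 were never part of the original length-1 vector, which is why they end up as NA rather than 0.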
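Your two-step approach works because, once the first within() has finished, the single-element vector 0 is recycled to the full length of the data.frame. In the second within(), Occurance is therefore already a column of five zeros.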
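A short check of that recycling step, as a sketch that rebuilds df1 from the question and inspects the column after the first within() call:

```r
df1 <- read.table(header = TRUE, text = "
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss")

# The length-1 value is recycled when within() writes it back to the data frame
df2 <- within(df1, Occurance <- 0)

length(df2$Occurance)     # 5, one value per row
df2$Occurance             # 0 0 0 0 0
```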
Why not use a more vectorized approach?
within(df1, {Occurance <-
ifelse(seg.mean >= 0.5 & id == "gain", 1,
ifelse(seg.mean <= -0.5 & id == "loss", -1, 0))
})
Or you can initialize Occurance to the correct length:
within(df1, {Occurance <- rep(0, length( seg.mean))
Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
Occurance[seg.mean <= -0.5 & id == "loss"] <- -1
})
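As a sanity check, both variants can be run side by side on the question's data. This sketch stores the results under the hypothetical names res_ifelse and res_rep, which are not in the original answer:

```r
df1 <- read.table(header = TRUE, text = "
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss")

# Variant 1: nested ifelse()
res_ifelse <- within(df1, {
  Occurance <- ifelse(seg.mean >= 0.5 & id == "gain", 1,
               ifelse(seg.mean <= -0.5 & id == "loss", -1, 0))
})

# Variant 2: initialize to full length, then assign by logical index
res_rep <- within(df1, {
  Occurance <- rep(0, length(seg.mean))
  Occurance[seg.mean >= 0.5 & id == "gain"] <- 1
  Occurance[seg.mean <= -0.5 & id == "loss"] <- -1
})

res_ifelse$Occurance                                   # 1 0 -1 0 -1
identical(res_ifelse$Occurance, res_rep$Occurance)     # TRUE
```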