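I have a large dataset and want to add a new column of binary values (0 and 1) to it, filled according to the following conditions.
If df1$seg.mean >= 0.5 and df1$id == "gain", or df1$seg.mean <= -0.5 and df1$id == "loss", put 1 in df1$Occurance.
For rows that meet neither condition, set df1$Occurance to 0.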
df1 <-
Chr start end num.mark seg.mean id
1 68580000 68640000 8430 0.7 gain
1 115900000 116260000 8430 0.0039 loss
1 173500000 173680000 5 -1.7738 loss
1 173500000 173680000 12 0.011 loss
1 173840000 174010000 6 -1.6121 loss
Expected output:
Chr start end num.mark seg.mean id Occurance
1 68580000 68640000 8430 0.7 gain 1
1 115900000 116260000 8430 0.0039 loss 0
1 173500000 173680000 5 -1.7738 loss 1
1 173500000 173680000 12 0.011 loss 0
1 173840000 174010000 6 -1.6121 loss 1
Answer 0 (score: 4)
Try using ifelse:
df1$Occurance <- ifelse((df1$seg.mean >= 0.5 & df1$id == "gain") |
(df1$seg.mean <= -0.5 & df1$id == "loss"), 1, 0)
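For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same ifelse call. The data.frame construction is my own transcription of the table in the question, not something the asker posted.

```r
# Sample data transcribed from the question (assumed column types)
df1 <- data.frame(
  Chr      = c(1, 1, 1, 1, 1),
  start    = c(68580000, 115900000, 173500000, 173500000, 173840000),
  end      = c(68640000, 116260000, 173680000, 173680000, 174010000),
  num.mark = c(8430, 8430, 5, 12, 6),
  seg.mean = c(0.7, 0.0039, -1.7738, 0.011, -1.6121),
  id       = c("gain", "loss", "loss", "loss", "loss"),
  stringsAsFactors = FALSE
)

# 1 when either (gain, seg.mean >= 0.5) or (loss, seg.mean <= -0.5) holds, else 0
df1$Occurance <- ifelse(
  (df1$seg.mean >= 0.5 & df1$id == "gain") |
    (df1$seg.mean <= -0.5 & df1$id == "loss"),
  1, 0
)

print(df1)
```

The new column should match the expected output in the question (1, 0, 1, 0, 1).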
Edit: you can avoid ifelse and use transform instead, which also saves you from typing df1$ every time:
transform(df1, Occurance = as.numeric((seg.mean >= 0.5 & id == "gain") |
(seg.mean <= -0.5 & id == "loss")))
Comment: if TRUE/FALSE is acceptable to you instead of 1/0, you can skip as.numeric.
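To illustrate the comment above, here is a small sketch that keeps the column as a logical vector. The three-row data frame df is a made-up subset of the question's data, used only for illustration.

```r
# Hypothetical three-row subset of the question's data
df <- data.frame(
  seg.mean = c(0.7, 0.0039, -1.7738),
  id       = c("gain", "loss", "loss")
)

# Without as.numeric the new column is simply TRUE/FALSE
df <- transform(df, Occurance = (seg.mean >= 0.5 & id == "gain") |
                                (seg.mean <= -0.5 & id == "loss"))

str(df$Occurance)   # a logical vector
sum(df$Occurance)   # TRUE counts as 1 in arithmetic, so this counts the matching rows
```

A logical column still works for counting and for subsetting (df[df$Occurance, ]), so the conversion is only needed if something downstream expects numbers.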
Edit #2: if you want more than two outcomes, such as -1, 0 and 1, you can do the following:
df1$Occurance <- 0
# within() returns a modified copy; use df1 <- within(...) to keep the change
within(df1, {Occurance[seg.mean >= 0.5 & id == "gain"] <- 1;
             Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})
which results in:
Chr start end num.mark seg.mean id Occurance
1 1 68580000 68640000 8430 0.7000 gain 1
2 1 115900000 116260000 8430 0.0039 loss 0
3 1 173500000 173680000 5 -1.7738 loss -1
4 1 173500000 173680000 12 0.0110 loss 0
5 1 173840000 174010000 6 -1.6121 loss -1
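As an alternative sketch for the three-valued case (not the approach used in this answer), the two conditions can be computed separately and subtracted, since TRUE and FALSE behave as 1 and 0 in arithmetic. The names df, is_gain and is_loss are made up for this example.

```r
# Hypothetical three-row subset of the question's data
df <- data.frame(
  seg.mean = c(0.7, 0.0039, -1.7738),
  id       = c("gain", "loss", "loss")
)

is_gain <- df$seg.mean >= 0.5  & df$id == "gain"
is_loss <- df$seg.mean <= -0.5 & df$id == "loss"

# TRUE - FALSE = 1, FALSE - TRUE = -1, FALSE - FALSE = 0
df$Occurance <- is_gain - is_loss

print(df)
```

This works because the two conditions cannot both be TRUE for the same row; if they could overlap, the subtraction would give 0 for those rows.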
Answer 1 (score: 2)
Try this:
df1$Occurance <- ((df1$seg.mean >= 0.5 & df1$id == "gain") |
                  (df1$seg.mean <= -0.5 & df1$id == "loss")) * 1
# TRUE*1 = 1
# FALSE*1 = 0
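One thing to watch with the multiply-by-1 trick (and with as.integer, which does the same conversion) is missing data: a row whose seg.mean is NA can come out as NA rather than 0. The sketch below uses made-up data to show this, plus one possible way to force such rows to 0; whether NA or 0 is the right answer depends on your data.

```r
# Hypothetical data with a missing seg.mean in the second row
seg.mean <- c(0.7, NA)
id       <- c("gain", "gain")

cond <- (seg.mean >= 0.5 & id == "gain") | (seg.mean <= -0.5 & id == "loss")

as_int  <- as.integer(cond)           # same idea as cond * 1, but integer: 1, NA
no_miss <- as.integer(cond %in% TRUE) # %in% never returns NA, so NA becomes 0: 1, 0

print(as_int)
print(no_miss)
```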
Answer 2 (score: -1)
You can also do it this way:
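# Note: this creates a column spelled Occurrence (the question uses Occurance)
# Matching rows get 1; all other rows are NA after this line
df1$Occurrence[with(df1, (seg.mean >= .5 & id == "gain") | (seg.mean <= -.5 & id == "loss"))] <- 1
# Replace the remaining NA values with 0
df1$Occurrence[is.na(df1$Occurrence)] <- 0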