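The question is:
I have 7 columns in which more than 30% of the rows are NA. All of my columns are numeric.
For these high-missing-value columns, I want to create 4 new columns based on each column's quantile values.
1st column: 1 in rows that contain data; 0 otherwise
2nd column: 1 in rows below the first quantile; 0 otherwise
3rd column: 1 in rows that fall in the 2nd quantile range; 0 otherwise
4th column: 1 in rows that are above the 3rd quantile; 0 otherwise
I got the first column working. But the rest, which rely on quantile-based thresholds, have been a challenge.
My next 3 columns are based on just 3 quantiles: 33.33333%, 66.66667% and 100%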
quantile(High_NAS_set1$EFX, prob=c(33/99,66/99,99/99),na.rm=TRUE)
This is what I have so far...
#1st column: assign 1 for a row that contains data; 0 otherwise
New.EFX_<-High_NAS_set1$EFX #creating a new column
New.EFX_Emp_Total[!is.na(New.EFX)]<-1
New.EFX_Emp_Total[is.na(New.EFX)]<-0
#2nd Column:assign 1 in rows below the first quantile; 0 otherwise
New.EFX2_<-High_NAS_set1$EFX #creating a new column
quant<-quantile(New2.EFX_Emp,probs=33/99,na.rm=TRUE)
which(New2.EFX_Emp_Total<=quant)<-1 # assign 1 for rows which indexes are below quant
which(New2.EFX_Emp_Total!=quant)<-0
The last two lines give me an error:
Error in which(New2.EFX_Emp_Total <= quant) <- 1 :
could not find function "which<-"
Answer 0 (score: 0)
One approach:
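# Cut points at the 1/3, 2/3 and 100% quantiles, ignoring NA values
qtl <- quantile(High_NAS_set1$EFX, probs=c(1/3, 2/3, 1), na.rm=TRUE)
# Each indicator is cumulative (value <= cut point); rows where EFX is NA stay NA, not 0
High_NAS_set1$EFX033 <- ifelse(High_NAS_set1$EFX <= qtl[1], 1, 0)
High_NAS_set1$EFX066 <- ifelse(High_NAS_set1$EFX <= qtl[2], 1, 0)
High_NAS_set1$EFX100 <- ifelse(High_NAS_set1$EFX <= qtl[3], 1, 0)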
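As for the error in the question: `which(...) <- 1` fails because R would need a replacement function named `which<-`, and none exists. The usual fix is to index the target vector and assign into that. Below is a minimal sketch of this, assuming the goal is three mutually exclusive tertile indicators with NA rows coded as 0 (the question's wording leaves some room for interpretation). The vector `efx` is hypothetical toy data standing in for `High_NAS_set1$EFX`, and the names `has_data`, `low`, `mid`, `high` and `flag` are made up for illustration.

```r
# Hypothetical toy data standing in for High_NAS_set1$EFX
efx <- c(5, NA, 12, 7, NA, 20, 3, 15, NA, 9)

# Tertile cut points computed on the non-missing values only
qtl <- unname(quantile(efx, probs = c(1/3, 2/3), na.rm = TRUE))

# 1st column: 1 where a value is present, 0 where it is NA
has_data <- as.integer(!is.na(efx))

# 2nd to 4th columns: mutually exclusive indicators.
# `!is.na(efx) & ...` is FALSE for NA rows, so those rows become 0.
low  <- as.integer(!is.na(efx) & efx <= qtl[1])
mid  <- as.integer(!is.na(efx) & efx > qtl[1] & efx <= qtl[2])
high <- as.integer(!is.na(efx) & efx > qtl[2])

# The assignment the question attempted, written with indexing:
# start from all zeros, then set the selected positions to 1.
# which() drops NA comparisons, so NA rows keep their 0.
flag <- rep(0L, length(efx))
flag[which(efx <= qtl[1])] <- 1L

result <- data.frame(efx, has_data, low, mid, high)
print(result)
```

With this layout every non-missing row has exactly one of `low`, `mid`, `high` set to 1, and `flag` is identical to `low`. If cumulative indicators are what is wanted instead, the `ifelse()` version above already provides them.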