Replace a 4-digit number with its first digit

Time: 2014-07-08 05:58:08

Tags: r

Question:

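I have 7 columns in which more than 30% of the rows are NA. All of my columns are numeric.

For each of these high-missing-value columns, I want to create 4 new columns based on that column's quantile values.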

1st column: 1 in rows that contain data; 0 otherwise
2nd column: 1 in rows below the first quantile; 0 otherwise
3rd column: 1 in rows that fall in the 2nd quantile range; 0 otherwise
4th column: 1 in rows that are above the 3rd quantile; 0 otherwise

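I have the first column working. The rest, which depend on quantile-based thresholds, have been a challenge.

The next 3 columns are based on only 3 quantiles: 33.33333%, 66.66667%, and 100%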

quantile(High_NAS_set1$EFX, prob=c(33/99,66/99,99/99),na.rm=TRUE)

Here is what I have so far:

#1st column: assign 1 for a row that contains data; 0 otherwise

New.EFX_<-High_NAS_set1$EFX #creating a new column


New.EFX_Emp_Total[!is.na(New.EFX)]<-1
New.EFX_Emp_Total[is.na(New.EFX)]<-0


#2nd Column:assign 1 in rows below the first quantile; 0 otherwise

New.EFX2_<-High_NAS_set1$EFX #creating a new column

quant<-quantile(New2.EFX_Emp,probs=33/99,na.rm=TRUE)

which(New2.EFX_Emp_Total<=quant)<-1  # assign 1 for rows which indexes are below quant
which(New2.EFX_Emp_Total!=quant)<-0

The last two lines give me an error:

Error in which(New2.EFX_Emp_Total <= quant) <- 1 : 
  could not find function "which<-"
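
The error arises because R reads `which(x) <- value` as a call to a replacement function named `which<-`, and no such function exists; `which()` only returns the positions of TRUE values. A minimal sketch of the usual alternative is to index the target vector instead. The vector `efx` and its values are hypothetical stand-ins for `High_NAS_set1$EFX`, and the result names are made up for illustration.

```r
# Toy data standing in for High_NAS_set1$EFX (values are made up)
efx <- c(10, NA, 25, 40, NA, 55, 70)

# Lower-third cut point, ignoring NAs
quant <- quantile(efx, probs = 1/3, na.rm = TRUE)

# Start from all zeros, then set 1 where the value is at or below the cut point.
# which() drops NA comparisons, so rows with missing data stay 0.
below_first <- rep(0, length(efx))
below_first[which(efx <= quant)] <- 1

# Equivalent one-liner: compare, treat NA as FALSE, convert to 0/1
below_first_alt <- as.integer(!is.na(efx) & efx <= quant)
```

Note that the second line of the original attempt (`!= quant`) would also overwrite almost every row with 0, which is why this sketch initializes to 0 first and only assigns the 1s.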

1 answer:

Answer 0 (score: 0)

One approach:

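# Cut points at 1/3, 2/3, and the maximum, ignoring NAs
qtl <- quantile(High_NAS_set1$EFX, probs=c(1/3, 2/3, 1), na.rm=TRUE)

# Each indicator is 1 when EFX is at or below the cut point, 0 otherwise (NA rows stay NA)
High_NAS_set1$EFX033 <- ifelse(High_NAS_set1$EFX <= qtl[1], 1, 0)
High_NAS_set1$EFX066 <- ifelse(High_NAS_set1$EFX <= qtl[2], 1, 0)
High_NAS_set1$EFX100 <- ifelse(High_NAS_set1$EFX <= qtl[3], 1, 0)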
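
These three indicators are cumulative: a value at or below the first cut point also satisfies the second and third comparisons, and rows where EFX is NA come out as NA rather than 0. If the goal is the four mutually exclusive columns described in the question, one possible sketch follows. It assumes that "above the 3rd quantile" means the upper third of the data; the data frame `df`, its values, and the column names `has_data`, `low`, `mid`, and `high` are hypothetical.

```r
# Toy data frame standing in for High_NAS_set1 (values are made up)
df <- data.frame(EFX = c(10, NA, 25, 40, NA, 55, 70, 85))

# Tertile cut points at 1/3 and 2/3, ignoring NAs
qtl <- quantile(df$EFX, probs = c(1/3, 2/3), na.rm = TRUE)

# TRUE where the row contains data
present <- !is.na(df$EFX)

# 1 where the row contains data, 0 otherwise
df$has_data <- as.integer(present)
# 1 in the lower third (at or below the first cut point)
df$low <- as.integer(present & df$EFX <= qtl[1])
# 1 in the middle third (above the first cut point, at or below the second)
df$mid <- as.integer(present & df$EFX > qtl[1] & df$EFX <= qtl[2])
# 1 in the upper third (above the second cut point)
df$high <- as.integer(present & df$EFX > qtl[2])
```

Combining each comparison with `present` turns the NA results into FALSE, so every row with data gets exactly one 1 across `low`, `mid`, and `high`, and every row without data gets 0 in all four columns.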