Question

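Using R: I need to fill every NA cell with the closest value to its left, prefixed with "unclassified_".

The code below fills each NA with the nearest value to the left perfectly, but I can't figure out how to prepend the fixed string "unclassified_".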

library(zoo)
y <- t(na.locf(t(x), fromLast = FALSE))  # Fill NA cells with the closest value to the left

Sample data

set.seed(1)
x <- data.frame(a = sample(c(1, 2), 10, replace = TRUE),
                b = sample(c(1, 2, NA), 10, replace = TRUE),
                c = sample(c(1:5, NA), 10, replace = TRUE))

This gives the data frame x.

What I want:

a       b            c
2   unclassified_2   1
2   2                1
1   unclassified_1   5
1   1                2      
1   1                unclassified_1     
1   unclassified_1   4      
1   1                5      
2   2                unclassified_2     
2   2                unclassified_2         
1   unclassified_1   2

Answer 1

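# Index both sides with the same mask; c takes b, or a where b is also NA
na_b <- is.na(x$b); na_c <- is.na(x$c)
x$c[na_c] <- paste0("unclassified_", ifelse(na_b, x$a, x$b)[na_c])
x$b[na_b] <- paste0("unclassified_", x$a[na_b])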

Answer 2

You can use paste0 and indexing. This loop does it:

for (i in seq_len(ncol(x))[-1]) {  # start at column 2: column 1 has no left neighbour
  na_rows <- is.na(x[, i])
  if (any(na_rows)) {
    x[na_rows, i] <- paste0("unclassified_", x[na_rows, i - 1])
  }
}
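
One caveat with filling column by column: when two adjacent cells in a row are both NA, the second one picks up the already-prefixed string and ends up as "unclassified_unclassified_1". If a single prefix is wanted instead, a minimal sketch along the following lines may help. It assumes all columns are numeric and that the first column contains no NA; fill_left and demo are hypothetical names used only for illustration.

```r
# Hypothetical helper: fill each NA with the nearest non-NA value to its left,
# prefixed exactly once with `prefix`.
# Assumes numeric columns and no NA in the first column.
fill_left <- function(df, prefix = "unclassified_") {
  m <- as.matrix(df)             # numeric matrix of the original values
  na_mask <- is.na(m)            # remember where the NAs were
  for (j in seq_len(ncol(m))[-1]) {
    gap <- is.na(m[, j])
    m[gap, j] <- m[gap, j - 1]   # carry the left value over, still unprefixed
  }
  # Add the prefix only at the positions that were originally NA
  m[na_mask] <- paste0(prefix, m[na_mask])
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Small example with two consecutive NAs in the first row
demo <- data.frame(a = c(1, 2), b = c(NA, 2), c = c(NA_real_, NA_real_))
fill_left(demo)
```

Carrying the raw value forward first and prefixing at the end keeps the prefix from being applied twice. All columns of the result are character, as in the loop above.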

Answer 3

One way to solve this in base R:

for (j in 2:3) {
  for (i in seq_len(nrow(x))) {
    if (is.na(x[i, j])) {
      # Prefix the value from the column immediately to the left
      x[i, j] <- paste0("unclassified_", x[i, j - 1])
    }
  }
}

Result:

> x
   a              b              c
1  1              1 unclassified_1
2  1              1              2
3  2 unclassified_2              4
4  2              2              1
5  1 unclassified_1              2
6  2              2              3
7  2 unclassified_2              1
8  2 unclassified_2              3
9  2              2 unclassified_2
10 1 unclassified_1              3

Data:

set.seed(1)
x <- data.frame(a = sample(c(1, 2), 10, replace = TRUE),
                b = sample(c(1, 2, NA), 10, replace = TRUE),
                c = sample(c(1:5, NA), 10, replace = TRUE))

Note that because the input data is randomly sampled, you need to set the seed to reproduce this result.
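
To illustrate the point about the seed, here is a small sketch showing that resetting the seed before each call reproduces the same draw within one R session. The names first and second are hypothetical. Note also that R 3.6.0 changed the default algorithm used by sample(), so the same seed may give different values on older versions, which could be why sampled tables differ between posts.

```r
# Same seed, same call: the draws are identical within one R installation
set.seed(1)
first <- sample(c(1, 2, NA), 10, replace = TRUE)

set.seed(1)
second <- sample(c(1, 2, NA), 10, replace = TRUE)

identical(first, second)
```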
