Replace NA depending on a condition

Date: 2015-10-19 01:47:03

Tags: r replace na

> str(store)
'data.frame':   1115 obs. of  10 variables:
 $ Store                    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ StoreType                : Factor w/ 4 levels "a","b","c","d": 3 1 1 3 1 1 1 1 1 1 ...
 $ Assortment               : Factor w/ 3 levels "a","b","c": 1 1 1 3 1 1 3 1 3 1 ...
 $ CompetitionDistance      : int  1270 570 14130 620 29910 310 24000 7520 2030 3160 ...
 $ CompetitionOpenSinceMonth: int  9 11 12 9 4 12 4 10 8 9 ...
 $ CompetitionOpenSinceYear : int  2008 2007 2006 2009 2015 2013 2013 2014 2000 2009 ...
 $ Promo2                   : int  0 1 1 0 0 0 0 0 0 0 ...
 $ Promo2SinceWeek          : int  NA 13 14 NA NA NA NA NA NA NA ...
 $ Promo2SinceYear          : int  NA 2010 2011 NA NA NA NA NA NA NA ...
 $ PromoInterval            : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 1 3 3 1 1 1 1 1 1 1 ...

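I am trying to replace NAs based on the value of Promo2. If Promo2 == 0, the NA values in that row should become zero; if Promo2 == 1, the missing values should be replaced with the mean of their column.

I don't understand why my code does not modify the store data.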

for (i in 1:nrow(store)){
  if(is.na(store[i,])== TRUE & store$Promo2[i] ==0){
    store[i,] <- ifelse(is.na(store[i,]),0,store[i,])
  }
  else if (is.na(store[i,])== TRUE & store$Promo2[i] ==1){
    for(j in 1:ncol(store)){
      store[is.na(store[i,j]), j] <- mean(store[,j], na.rm = TRUE)
    }
  }
}

2 answers:

Answer 0: (score: 3)

For the Promo2SinceWeek column:

# Compute the mean first, so the zeros filled in below are not averaged in
wk_mean <- mean(store$Promo2SinceWeek, na.rm=TRUE)
store$Promo2SinceWeek[store$Promo2==0 & is.na(store$Promo2SinceWeek)] <- 0
store$Promo2SinceWeek[store$Promo2==1 & is.na(store$Promo2SinceWeek)] <- wk_mean

Use the same approach for the other columns. Vectorized functions are a very useful feature of R.
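To avoid repeating those two lines for every column, the same idea can be wrapped in a small function and applied to each numeric column. This is only a sketch: `fill_by_promo` is a hypothetical helper name, and the small `store` data frame below is made-up stand-in data, not the data from the question.

```r
# Toy stand-in for the question's data (values are made up for illustration)
store <- data.frame(
  Promo2          = c(0, 1, 1, 0, 1),
  Promo2SinceWeek = c(NA, 13, 14, NA, NA),
  Promo2SinceYear = c(NA, 2010, 2011, NA, NA)
)

# Hypothetical helper: fill NAs in one column according to Promo2
fill_by_promo <- function(x, promo) {
  col_mean <- mean(x, na.rm = TRUE)    # mean of the observed values only
  x[is.na(x) & promo == 0] <- 0        # Promo2 == 0: NA becomes 0
  x[is.na(x) & promo == 1] <- col_mean # Promo2 == 1: NA becomes the column mean
  x
}

# Apply to every numeric column except Promo2 itself
num_cols <- setdiff(names(store)[sapply(store, is.numeric)], "Promo2")
store[num_cols] <- lapply(store[num_cols], fill_by_promo, promo = store$Promo2)
```

The mean is computed before any zeros are written, so it reflects only the values that were originally present. Factor columns are skipped because a mean is not defined for them.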

Answer 1: (score: 0)

Fixing the for loop:

for(i in 1:nrow(store)) {
  col <- which(is.na(store[i,]))
  store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}

Or, if you don't want any if statements:

for (i in 1:nrow(store)) {

  store[i,][is.na(store[i,]) & store$Promo2[i] ==0] <- 0

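  # drop = FALSE keeps a data frame even when only one column is selected,
  # which colMeans() requires
  store[i,][is.na(store[i,]) & store$Promo2[i] ==1] <-
       colMeans(store[, is.na(store[i,]) & store$Promo2[i] ==1, drop = FALSE], na.rm = TRUE)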

}

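Your loop does not work because an if statement accepts exactly one condition value from its test. Your loop passes it if(is.na(store[i,])== TRUE & store$Promo2[i] ==0), but that condition evaluates to many values, TRUE FALSE FALSE FALSE TRUE.... The condition has to be a single value, one TRUE or one FALSE, not a series of them. When given several values, if uses only the first one and issues a warning; since R 4.2.0 a condition of length greater than one is an error instead.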
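One way to give if the single value it needs is to collapse the vector with any() before testing it. The snippet below is a minimal sketch with made-up stand-in values (`row_has_na` and `promo2_i` are illustrative names, not objects from the question):

```r
# Stand-in for is.na(store[i, ]): one logical value per column of row i
row_has_na <- c(TRUE, FALSE, FALSE, TRUE)
# Stand-in for store$Promo2[i]
promo2_i <- 0

# any() reduces the vector to a single TRUE/FALSE, and && compares
# single values, so the condition always has length one
if (any(row_has_na) && promo2_i == 0) {
  msg <- "row has at least one NA and Promo2 is 0"
} else {
  msg <- "nothing to fill with zero"
}
msg
```

The test now answers "does this row contain any NA?", which is a yes/no question, while the choice of which cells to overwrite is left to the indexing inside the branch.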

Reproducible example

store
#                  Promo2 gear carb
#Mazda RX4              1   NA   NA
#Mazda RX4 Wag          1    4    4
#Datsun 710             1    4    1
#Hornet 4 Drive         0    3    1
#Hornet Sportabout      0    3   NA
#Valiant                0    3    1

    for(i in 1:nrow(store)) {
      col <- which(is.na(store[i,]))
      store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
    }

store
#                  Promo2 gear carb
#Mazda RX4              1  3.4 1.75
#Mazda RX4 Wag          1  4.0 4.00
#Datsun 710             1  4.0 1.00
#Hornet 4 Drive         0  3.0 1.00
#Hornet Sportabout      0  3.0 0.00
#Valiant                0  3.0 1.00

Data

store <- head(mtcars)
store <- store[-(1:8)]
names(store)[1] <- "Promo2"
store[1,2] <- NA
store[5,3] <- NA
store[1,3] <- NA
store