Replacing missing values

Time: 2018-09-06 19:37:21

Tags: r dataframe missing-data

M       Price  Quantity  Quantity1
----------------------------------
2014m1  55     150       150
2014m2  55     220       220
2014m3  55     350       87.5
2014m4  55     NA        87.5
2014m5  55     NA        87.5
2014m6  55     NA        87.5
2014m8  58     200       200

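This is a sample of my table. I want to obtain a result like Quantity1: when a value is followed by NAs, that value should be divided by the number of NAs plus 1, and the result should fill both the value's own row and the NA rows.

For example, 350 should be replaced by 87.5 (= 350/4), and the next three values should also be replaced by 87.5.

Can anyone help me do this with a loop?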

2 answers:

Answer 0: (score: 4)

In base R, we can use ave:

df$Quantity1 = ave(df$Quantity, cumsum(!is.na(df$Quantity)), 
                   FUN = function(x) max(x, na.rm = TRUE)/length(x))
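To see why this works, it may help to look at the grouping vector on its own. The sketch below uses a plain numeric vector copied from the question's Quantity column; the names x, grp and res are only for illustration.

```r
# Quantity column from the question, as a plain vector
x <- c(150, 220, 350, NA, NA, NA, 200)

# Every non-NA value starts a new group; the NAs that follow inherit its id
grp <- cumsum(!is.na(x))
grp
# [1] 1 2 3 3 3 3 4

# Within each group: the single non-NA value divided by the group size
res <- ave(x, grp, FUN = function(v) max(v, na.rm = TRUE) / length(v))
res
# [1] 150.0 220.0  87.5  87.5  87.5  87.5 200.0
```

One caveat: if the column starts with NA, the first group contains no non-NA value, and max(..., na.rm = TRUE) returns -Inf with a warning.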

Alternatively, with data.table (credit to @Jaap):

library(data.table)

setDT(df)[, Quantity1 := max(Quantity, na.rm = TRUE)/.N, by = cumsum(!is.na(Quantity))]

Output:

       M Price Quantity Quantity1
1 2014m1    55      150     150.0
2 2014m2    55      220     220.0
3 2014m3    55      350      87.5
4 2014m4    55       NA      87.5
5 2014m5    55       NA      87.5
6 2014m6    55       NA      87.5
7 2014m8    58      200     200.0

Or using dplyr:

library(dplyr)

df %>%
  group_by(na_id = cumsum(!is.na(Quantity))) %>%
  mutate(Quantity1 = max(Quantity, na.rm = TRUE)/n()) 

Note: we can append ungroup() %>% select(-na_id) to the pipeline to remove the na_id column.
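As a sketch of what the note describes, the full pipeline with the helper column removed could look like the following. The data frame is rebuilt here with data.frame() only to keep the example self-contained, and the name result is an assumption for illustration.

```r
library(dplyr)

# Same values as the question's table, rebuilt for a self-contained example
df <- data.frame(
  M = c("2014m1", "2014m2", "2014m3", "2014m4", "2014m5", "2014m6", "2014m8"),
  Price = c(55L, 55L, 55L, 55L, 55L, 55L, 58L),
  Quantity = c(150L, 220L, 350L, NA, NA, NA, 200L)
)

result <- df %>%
  group_by(na_id = cumsum(!is.na(Quantity))) %>%
  mutate(Quantity1 = max(Quantity, na.rm = TRUE) / n()) %>%
  ungroup() %>%     # remove the grouping so later steps see a plain tibble
  select(-na_id)    # drop the helper column

result
```

Calling ungroup() before select() matters here: select() on a grouped tibble keeps the grouping column, so na_id would otherwise be retained.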

Output:

# A tibble: 7 x 5
# Groups:   na_id [4]
  M      Price Quantity na_id Quantity1
  <fct>  <int>    <int> <int>     <dbl>
1 2014m1    55      150     1     150  
2 2014m2    55      220     2     220  
3 2014m3    55      350     3      87.5
4 2014m4    55       NA     3      87.5
5 2014m5    55       NA     3      87.5
6 2014m6    55       NA     3      87.5
7 2014m8    58      200     4     200  

Data:

df <- structure(list(M = structure(1:7, .Label = c("2014m1", "2014m2", 
"2014m3", "2014m4", "2014m5", "2014m6", "2014m8"), class = "factor"), 
    Price = c(55L, 55L, 55L, 55L, 55L, 55L, 58L), Quantity = c(150L, 
    220L, 350L, NA, NA, NA, 200L)), class = "data.frame", row.names = c(NA, 
-7L), .Names = c("M", "Price", "Quantity"))

Answer 1: (score: 0)

I think the following code will work for you:

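# Positions immediately before each NA (the values to carry forward)
getValueindices <- function(dt) {
  which(is.na(dt)) - 1
}

# Fill each NA from the previous position; the first NA receives the
# preceding value divided by (number of NAs + 1).
# Assumes the vector contains a single run of NAs.
setValue <- function(indices, dt) {
  for (i in indices) {
    if (min(indices) == i) {
      dt[i + 1] <- dt[i] / (sum(is.na(dt)) + 1)
    } else {
      dt[i + 1] <- dt[i]
    }
  }
  dt
}

# Store the indices so that setValue() can use them
indices <- getValueindices(df$Quantity)

df$Quantity1 <- setValue(indices, df$Quantity)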

df

The output is:

       M Price Quantity Quantity1
1 2014m1    55      150     150.0
2 2014m2    55      220     220.0
3 2014m3    55      350     350.0
4 2014m4    55       NA      87.5
5 2014m5    55       NA      87.5
6 2014m6    55       NA      87.5
7 2014m8    58      200     200.0
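In this output, row 3 keeps its original 350 instead of the 87.5 shown in the question's Quantity1 column, and the divisor counts every NA in the column rather than only the NAs in the current run. Since the question asked for a loop, here is a minimal loop-based sketch that also overwrites the leading value and treats each run of NAs separately. The function spread_over_na is a hypothetical helper, not part of either answer.

```r
# Hypothetical helper: spread each non-NA value evenly over itself
# and the run of NAs that immediately follows it.
spread_over_na <- function(x) {
  out <- as.numeric(x)
  n <- length(x)
  i <- 1
  while (i <= n) {
    if (is.na(x[i])) {
      # Leading NA with no preceding value to spread: leave it as NA
      i <- i + 1
      next
    }
    # Extend j to the last NA of the run that follows position i
    j <- i
    while (j < n && is.na(x[j + 1])) j <- j + 1
    out[i:j] <- x[i] / (j - i + 1)
    i <- j + 1
  }
  out
}

spread_over_na(c(150, 220, 350, NA, NA, NA, 200))
# [1] 150.0 220.0  87.5  87.5  87.5  87.5 200.0
```

With the data frame from Answer 0, this would be applied as df$Quantity1 <- spread_over_na(df$Quantity).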