Question

I am working with a dataset that is larger than I am used to (286,212 rows, 19 columns), and I am not sure how to approach my problem. The data consists of a value for each day of the year for 782 grid references, and I have this for 15 years. It looks as follows:

Month  Day  Grid   x2004   x2005   x2006   x2007
1      1    A10    0.091   0.134   NA      0.066
1      2    A10    0.12    0.10    0.23    0.054
1      3    A10    0.55    NA      NA      0.08
1      1    B10    NA      0.134   NA      0.17
1      2    B10    0.14    0.151   NA      0.21
1      3    B10    0.43    0.162   0.24    NA

However, some of the days are missing, and I want to fill each gap with the mean for that day and that specific grid, using the values from the other years. So if grid A10 for day 1 in 2006 is missing, I want to insert the mean for day 1, grid A10 from 2004, 2005 and 2007, which in this case is 0.097.

I am trying the following code:

ind <- which(is.na(data$x2005))
data$x2005[ind] <- sapply(ind, function(i) 
with(data, rowMeans(data[c(data$x2004[i], data$x2006[i], data$x2007[i], data$x2008[i], data$x2009[i],
data$x2010[i], data$x2011[i], data$x2012[i], 
data$x2013[i], data$x2014[i], data$x2015[i], 
data$x2016[i], data$x2017[i]),], na.rm=TRUE)))

I plan to do that for all years, but it gives me this error:

"Error in rowMeans(data[c(data$x2006[i], data$x2007[i], data$x2012[i]),  : 
  'x' must be numeric"

When I check the class of the columns, they are all numeric, so I am not sure why 'x' is reported as not numeric. I also don't know whether, once the mean part is sorted out, the code will actually give me the mean specific to each grid and day.

Please Help. Thanks

Answer 1

Can you adapt this to your code? It replaces each NA with the mean of its column:

for(i in 1:ncol(data)){ data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE) }
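
Note that the loop above uses column means, so it averages over all days and grids within one year. Since each row of your data is already one Month/Day/Grid combination, the mean you describe is a row mean across the year columns. The error in your attempt most likely comes from data[c(...), ]: that uses the values as row indices and returns whole rows of the data frame, including the non-numeric Grid column, which rowMeans() rejects. Below is a minimal sketch on the sample from the question. It assumes every year column is named "x" followed by four digits; the names year_cols, vals, row_mean and na_pos are only illustrative.

```r
# Toy data copied from the question (four year columns only).
data <- data.frame(
  Month = 1,
  Day   = rep(1:3, times = 2),
  Grid  = rep(c("A10", "B10"), each = 3),
  x2004 = c(0.091, 0.12, 0.55, NA, 0.14, 0.43),
  x2005 = c(0.134, 0.10, NA, 0.134, 0.151, 0.162),
  x2006 = c(NA, 0.23, NA, NA, NA, 0.24),
  x2007 = c(0.066, 0.054, 0.08, 0.17, 0.21, NA)
)

# Assumption: year columns are exactly those named "x" + four digits.
year_cols <- grep("^x[0-9]{4}$", names(data), value = TRUE)

# Work on a numeric matrix so rowMeans() never sees Month, Day or Grid.
vals <- as.matrix(data[year_cols])

# One mean per row (per Month/Day/Grid combination), ignoring NAs.
row_mean <- rowMeans(vals, na.rm = TRUE)

# (row, col) positions of the missing cells; fill each with its row's mean.
na_pos <- which(is.na(vals), arr.ind = TRUE)
vals[na_pos] <- row_mean[na_pos[, "row"]]

# Write the filled values back into the data frame.
data[year_cols] <- vals
```

With the sample data, x2006 for grid A10 on day 1 becomes 0.097, as in your example. A row where every year is missing has no mean to use, so rowMeans() returns NaN for it and those cells end up as NaN. Because the means are computed once before filling, each gap is filled from observed values only, not from other filled-in values.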

