Question

I have a dataframe d like this:

ID  Value1  Value2  Value3
1   20      25      0
2   2       0       0
3   15      32      16
4   0       0       0

What I would like to do is calculate the variance for each person (ID) using only the non-zero values, and return NA where this is not possible.

So, for instance, in this example the variance for ID 1 would be var(c(20, 25)); for ID 2 it should return NA, because you can't calculate a variance from just one entry; for ID 3 it would be var(c(15, 32, 16)); and for ID 4 it should again return NA, because it has no non-zero values at all to calculate a variance on.
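For reference, var() in R computes the sample variance, which divides by n - 1. Writing the non-zero values of one row as x_1, ..., x_n, the expected results for IDs 1 and 3 work out as:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \\
\text{ID 1: } \bar{x} &= 22.5, \quad s^2 = \frac{(-2.5)^2 + 2.5^2}{1} = 12.5 \\
\text{ID 3: } \bar{x} &= 21, \quad s^2 = \frac{(-6)^2 + 11^2 + (-5)^2}{2} = \frac{182}{2} = 91
\end{aligned}
```

With n = 1 the denominator n - 1 is zero, and with n = 0 there is not even a mean, which is why IDs 2 and 4 have no defined variance.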

How would I go about this? I currently have the following (incomplete) code, but it may not be the best approach:

len=nrow(d)
variances = numeric(len)
for (i in 1:len){
  #get all nonzero values in ith row of data into a vector nonzerodat here
  currentvar = var(nonzerodat)
  variances[i]=currentvar
}
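A minimal sketch of one way to fill in the missing step of that loop, assuming d is the data frame shown above with ID in the first column. The names rowvals and nonzerodat are only illustrative, and the explicit length check stores NA for rows with fewer than two non-zero values instead of relying on how var() treats very short vectors.

```r
# Example data from the question (ID in the first column)
d <- data.frame(ID = 1:4,
                Value1 = c(20, 2, 15, 0),
                Value2 = c(25, 0, 32, 0),
                Value3 = c(0, 0, 16, 0))

len <- nrow(d)
variances <- numeric(len)
for (i in seq_len(len)) {
  # all values in the ith row except the ID column, as a plain numeric vector
  rowvals <- unlist(d[i, -1], use.names = FALSE)
  # keep only the non-zero entries
  nonzerodat <- rowvals[rowvals != 0]
  # a sample variance needs at least two values; otherwise store NA
  variances[i] <- if (length(nonzerodat) >= 2) var(nonzerodat) else NA
}
variances
```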

Note that this is a toy example; the dataset I'm actually working with has over 40 columns of values to calculate the variance on, so something that scales easily would be great.
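One way to keep the code independent of the number of value columns is to select them by name. This sketch assumes, hypothetically, that every value column name starts with "Value" as in the toy example; var_nonzero_or_na is an illustrative helper, not an existing function.

```r
d <- data.frame(ID = 1:4, Value1 = c(20, 2, 15, 0),
                Value2 = c(25, 0, 32, 0), Value3 = c(0, 0, 16, 0))

# hypothetical helper: variance of the non-zero entries, NA if fewer than two
var_nonzero_or_na <- function(x) {
  x <- x[x != 0]
  if (length(x) < 2) NA_real_ else var(x)
}

# assumption: every value column name starts with "Value"
value_cols <- grep("^Value", names(d))

# one row per ID, however many value columns match the pattern
result <- data.frame(ID = d$ID,
                     variance = apply(d[, value_cols, drop = FALSE], 1, var_nonzero_or_na))
result
```

Adding a 41st matching column requires no code change; only the name pattern has to fit the real data.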

Answer 1

Data <- data.frame(ID = 1:4, Value1=c(20,2,15,0), Value2=c(25,0,32,0), Value3=c(0,0,16,0))

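# variance of the non-zero entries; var() gives NA when fewer than two remain
var_nonzero <- function(x) var(x[x != 0])
# apply over rows (MARGIN = 1), dropping the ID column
apply(Data[, -1], 1, var_nonzero)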

[1] 12.5   NA 91.0   NA
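With 40+ columns, apply() still calls an R function once per row. A fully vectorized alternative, sketched here as an assumption rather than part of the answer above, treats zeros as missing and computes the sample variance for all rows at once with rowSums() and rowMeans():

```r
Data <- data.frame(ID = 1:4, Value1 = c(20, 2, 15, 0),
                   Value2 = c(25, 0, 32, 0), Value3 = c(0, 0, 16, 0))

# numeric matrix of the value columns, with zeros treated as missing
m <- as.matrix(Data[, -1])
m[m == 0] <- NA

n  <- rowSums(!is.na(m))          # number of non-zero values per row
mu <- rowMeans(m, na.rm = TRUE)   # mean of the non-zero values per row

# two-pass sample variance: squared deviations summed, divided by n - 1;
# m - mu subtracts mu[i] from every entry of row i (column-major recycling)
v <- rowSums((m - mu)^2, na.rm = TRUE) / (n - 1)

# fewer than two non-zero values: no variance
v[n < 2] <- NA
v
```

Subtracting the row means before squaring avoids the loss of precision that the shortcut sum(x^2) - n * mean^2 can suffer.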

Answer 2

This seems overwrought, but it works, and it gives you back an object with the ids attached to the statistics:

library(reshape2)
library(dplyr)

variances <- df %>%
    melt(., id.vars = "id") %>%
    group_by(id) %>%
    summarise(variance = var(value[value!=0]))

Here's the toy data I used to test it:

df <- data.frame(id = seq(4), X1 = c(3, 0, 1, 7), X2 = c(10, 5, 0, 0), X3 = c(4, 6, 0, 0))
> df
  id X1 X2 X3
1  1  3 10  4
2  2  0  5  6
3  3  1  0  0
4  4  7  0  0

And here's the result:

  id variance
1  1 14.33333
2  2  0.50000
3  3       NA
4  4       NA
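The same long-format idea (stack the value columns, then summarise by id) can also be sketched in base R without reshape2 or dplyr. This swaps in unlist() and tapply() in place of melt() and group_by()/summarise(); the object names long and nz are only illustrative.

```r
df <- data.frame(id = seq(4), X1 = c(3, 0, 1, 7),
                 X2 = c(10, 5, 0, 0), X3 = c(4, 6, 0, 0))

# stack the value columns into one long vector; unlist() goes column by
# column, so the ids repeat once per value column
long <- data.frame(id    = rep(df$id, times = ncol(df) - 1),
                   value = unlist(df[-1], use.names = FALSE))

# drop zeros, but keep every id as a factor level so that ids with no
# non-zero values still appear (tapply fills empty groups with NA)
nz <- long[long$value != 0, ]
variances <- tapply(nz$value, factor(nz$id, levels = df$id), var)
variances
```

The result is named by id, so, as with the dplyr version, the statistics stay attached to their ids.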

R: Find the Variance of all Non-Zero Elements in Each Row