I'm trying to calculate the mean and sd of certain columns in a data.frame. I run the following loop:
calculations <- by(dataset, dataset$id, function (x)
{
  if(x == 1)
  {
    c(mean(x$Var1),mean(x$Var2))
    print("Cannot take sd, number of obs is equal to 1")
  }
  else(x > 1)
  {
    c(mean(x$Var1), mean(x$Var2), sd(x$Var1), sd(x$Var2))
  }
  #return(c(mean(x$Var1), mean(x$Var2), sd(x$Var1), sd(x$Var2)))
})
and I get an output of:
dataset$id: 1
[1] 0.3961182 3.8605641 1.0251303 2.6779033
--------------------------------------------------------------------------
dataset$id: 2
[1] 0.1656521 3.7565732 0.8687900 2.2305298
--------------------------------------------------------------------------
dataset$id: 3
[1] -0.3831954 4.0803145 1.3875692 2.1146944
--------------------------------------------------------------------------
dataset$id: 4
[1] 0.6719857 4.7523648 0.2001029 1.3715562
--------------------------------------------------------------------------
dataset$id: 5
[1] 0.01666328 3.18141270 0.98473329 1.76379804
--------------------------------------------------------------------------
dataset$id: 6
[1] 0.2542346 4.6464406 1.1077001 2.4604031
--------------------------------------------------------------------------
dataset$id: 7
[1] -0.1826018 5.6737908 NA NA
and so on up to dataset$id 40. For ids with only one observation (like id 7, where the sd is NA), I want my code to print "Cannot take sd." instead. When I run it, I just end up with the following warning messages:
Warning messages:
1: In if (x == 1) { ... :
the condition has length > 1 and only the first element will be used
2: In if (x == 1) { ... :
the condition has length > 1 and only the first element will be used
3: In if (x == 1) { ... :
the condition has length > 1 and only the first element will be used
Does anyone know how to fix this?
Answer:
It is clear from
by(dataset, dataset$id, function (x) ....
that x takes subsets of rows of dataset. Your code then says
if (x == 1)
and, as x is a data frame, what do you hope to achieve by testing whether the data frame is equal to 1?
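To see where the warning comes from, here is a small sketch using a made-up two-row data frame standing in for one subset x (the values are hypothetical). Comparing a data frame with == works cell by cell and returns a logical matrix, so if (x == 1) receives many values at once; older versions of R warn and use only the first one, while R 4.2.0 and later stop with an error. nrow(x), by contrast, is a single number.

```r
# Hypothetical two-row subset, standing in for one `x` that by() passes in
x <- data.frame(id = c(1, 1), Var1 = c(0.2, 0.6), Var2 = c(3.1, 4.4))

cond <- x == 1      # element-wise comparison over every cell of the data frame
print(cond)         # a 2 x 3 logical matrix, not a single TRUE/FALSE
print(length(cond)) # 6 values, hence "the condition has length > 1"

print(nrow(x))        # the number of observations: one integer
print(nrow(x) == 1L)  # a length-one condition that if() can use
```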
If you are testing the number of rows, then consider
if (nrow(x) > 1L)
as the test and swap the contents of the if and else branches, or use
if (nrow(x) < 2L) ## or
if (nrow(x) == 1L)
if you don't want to swap the branches.
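Putting this together, a corrected version of the function might look like the sketch below. It assumes dataset has the id, Var1 and Var2 columns used in the question; the small data frame here is made up for illustration. Two other details of the original code matter. First, else takes no condition, so else(x > 1) just evaluates x > 1 and the following { ... } block then runs unconditionally as a separate statement, which is why every id, including 7, ends up with four values. Second, a function returns its last expression, so in the first branch the c(...) of means is discarded in favour of the print() result. Using message() for the notice is one choice; print() would work too.

```r
# Hypothetical toy data with the columns used in the question
dataset <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  Var1 = c(0.5, 1.5, 1.0, 2.0, 4.0, 7.0),
  Var2 = c(3.0, 5.0, 4.0, 1.0, 3.0, 9.0)
)

calculations <- by(dataset, dataset$id, function(x) {
  if (nrow(x) > 1L) {
    # Two or more rows: sd() is defined
    c(mean(x$Var1), mean(x$Var2), sd(x$Var1), sd(x$Var2))
  } else {
    # A single row: sd() would give NA, so say so explicitly.
    # The branch's last expression (means padded with NA) is the return value.
    message("Cannot take sd, number of obs is equal to 1")
    c(mean(x$Var1), mean(x$Var2), NA, NA)
  }
})

print(calculations)
```

With this data, id 3 has a single row, so the message is shown once and its result is 7, 9, NA, NA.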