Question

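I have a data frame with about 50 columns and 10,000 rows. The values in the columns range from +4 to -4. I don't want positive values to count toward my mean or sum. I want to compute the mean and the sum over the following value ranges.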

Average 1 = -3.00 to -4.00, Average 2 = -2.00 to -2.99, Average 3 = -1.00 to -1.99, Average 4 = -0.0001 to -0.99; the same ranges apply to the sums.

My data looks like this:

Col1    Col2    Col3    Col4 

-3.5146 -3.4556 -3.3418 -3.5318 
-3.2025 -3.514  -3.4787 -3.2389 
-3.4703 -3.4061 -3.3073 -3.4863 
-2.2589 -2.1041 -2.3988 -3.5074 
-3.4982 -3.4511 -3.4022 -3.515 
-3.5201 -3.4755 -3.3262 -3.5015 
-2.2487 -2.2279 -1.8281 -3.2417 
-3.4636 -3.4139 -3.4265 -3.394 
-3.4915 -3.4403 -3.3892 -3.496 
-3.485  -3.4292 -3.3462 -3.4519 
-3.3267 -3.413  -3.5319 -3.4853 
-3.4287 -3.3736 -3.3321 -3.4794 
-2.1028 -1.7846 -1.7041 -3.1253 
-3.441  -3.3989 -3.2965 -3.4871 
-3.659  -3.6224 -3.5749 -3.5581 
-1.9703 -2.2392 -2.1001 -2.0202 
-2.0637 -2.1758 -2.013  -1.845 
-1.2338 -2.1306 -2.122  -2.16 
-2.9466 -2.5278 -0.7644 -0.2727 
-2.0842 -2.2125 -1.9598 -1.8279 
-2.2658 -2.2649 -2.0052 -2.2962 
-2.0647 -2.1666 -1.9974 -1.8078

I tried the following code for the averages, but I don't know how to adapt it to my needs:

data=read.table('probability.csv',header=T, sep=',') 
frame=data.frame(data)  
averages=apply(frame,2,FUN=mean,na.rm=T)

How can I modify the code so that I get the averages for the ranges shown above? Thanks, Gavin

Answer 1

You can do the following:

# Define a function factory: instead of writing one function for the mean and
# another for the sum, mymaths() takes the summary function f (sum or mean)
# as its argument and returns a function to apply to each column.
# a, b, c, d are the four bins you want: each line subsets the column to one
# range and applies f to it. Positive values fall in no bin, so they are ignored.
mymaths <- function(f){
  function(x){
    a <- f(x[x > -4    & x <= -3])
    b <- f(x[x > -2.99 & x <= -2])
    c <- f(x[x > -1.99 & x <= -1])
    d <- f(x[x > -0.99 & x <= -0.0001])
    c(a, b, c, d)
  }
}
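As a usage illustration, the sketch below applies the function to a small data frame and labels the four bins. It assumes `df` stands for the asker's data frame; here it holds only four rows copied from the question so the example runs on its own. The row labels and the switch from `lapply` to `sapply` (to get one matrix instead of a list) are additions for illustration, not part of the original answer.

```r
# mymaths() is repeated from the answer so this sketch is self-contained.
mymaths <- function(f){
  function(x){
    a <- f(x[x > -4    & x <= -3])
    b <- f(x[x > -2.99 & x <= -2])
    c <- f(x[x > -1.99 & x <= -1])
    d <- f(x[x > -0.99 & x <= -0.0001])
    c(a, b, c, d)
  }
}

# Assumption: four rows taken from the question's sample data.
df <- data.frame(
  Col1 = c(-3.5146, -2.2589, -1.2338, -2.9466),
  Col2 = c(-3.4556, -2.1041, -2.1306, -2.5278),
  Col3 = c(-3.3418, -2.3988, -2.1220, -0.7644),
  Col4 = c(-3.5318, -3.5074, -2.1600, -0.2727)
)

# Illustrative labels for the four bins, in the order a, b, c, d.
bin_labels <- c("-4 to -3", "-2.99 to -2", "-1.99 to -1", "-0.99 to -0.0001")

# sapply() simplifies the per-column vectors into a bins-by-columns matrix.
means <- sapply(df, mymaths(mean))
sums  <- sapply(df, mymaths(sum))
rownames(means) <- bin_labels
rownames(sums)  <- bin_labels

print(means)
print(sums)
```

A bin that contains no values gives `NaN` for the mean and `0` for the sum, because `mean(numeric(0))` is `NaN` and `sum(numeric(0))` is `0` in R.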

Output:

# two simple lapply calls give you what you want
# (df is the data frame holding the data from the question):

> lapply(df, mymaths(mean))
$Col1
[1] -3.458433 -2.254425 -1.602050       NaN

$Col2
[1] -3.449467 -2.227711 -1.784600       NaN

$Col3
[1] -3.396125 -2.127820 -1.830667 -0.764400

$Col4
[1] -3.433313 -2.158800 -1.826900 -0.272700


> lapply(df, mymaths(sum))
$Col1
[1] -41.5012 -18.0354  -3.2041   0.0000

$Col2
[1] -41.3936 -20.0494  -1.7846   0.0000

$Col3
[1] -40.7535 -10.6391  -5.4920  -0.7644

$Col4
[1] -51.4997  -6.4764  -5.4807  -0.2727
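Note that the bins in `mymaths` follow the ranges exactly as written in the question, so a value such as -2.995 lies between -3 and -2.99 and is counted in no bin. If contiguous bins are acceptable, a minimal sketch using `cut()` might look like the following. The helper name `bin_stats`, its `stat` argument, the bin labels, and the small example data frame are all made up for illustration; they are not part of the original answer.

```r
# Hypothetical helper: summarise the negative values of x per contiguous bin.
# With right = TRUE and include.lowest = TRUE the intervals are
# [-4,-3], (-3,-2], (-2,-1], (-1,0].
bin_stats <- function(x, stat, breaks = c(-4, -3, -2, -1, 0)) {
  x <- x[!is.na(x) & x < 0]  # drop missing and non-negative values
  bins <- cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE,
              labels = c("-4 to -3", "-3 to -2", "-2 to -1", "-1 to 0"))
  # split() keeps empty factor levels, so every bin appears in the result.
  vapply(split(x, bins), stat, numeric(1))
}

# Made-up example data; -2.995 would fall into a gap with the original bins.
df <- data.frame(
  Col1 = c(-3.5, -2.995, -2.5, 1.2),
  Col2 = c(-3.0, -1.5, -0.5, -0.25)
)

res_mean <- sapply(df, bin_stats, stat = mean)
res_sum  <- sapply(df, bin_stats, stat = sum)

print(res_mean)
print(res_sum)
```

Here -2.995 is counted in the "-3 to -2" bin, the positive value 1.2 is ignored, and empty bins again give `NaN` for the mean and `0` for the sum.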

Column-wise averages within given value ranges
