Computing the mean of different subsets of a data frame

Time: 2017-08-09 08:30:29

Tags: r

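I have a data frame with a large number of rows and columns. It contains various effort calculations and measurements for 30 people performing 6 different activities.

I want to compute the mean of every variable for each person and each activity, and summarise the results in a table.

My current solution is to write two nested loops, but is there no other, faster way to do it? I recently discovered the packages dplyr, tidyr, plyr and reshape2, and I think one of them could solve this, but I can't work out how.

Can you help me?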

 subject id_activity activity tBodyAcc-mean()-X tBodyAcc-mean()-Y tBodyAcc-mean()-Z tGravityAcc-mean()-X tGravityAcc-mean()-Y tGravityAcc-mean()-Z tBodyAccJerk-mean()-X tBodyAccJerk-mean()-Y tBodyAccJerk-mean()-Z
1        1           1  WALKING         0.2885845      -0.020294171       -0.13290514            0.9633961           -0.1408397           0.11537494            0.07799634           0.005000803         -0.0678308080
2        1           1  WALKING         0.2784188      -0.016410568       -0.12352019            0.9665611           -0.1415513           0.10937881            0.07400671           0.005771104          0.0293766330
3        1           1  WALKING         0.2796531      -0.019467156       -0.11346169            0.9668781           -0.1420098           0.10188392            0.07363596           0.003104037         -0.0090456308
4        1           1  WALKING         0.2791739      -0.026200646       -0.12328257            0.9676152           -0.1439765           0.09985014            0.07732061           0.020057642         -0.0098647722
5        1           1  WALKING         0.2766288      -0.016569655       -0.11536185            0.9682244           -0.1487502           0.09448590            0.07344436           0.019121574          0.0167799790
6        1           1  WALKING         0.2771988      -0.010097850       -0.10513725            0.9679482           -0.1482100           0.09190972            0.07793244           0.018684046          0.0093444336
7        1           1  WALKING         0.2794539      -0.019640776       -0.11002215            0.9679295           -0.1442821           0.09314463            0.08217077          -0.017014670         -0.0157981660
8        1           1  WALKING         0.2774325      -0.030488303       -0.12536043            0.9684915           -0.1467054           0.09170816            0.07236423           0.008747856         -0.0044681354
9        1           1  WALKING         0.2772934      -0.021750698       -0.12075082            0.9684812           -0.1543740           0.08511826            0.07528437           0.030762704          0.0112119500
10       1           1  WALKING         0.2805857      -0.009960298       -0.10606516            0.9684180           -0.1563020           0.08087447            0.07636932           0.012518906          0.0030843751
11       1           1  WALKING         0.2768803      -0.012721805       -0.10343832            0.9692027           -0.1523614           0.08125808            0.07139686           0.016842441          0.0010303821
12       1           1  WALKING         0.2762282      -0.021441302       -0.10820234            0.9692533           -0.1500638           0.08293121            0.07608451          -0.002311558         -0.0076736296
13       1           1  WALKING         0.2784570      -0.020414761       -0.11273172            0.9689963           -0.1523621           0.08315080            0.07710200           0.017027167         -0.0009852394
14       1           1  WALKING         0.2771750      -0.014712802       -0.10675647            0.9690440           -0.1541413           0.08181960            0.07761238           0.019489223          0.0152076830
15       1           1  WALKING         0.2979457       0.027093908       -0.06166812            0.9448949           -0.2926233          -0.02143552            0.06665616          -0.068367084         -0.0336076010

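The data frame has 10,299 rows and 56 columns. I haven't shown all the columns, only a subset so you can see what the data looks like. Sorry for my English ^^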

1 Answer:

Answer 0: (score: 1)

You can try the function aggregate. It is designed for exactly what you are looking for.

xy <- data.frame(subj = c(1,1,1,1,2,2,2,2),
                 act = c("a", "a", "b", "b", "a", "a", "b", "b"),
                 stat1 = rnorm(8),
                 stat2 = rnorm(8),
                 stat3 = rnorm(8))

xy
aggregate(. ~ subj + act, data = xy, FUN = mean)

  subj act       stat1      stat2      stat3
1    1   a  0.10244340  0.9175242 -0.1240974
2    2   a  0.06747905 -0.3221609  0.8647476
3    1   b -0.17143146  0.9971627  0.3603535
4    2   b -1.32023632  0.6584811  0.2126244
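The column names in the question (for example tBodyAcc-mean()-X) are not syntactic R names, so it may be easier to skip the formula interface and pass the measurement and grouping columns to aggregate directly. The following is only a sketch: the data frame har and its values are a made-up miniature of the question's data, and group_cols and measure_cols are names chosen here for illustration.

```r
# Hypothetical miniature of the question's data: two subjects, two activities,
# and measurement columns whose names are not syntactic R names.
set.seed(1)
har <- data.frame(
  subject  = rep(1:2, each = 4),
  activity = rep(c("WALKING", "SITTING"), times = 4),
  `tBodyAcc-mean()-X` = rnorm(8),
  `tBodyAcc-mean()-Y` = rnorm(8),
  check.names = FALSE  # keep the original column names untouched
)

group_cols   <- c("subject", "activity")
measure_cols <- setdiff(names(har), group_cols)

# Non-formula method: the columns to average go in x, the grouping columns in by.
means <- aggregate(x = har[measure_cols], by = har[group_cols], FUN = mean)
means
```

On the real data you would probably add id_activity to group_cols as well; since it appears to map one-to-one to activity, the number of groups should stay the same and both columns are kept in the result.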

You can also use the data.table package, which can often perform such operations faster than some base R solutions.

library(data.table)
setDT(xy)
xy[, lapply(.SD, mean), by = .(subj, act)]

   subj act       stat1      stat2      stat3
1:    1   a  0.10244340  0.9175242 -0.1240974
2:    1   b -0.17143146  0.9971627  0.3603535
3:    2   a  0.06747905 -0.3221609  0.8647476
4:    2   b -1.32023632  0.6584811  0.2126244
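Since the question mentions dplyr, here is a rough equivalent with that package. It is a sketch that assumes dplyr 1.0.0 or later (for across and the .groups argument) and rebuilds a small example data frame with random values; xy_means is just a name picked for the result.

```r
library(dplyr)

# Small example data frame; the values are random, so no output is shown.
set.seed(1)
xy <- data.frame(subj  = c(1, 1, 1, 1, 2, 2, 2, 2),
                 act   = c("a", "a", "b", "b", "a", "a", "b", "b"),
                 stat1 = rnorm(8),
                 stat2 = rnorm(8))

# Group by subject and activity, then average every remaining column.
xy_means <- xy %>%
  group_by(subj, act) %>%
  summarise(across(everything(), mean), .groups = "drop")

xy_means
```

Inside summarise, across(everything(), mean) leaves out the grouping columns, so only the measurement columns are averaged. That also means there is no need to list the 50 or so variables of the real data by hand.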