Creating a custom function that extracts certain rows

Asked: 2015-11-18 15:07:49

Tags: r dataframe aggregate

head(MYK)    
X Analyte Subject Cohort DayNominal HourNominal Concentration    uniqueID    FS    EF   VTI deltaFS deltaEF deltaVTI HR
2 MYK-461 005-010      1          1        0.25         31.00 005-0100.25 31.82 64.86  0.00       3      -1     -100 58
3 MYK-461 005-010      1          1        0.50         31.80  005-0100.5    NA    NA    NA      NA      NA       NA NA
4 MYK-461 005-010      1          1        1.00          9.69    005-0101 26.13 69.11  0.00     -15       6     -100 55
5 MYK-461 005-010      1          1        1.50          8.01  005-0101.5    NA    NA    NA      NA      NA       NA NA
6 MYK-461 005-010      1          1        2.00          5.25    005-0102    NA    NA    NA      NA      NA       NA NA
7 MYK-461 005-010      1          1        3.00          3.26    005-0103 29.89 60.99 23.49      -3      -7        9 55
105 MYK-461 005-033      2          1        0.25           3.4 005-0330.25 30.18 68.59 23.22       1       0       16 47
106 MYK-461 005-033      2          1        0.50          12.4  005-0330.5    NA    NA    NA      NA      NA       NA NA
107 MYK-461 005-033      2          1        0.75          27.1 005-0330.75    NA    NA    NA      NA      NA       NA NA
108 MYK-461 005-033      2          1        1.00          23.5    005-0331 32.12 69.60 21.06       7       2        5 43
109 MYK-461 005-033      2          1        1.50          16.8  005-0331.5    NA    NA    NA      NA      NA       NA NA
110 MYK-461 005-033      2          1        2.00          15.8    005-0332    NA    NA    NA      NA      NA       NA NA

organize = function(x, y) {

  g1 = subset(x, Cohort == y)
  g1 = aggregate(x[,'Concentration'], by=list(x[,'HourNominal']), FUN=mean)
  g1 = setNames(g1, c('HourNominal', 'Concentration'))
  g2 = aggregate(x[,'Concentration'], by=list(x[,'HourNominal']), FUN=sd)
  g2 = setNames(g2, c('HourNominal', 'SD'))
  g1[,'SD'] = g2$SD
  g1$top = g1$Concentration + g1$SD
  g1$bottom = g1$Concentration - g1$SD

  return(g1)
}

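I have a data frame here, along with some code that is meant to subset the data frame by a given cohort and then aggregate the concentration by hour. However, all the resulting data frames look the same.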

CA1 = organize(MYK, 1)
CA2 = organize(MYK, 2)

However, whenever I run these two commands, both datasets come out identical.
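One way to see where the cohort filter gets lost is to run the first two steps of the function by hand. The sketch below uses a small made-up data frame, `toy` (hypothetical, not the MYK data), and assumes `Concentration` is numeric. It suggests that the subset stored in `g1` is overwritten by an aggregate over the full `x`, so the value of `y` never reaches the result.

```r
# Hypothetical toy data standing in for MYK (not from the original post).
toy <- data.frame(
  Cohort        = c(1, 1, 1, 1, 2, 2, 2, 2),
  HourNominal   = c(0.25, 0.25, 0.5, 0.5, 0.25, 0.25, 0.5, 0.5),
  Concentration = c(10, 20, 30, 50, 1, 3, 5, 9)
)

# Step 1 of organize(): the subset does depend on the cohort.
sub1 <- subset(toy, Cohort == 1)
sub2 <- subset(toy, Cohort == 2)

# Step 2 as written in organize(): aggregate() is given the full data frame,
# so the subset from step 1 is discarded and every cohort gets this result.
full <- aggregate(toy[, "Concentration"],
                  by = list(toy[, "HourNominal"]), FUN = mean)

# Aggregating the subsets instead gives cohort-specific means.
by1 <- aggregate(sub1[, "Concentration"],
                 by = list(sub1[, "HourNominal"]), FUN = mean)
by2 <- aggregate(sub2[, "Concentration"],
                 by = list(sub2[, "HourNominal"]), FUN = mean)

print(full)
print(by1)
print(by2)
```

If `by1` and `by2` differ while `full` matches neither, the likely culprit is the data frame passed to `aggregate()`, not `subset()`.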

I want a dataset that looks like this:
   HourNominal Concentration         SD        top      bottom
1         0.25     27.287500  25.112204  52.399704   2.1752958
2         0.50     41.989722  32.856013  74.845735   9.1337094
3         0.75     49.866667  22.485254  72.351921  27.3814122
4         1.00    107.168889 104.612098 211.780987   2.5567908
5         1.50    191.766389 264.375466 456.141855 -72.6090774
6         1.75    319.233333 290.685423 609.918757  28.5479100
7         2.00    226.785278 272.983234 499.768512 -46.1979560
8         2.25    341.145833 301.555769 642.701602  39.5900645
9         2.50    341.145833 319.099679 660.245512  22.0461542
10        3.00    195.303333 276.530533 471.833866 -81.2271993
11        4.00    107.913889 140.251991 248.165880 -32.3381024
12        6.00     50.174167  64.700785 114.874952 -14.5266184
13        8.00     38.132639  47.099796  85.232435  -8.9671572
14       12.00     31.404444  39.667850  71.072294  -8.2634051
15       24.00     33.488583  41.267392  74.755975  -7.7788087
16       48.00     29.304833  38.233776  67.538609  -8.9289422
17       72.00      7.322792   6.548898  13.871690   0.7738932
18       96.00      7.002833   6.350251  13.353085   0.6525821
19      144.00      6.463875   5.612630  12.076505   0.8512452
20      216.00      5.007792   4.808156   9.815948   0.1996353
21      312.00      3.964727   4.351626   8.316353  -0.3868988
22      480.00      2.452857   3.220947   5.673804  -0.7680897
23      648.00      1.826625   2.569129   4.395754  -0.7425044

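My question is: why do the two data frames have the same content when I am trying to separate the values by cohort? They should not be identical.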
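A probable explanation is that `organize` subsets into `g1` but then calls `aggregate()` on `x`, the unfiltered data frame, so the cohort argument has no effect. The following is a minimal sketch of a possible fix, not a confirmed answer: `organize_fixed` and the `toy` data are hypothetical names introduced for illustration, and the code assumes `Cohort`, `HourNominal` and `Concentration` are numeric columns.

```r
# Hypothetical corrected version of organize(); the name is an assumption.
organize_fixed <- function(x, y) {
  # Keep only the rows for cohort y; which() drops rows where Cohort is NA.
  g1 <- x[which(x$Cohort == y), , drop = FALSE]

  # Aggregate the subset g1 (the original code aggregated x here).
  avg <- aggregate(g1[, "Concentration"],
                   by = list(g1[, "HourNominal"]), FUN = mean)
  avg <- setNames(avg, c("HourNominal", "Concentration"))

  dev <- aggregate(g1[, "Concentration"],
                   by = list(g1[, "HourNominal"]), FUN = sd)
  dev <- setNames(dev, c("HourNominal", "SD"))

  # Both aggregates group on the same column of g1, so rows line up.
  avg$SD     <- dev$SD
  avg$top    <- avg$Concentration + avg$SD
  avg$bottom <- avg$Concentration - avg$SD
  avg
}

# Made-up data for a quick check (not the MYK data).
toy <- data.frame(
  Cohort        = c(1, 1, 1, 1, 2, 2, 2, 2),
  HourNominal   = c(0.25, 0.25, 0.5, 0.5, 0.25, 0.25, 0.5, 0.5),
  Concentration = c(10, 20, 30, 50, 1, 3, 5, 9)
)

c1 <- organize_fixed(toy, 1)
c2 <- organize_fixed(toy, 2)
print(c1)
print(c2)
```

With this change, `organize_fixed(MYK, 1)` and `organize_fixed(MYK, 2)` should differ whenever the two cohorts have different concentrations. Note that `mean` and `sd` return `NA` for any hour that contains a missing concentration unless `na.rm = TRUE` is passed through `aggregate()`.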

0 answers:

No answers yet