How to double-normalize scientific data in R: first by baseline time points, then by control

Time: 2017-07-12 14:12:06

Tags: r data-analysis

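I have a data.table with a bunch of parameters (amplitude, rate, area and so on, 23 in total) that belong to specific wells (individual experiments, if you will, 48 in total), grouped by treatment (usually about 10 in total), all measured at different time points (of which there can be many). I want to first take each well and normalize (that is, divide) every parameter by that well's baseline median (the median over all time points before time zero), and then take the normalized data and normalize it again, this time by the control treatment group, at each time point. I would also like to look at the baseline and control data beforehand, and flag and remove outliers where necessary before normalizing (though this is not critical right now; I can probably work it out once I understand how to do the normalization).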

As an example, I will create a data.table similar to the one my analysis code produces from the raw instrument data:

library(data.table)

dt <- data.table(
  wellID    = as.factor(c("A4", "B4", "C5", "D5", "A4", "B4", "C5", "D5",
                          "A4", "B4", "C5", "D5")),
  treatment = as.factor(c("Control", "Control", "Drug", "Drug", "Control",
                          "Control", "Drug", "Drug", "Control", "Control",
                          "Drug", "Drug")),
  time_h    = c(-0.2, -0.2, -0.2, -0.2, -0.1, -0.1, -0.1, -0.1, 4, 4, 4, 4),
  area      = runif(12, min = 0.5, max = 0.9),
  amp       = runif(12, min = 0.1, max = 0.2),
  rate      = runif(12, min = 33, max = 38)
)

I tried something like this:

baseline <- subset(dt, subset = time_h < 0)  # keep only rows before time zero

to isolate the baseline time points, and then:

# per-well median of every parameter column (columns 4 onward)
base_medians <- by(baseline[, 4:ncol(baseline), with = FALSE], baseline$wellID,
                   function(x) {
                     apply(x, 2, median)
                   })

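to get the baseline medians for each well. But I really don't know how to normalize the data in dt so that the wells and parameters are matched up, or how to then do the second normalization.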

I don't think this is a good strategy. Should I somehow deconstruct and rebuild my data set?

Any help is appreciated!

1 answer:

Answer 0: (score: 1)

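This may need some tweaking of the subsetting if it is not what you are looking for. It divides each parameter column by its median over the rows where time_h < 0, and then divides again by the median over the rows where treatment == "Control".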

set.seed(21)  # good practice for questions: set the seed before building dt so results are reproducible

parm <- c("area", "amp", "rate")  #parameters to include
dt[, (parm) := lapply(.SD, function(x) x / median(x[time_h < 0])), .SDcols = parm]
dt[, (parm) := lapply(.SD, function(x) x / median(x[treatment == "Control"])), .SDcols = parm]

    wellID treatment time_h      area       amp      rate
 1:     A4   Control   -0.2 0.9541129 0.7538275 0.9403151
 2:     B4   Control   -0.2 0.7040382 1.1530667 1.0081769
 3:     C5      Drug   -0.2 0.9134096 0.8369863 0.9780808
 4:     D5      Drug   -0.2 0.6721809 0.7392173 1.0067250
 5:     A4   Control   -0.1 1.0354136 1.0865999 0.9978287
 6:     B4   Control   -0.1 1.0162338 0.9134001 0.9918002
 7:     C5      Drug   -0.1 0.6334486 1.0678871 1.0280474
 8:     D5      Drug   -0.1 0.6664317 1.1639014 0.9696164
 9:     A4   Control    4.0 1.0477798 0.7204991 1.0021713
10:     B4   Control    4.0 0.9837662 1.1454020 1.0149003
11:     C5      Drug    4.0 0.8985494 1.2648977 1.0190920
12:     D5      Drug    4.0 1.0239782 1.3705952 0.9268626
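
The two calls above take each median over the whole table, because neither uses `by`. If the goal is the per-well baseline and per-time-point control normalization described in the question, one possible sketch is to add `by = wellID` to the first step and `by = time_h` to the second. This is an assumption about what is wanted, not a tested solution for the real 23-parameter data; the name `dt_norm` is made up for illustration, and the toy table is rebuilt here only so the snippet runs on its own.

```r
library(data.table)

set.seed(21)  # fixed seed so the random toy data is reproducible

# Toy data with the same layout as the example in the question.
dt <- data.table(
  wellID    = factor(rep(c("A4", "B4", "C5", "D5"), times = 3)),
  treatment = factor(rep(c("Control", "Control", "Drug", "Drug"), times = 3)),
  time_h    = rep(c(-0.2, -0.1, 4), each = 4),
  area      = runif(12, min = 0.5, max = 0.9),
  amp       = runif(12, min = 0.1, max = 0.2),
  rate      = runif(12, min = 33, max = 38)
)

parm <- c("area", "amp", "rate")  # parameter columns to normalize

# Hypothetical name: work on a copy, because := modifies a data.table in place.
dt_norm <- copy(dt)

# Step 1: within each well, divide every parameter by that well's own
# baseline median (median over the rows with time_h < 0).
dt_norm[, (parm) := lapply(.SD, function(x) x / median(x[time_h < 0])),
        by = wellID, .SDcols = parm]

# Step 2: within each time point, divide every parameter by the median
# of the Control wells at that time point.
dt_norm[, (parm) := lapply(.SD, function(x) x / median(x[treatment == "Control"])),
        by = time_h, .SDcols = parm]

print(dt_norm)
```

After step 2 the Control wells have a median of 1 at every time point, so the Drug values read as fold change relative to control. A well with no rows before time zero, or a time point with no Control rows, would come out as NA, because the median of an empty vector is NA; such cases would need separate handling.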