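I am trying to create z-score variables based on the mean and standard deviation of the control group (dx = 1). I want to loop over all the variables of interest (PCT:CST.L) to create these z-scores. How should I do this? Here is my data.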
  X       dx PCT CST.R CST.L
1 1  Control  15    30     5
2 2  Control  20    24    22
3 3 Clinical  25    20    14
4 4  Control  17    13    12
5 5 Clinical  14    12    11
6 6  Control  13    20    15
In addition, I would like the result to look like this (at least the headers).
  X       dx PCT CST.R CST.L PCT_Z CST.R_Z CST.L_Z
1 1  Control  15    30     5
2 2  Control  20    24    22
3 3 Clinical  25    20    14
4 4  Control  17    13    12
5 5 Clinical  14    12    11
6 6  Control  13    20    15
Data
structure(list(X = 1:6, dx = c("Control", "Control", "Clinical",
"Control", "Clinical", "Control"), PCT = c(15L, 20L, 25L, 17L,
14L, 13L), CST.R = c(30L, 24L, 20L, 13L, 12L, 20L), CST.L = c(5L,
22L, 14L, 12L, 11L, 15L)), .Names = c("X", "dx", "PCT", "CST.R",
"CST.L"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
Answer 0 (score: 0)
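I am assuming you want to compute a z-score for each column.
The z-score is computed as (X - mean) / standard deviation, where X is the value in each row of the selected column. Note that the code below standardizes against the mean and standard deviation of all rows in the column, not only the control group.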
> df = data.frame(X = 1:6,
+ dx = c("Control", "Control", "Clinical", "Control", "Clinical", "Control"),
+ PCT = c(15L, 20L, 25L, 17L, 14L, 13L),
+ CST.R = c(30L, 24L, 20L, 13L, 12L, 20L),
+ CST.L = c(5L, 22L, 14L, 12L, 11L, 15L))
> df
  X       dx PCT CST.R CST.L
1 1  Control  15    30     5
2 2  Control  20    24    22
3 3 Clinical  25    20    14
4 4  Control  17    13    12
5 5 Clinical  14    12    11
6 6  Control  13    20    15
>
> colsToCalculate = colnames(df[, 3:5])
> newCols = c('PCT_Z', 'CST.R_Z', 'CST.L_Z')
>
> for (i in seq_along(newCols)) {
+ data = df[, colsToCalculate[i]]
+ df[, newCols[i]] = (data - mean(data)) / sd(data)
+ }
>
> df
  X       dx PCT CST.R CST.L       PCT_Z     CST.R_Z    CST.L_Z
1 1  Control  15    30     5 -0.51830527  1.50280954 -1.4675659
2 2  Control  20    24    22  0.59234888  0.61590555  1.5873672
3 3 Clinical  25    20    14  1.70300302  0.02463622  0.1497516
4 4  Control  17    13    12 -0.07404361 -1.01008510 -0.2096523
5 5 Clinical  14    12    11 -0.74043610 -1.15790243 -0.3893542
6 6  Control  13    20    15 -0.96256693  0.02463622  0.3294536
>
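The loop above uses the mean and standard deviation of every row. If the z-scores should instead be relative to the control group only, as the question describes, something along these lines might work. This is only a sketch: it assumes control rows are the ones where dx equals "Control" (as in the posted data, rather than dx = 1), and the names cols, is_control and ctrl are made up for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(
  X = 1:6,
  dx = c("Control", "Control", "Clinical", "Control", "Clinical", "Control"),
  PCT = c(15L, 20L, 25L, 17L, 14L, 13L),
  CST.R = c(30L, 24L, 20L, 13L, 12L, 20L),
  CST.L = c(5L, 22L, 14L, 12L, 11L, 15L)
)

# Columns to standardize (hypothetical helper names from here on).
cols <- c("PCT", "CST.R", "CST.L")

# Assumption: control rows are labelled "Control" in dx.
is_control <- df$dx == "Control"

for (col in cols) {
  # Mean and SD come from the control rows only ...
  ctrl <- df[is_control, col]
  # ... but are applied to every row, control and clinical alike.
  df[[paste0(col, "_Z")]] <- (df[[col]] - mean(ctrl)) / sd(ctrl)
}

df
```

With this version the control rows end up with mean 0 and standard deviation 1 in each new _Z column, and the clinical rows are expressed in units of the control group's spread.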