Consider the following:
library(data.table)
mtcars.dt <- data.table(mtcars)
DT1 = mtcars.dt[, lapply(.SD, mean), by=cyl]
DT2 = mtcars.dt[, lapply(.SD, mean)]
Now we have the following values:
> DT1
cyl mpg disp hp drat wt qsec vs am gear carb
1: 6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714 3.857143 3.428571
2: 4 26.66364 105.1364 82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727 4.090909 1.545455
3: 8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571 3.285714 3.500000
and
> DT2
mpg cyl disp hp drat wt qsec vs am gear carb
1: 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
Now I want to normalize mpg, disp, ... in each row of DT1 by the corresponding mean over the whole original table (given in DT2).
How do I do this? What is the right idiom here?
Edit: here is the desired output; sorry I wasn't clearer before.
cyl mpg disp hp drat wt qsec vs am gear carb
1: 6 0.9826900 0.7945249 0.8336478 0.9969837 0.9688843 1.0071934 1.306122 1.0549451 1.0460048 1.2190476
2: 4 1.3271681 0.4556844 0.5633497 1.1318889 0.7104599 1.0721912 2.077922 1.7902098 1.1093991 0.5494949
3: 8 0.7515943 1.5304141 1.4262584 0.8978812 1.2430536 0.9396817 0.000000 0.3516484 0.8910412 1.2444444
Answer 0 (score: 2)
mapply('/',subset(DT1, select=-cyl), subset(DT2, select=-cyl))
But this is only data.frame-style, and the result is a matrix without the cyl column.
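A data.table-style variant of the same pairwise-division idea, offered as a sketch: it divides the columns by reference with `:=` and keeps cyl. It assumes DT1 and DT2 are built exactly as in the question; `cols` is a helper name introduced here, and each DT1 column is matched to its divisor in DT2 by name rather than by position (DT2 still contains cyl, in a different position).

```r
library(data.table)

mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]
DT2 <- mtcars.dt[, lapply(.SD, mean)]

# Columns to normalise: everything except the grouping column
cols <- setdiff(names(DT1), "cyl")

# Divide each DT1 column by the matching overall mean in DT2
# (selected by name), writing the results back into DT1 by reference
DT1[, (cols) := Map(`/`, .SD, DT2[, cols, with = FALSE]), .SDcols = cols]
DT1
```

Because `:=` modifies DT1 in place, running the last assignment twice would divide twice; rebuild DT1 before rerunning it.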
Answer 1 (score: 2)
Here is a more data.table-ish solution that uses the efficient set function (I'm using the latest data.table version on CRAN, btw: 1.9.6).
Create DT1:
library(data.table) # V 1.9.6+
mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]
Now create DT2, excluding the cyl column by negating it in the .SDcols argument, and unlist the result into a named numeric vector:
DT2 <- unlist(mtcars.dt[, lapply(.SD, mean), .SDcols = -"cyl"])
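Now loop over the columns of DT1 from the second one onward and update DT1 in place, dividing each column by the matching element of DT2: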
for (j in 2L:length(DT1)) set(DT1, j = j, value = DT1[[j]]/DT2[j - 1L])
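The loop above relies on DT1 (after cyl) and DT2 listing the columns in the same order, since it pairs column `j` with `DT2[j - 1L]`. A minimal sketch of a name-based variant of the same `set()` loop, assuming the objects are built as in this answer; here the columns are selected with `setdiff()` instead of the `-"cyl"` negation, which should give the same DT2.

```r
library(data.table)

mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]

# Named numeric vector of overall means, cyl excluded
DT2 <- unlist(mtcars.dt[, lapply(.SD, mean),
                        .SDcols = setdiff(names(mtcars.dt), "cyl")])

# Loop over column names instead of positions, so each column is divided
# by the overall mean with the same name, whatever the column order
for (col in names(DT2)) {
  set(DT1, j = col, value = DT1[[col]] / DT2[[col]])
}
DT1
```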
DT1
# cyl mpg disp hp drat wt qsec vs am gear carb
# 1: 6 0.9826900 0.7945249 0.8336478 0.9969837 0.9688843 1.0071934 1.306122 1.0549451 1.0460048 1.2190476
# 2: 4 1.3271681 0.4556844 0.5633497 1.1318889 0.7104599 1.0721912 2.077922 1.7902098 1.1093991 0.5494949
# 3: 8 0.7515943 1.5304141 1.4262584 0.8978812 1.2430536 0.9396817 0.000000 0.3516484 0.8910412 1.2444444
Answer 2 (score: 2)
Would normalizing each row first and then aggregating by cyl solve your problem?
Something like this:
mtcars.dt <- data.table(mtcars)
# normalise each column (except cyl) by its overall mean
sdcol <- names(mtcars.dt)[names(mtcars.dt) != "cyl"]
res <- mtcars.dt[, lapply(.SD, function(x) x / mean(x)), .SDcols = sdcol]
res[, cyl := mtcars.dt[, cyl]]
# aggregate
res2 <- res[, lapply(.SD, mean), by = cyl]
Or the short version:
mtcars.dt[, lapply(.SD, function(x) x / mean(x)), .SDcols = sdcol][, cyl := mtcars.dt[, cyl]][, lapply(.SD, mean), by = cyl]
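Why normalizing first and aggregating afterwards gives the same numbers as dividing DT1 by DT2: for one column $x$, write $\bar{x}$ for its overall mean and $\bar{x}_g$ for its mean within a cylinder group $g$ containing $n_g$ cars. Because $\bar{x}$ is a constant, it factors out of the group average:

$$
\frac{1}{n_g}\sum_{i \in g} \frac{x_i}{\bar{x}}
  = \frac{1}{\bar{x}} \cdot \frac{1}{n_g}\sum_{i \in g} x_i
  = \frac{\bar{x}_g}{\bar{x}}
$$

The left side is what `res2` computes per group; the right side is the ratio of a DT1 entry to the matching DT2 entry.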