How do I use lapply to reassign values to existing columns of a data.table?

Asked: 2017-05-16 14:42:38

Tags: r data.table lapply

I want to replace the NA values in each numeric column with that column's median.

library(data.table)

dt <- data.table(
  name = c("A","B","C","D","E"),
  sex = c("M","F",NA,"F","M"),
  age = c(1,2,3,NA,4),
  height = c(178.1, 162.1, NA, 169.5, 172.3)
)

Extract the numeric columns:

num.cols <-  sapply(dt, is.numeric)
num.cols <- names(num.cols)[num.cols]

Check the expected values:

median(dt[, age], na.rm = TRUE)     # 2.5
median(dt[, height], na.rm = TRUE)  # 170.9

Apply lapply to each column in num.cols:

dt[,lapply(.SD, function(value) 
ifelse(is.na(value), median(value, na.rm=TRUE), value)),
.SDcols = num.cols]

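The problem: I can't figure out how to use data.table syntax to overwrite the original columns with these vectors, in which each NA has been imputed with the median.

1 Answer:

Answer 0: (score: 1)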

We can use na.aggregate from zoo with FUN set to median to replace the missing values in the columns selected by .SDcols with each column's median, and then assign (:=) the result back to those columns:

library(zoo)
dt[, (num.cols) := na.aggregate(.SD, FUN = median), .SDcols = num.cols]
dt
#   name sex age height
#1:    A   M 1.0  178.1
#2:    B   F 2.0  162.1
#3:    C  NA 3.0  170.9
#4:    D   F 2.5  169.5
#5:    E   M 4.0  172.3
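If you would rather not depend on zoo, the same in-place assignment should also work with the lapply call from the question, since := accepts a list of replacement columns. The following is a minimal sketch under that assumption; impute_median is a hypothetical helper name introduced here for illustration, not part of data.table or of the original answer.

```r
library(data.table)

dt <- data.table(
  name = c("A", "B", "C", "D", "E"),
  sex = c("M", "F", NA, "F", "M"),
  age = c(1, 2, 3, NA, 4),
  height = c(178.1, 162.1, NA, 169.5, 172.3)
)

# Names of the numeric columns, as in the question
num.cols <- names(dt)[sapply(dt, is.numeric)]

# Hypothetical helper: replace the NAs in one vector with that vector's median
impute_median <- function(value) {
  value[is.na(value)] <- median(value, na.rm = TRUE)
  value
}

# The parentheses around num.cols make := treat it as a vector of column names;
# lapply(.SD, ...) returns one imputed vector per column listed in .SDcols
dt[, (num.cols) := lapply(.SD, impute_median), .SDcols = num.cols]
```

The only change from the attempt in the question is the `(num.cols) :=` on the left-hand side: without it, the j expression returns a new data.table and leaves dt untouched. The non-numeric sex column keeps its NA because it is not listed in .SDcols.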
