I want to replace the NAs in each numeric column with the median of that column.
library(data.table)

dt <- data.table(
name = c("A","B","C","D","E"),
sex = c("M","F",NA,"F","M"),
age = c(1,2,3,NA,4),
height = c(178.1, 162.1, NA, 169.5, 172.3)
)
Extract the numeric columns:
num.cols <- sapply(dt, is.numeric)
num.cols <- names(num.cols)[num.cols]
Check the values:
median(dt[, age], na.rm = TRUE) # 2.5
median(dt[, height], na.rm = TRUE) # 170.9
Apply lapply over each of num.cols:
dt[,lapply(.SD, function(value)
ifelse(is.na(value), median(value, na.rm=TRUE), value)),
.SDcols = num.cols]
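The problem: this only returns the imputed columns. I can't figure out how to overwrite the original columns with the median-imputed vectors using data.table syntax.
Answer 0 (score: 1)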
We can use na.aggregate from zoo with FUN set to median. It replaces the missing values in the columns selected through .SDcols with each column's median, and := assigns the result back to those columns.
library(zoo)
dt[, (num.cols) := na.aggregate(.SD, FUN = median), .SDcols = num.cols]
dt
#    name sex age height
# 1:    A   M 1.0  178.1
# 2:    B   F 2.0  162.1
# 3:    C  NA 3.0  170.9
# 4:    D   F 2.5  169.5
# 5:    E   M 4.0  172.3
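If you would rather not depend on zoo, the same in-place update can probably be written with data.table alone. The following is a minimal sketch, not the answerer's method: impute_median is a hypothetical helper introduced only for illustration, and it assumes the numeric columns are stored as doubles, as they are in the question's data. For an integer column the median may be fractional, so you would likely want to convert that column to double first.

```r
library(data.table)

dt <- data.table(
  name   = c("A", "B", "C", "D", "E"),
  sex    = c("M", "F", NA, "F", "M"),
  age    = c(1, 2, 3, NA, 4),
  height = c(178.1, 162.1, NA, 169.5, 172.3)
)

# Names of the numeric columns
num.cols <- names(dt)[sapply(dt, is.numeric)]

# Hypothetical helper (not part of data.table or zoo):
# replace the NAs in a vector with that vector's median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Wrapping num.cols in parentheses on the left of := makes data.table
# treat it as a vector of column names and update those columns by reference
dt[, (num.cols) := lapply(.SD, impute_median), .SDcols = num.cols]
```

Because := modifies dt by reference, there is no need to reassign the result; the non-numeric columns such as sex are left untouched, including their NAs.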