I have a book on statistics (using R) that shows the following:
> pima$diastolic [pima$diastolic == 0] <- NA
> pima$glucose [pima$glucose == 0] <- NA
> pima$triceps [pima$triceps == 0] <- NA
> pima$insulin [pima$insulin == 0] <- NA
> pima$bmi [pima$bmi == 0] <- NA
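Is there a way to do this in one line, or more efficiently? I have seen functions such as with, apply, and subs used for similar things, but I can't figure out how to put them together...
Sample data (how do I read this in as a data frame, like Python's StringIO?):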
pregnant glucose diastolic triceps insulin bmi diabetes age test
1 6 148 72 35 0 33.6 0.627 50 positive
2 1 85 66 29 0 26.6 0.351 31 negative
3 8 183 64 0 0 23.3 0.672 32 positive
4 1 89 66 23 94 28.1 0.167 21 negative
5 0 137 40 35 168 43.1 2.288 33 positive
6 5 116 74 0 0 25.6 0.201 30 negative
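On the StringIO part of the question, one possible sketch: base R's read.table() accepts a `text` argument, so the sample can be parsed straight from a string. The variable name `pima_text` is made up for illustration. Because the header line has one fewer field than the data rows, read.table() treats the first field of each row as the row name.

```r
# Sample data pasted as a string (pima_text is a hypothetical name)
pima_text <- "pregnant glucose diastolic triceps insulin bmi diabetes age test
1 6 148 72 35 0 33.6 0.627 50 positive
2 1 85 66 29 0 26.6 0.351 31 negative
3 8 183 64 0 0 23.3 0.672 32 positive
4 1 89 66 23 94 28.1 0.167 21 negative
5 0 137 40 35 168 43.1 2.288 33 positive
6 5 116 74 0 0 25.6 0.201 30 negative"

# text= plays the role of Python's StringIO: parse from a string, not a file
pima <- read.table(text = pima_text, header = TRUE)
str(pima)
```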
Answer 0 (score: 7)
Something like this:
Using lapply() on every column, try this:
pima[] <- lapply(pima, function(x) { if (is.numeric(x)) x[x == 0] <- NA; x })
Or, for predefined columns:
cols = c("diastolic", "glucose", "triceps", "insulin", "bmi")
pima[cols] <- lapply(pima[cols], function(x) {x[x==0] <- NA ; x})
Or use is.na<-.
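The answer names `is.na<-` without showing the call, so the following is only a guess at what was intended: a minimal sketch on a made-up three-row data frame (not the real pima data), using the replacement form `is.na(x) <- value` with a logical matrix.

```r
# Toy stand-in for pima (hypothetical values, for illustration only)
pima <- data.frame(
  glucose   = c(148, 85, 0),
  diastolic = c(72, 0, 64),
  age       = c(50, 31, 32)
)

cols <- c("glucose", "diastolic")

# pima[cols] == 0 is a logical matrix; is.na<- sets the TRUE cells to NA
is.na(pima[cols]) <- pima[cols] == 0
pima
```

Columns outside `cols` (here `age`) are left untouched.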
Answer 1 (score: 0)
Using data.table (with pima as a data.table), you can try:
for (col in c("diastolic","glucose","triceps","insulin", "bmi")) pima[(get(col))==0, (col) := NA]
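As a possible variant of the loop above, data.table's set() function assigns by reference without going through `[.data.table` on each iteration. This is a sketch on a made-up table, assuming the data.table package is installed. It is not necessarily what the answerer had in mind.

```r
library(data.table)

# Toy stand-in for pima (hypothetical values, for illustration only)
pima <- data.table(
  glucose = c(148, 85, 0),
  insulin = c(0, 0, 94),
  age     = c(50, 31, 32)
)

cols <- c("glucose", "insulin")

# For each chosen column, overwrite the rows equal to 0 with NA, in place
for (col in cols) {
  set(pima, i = which(pima[[col]] == 0), j = col, value = NA)
}
pima
```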
More details here: How to replace NA values in a table *for selected columns*? data.frame, data.table
Answer 2 (score: 0)
Using dplyr, you can:
# banal function definition
zero_to_NA <- function(col) {
  # any code that works here
  # I chose this because it is concise and efficient
  `is.na<-`(col, col == 0)
}
# Assuming you want to change 0 to NA only in these 3 columns
pima <- pima %>%
  mutate_each(funs(zero_to_NA), diastolic, glucose, triceps)
Or you can skip the function definition and write directly:
pima <- pima %>%
  mutate_each(funs(`is.na<-`(., . == 0)),
              diastolic, glucose, triceps)
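mutate_each() and funs() are deprecated in recent dplyr releases. A rough equivalent, assuming dplyr 1.0.0 or later, is mutate() with across(). The sketch below uses a made-up data frame and base R's replace() for the zero-to-NA step. It is an alternative, not the answerer's original code.

```r
library(dplyr)

# Toy stand-in for pima (hypothetical values, for illustration only)
pima <- data.frame(
  glucose   = c(148, 85, 0),
  diastolic = c(72, 0, 64),
  triceps   = c(35, 29, 0),
  age       = c(50, 31, 32)
)

# Apply the same zero-to-NA replacement to the three selected columns
pima <- pima %>%
  mutate(across(c(diastolic, glucose, triceps),
                ~ replace(.x, .x == 0, NA)))
pima
```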