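Apologies for the slightly convoluted question, but I am currently working on mental health research. One of the mental health screening tools has 15 variables, each of which can take a value from 0 to 3. Each row/participant is then assigned a total score by summing these 15 variables. The documentation for this tool states that if more than 20% of the values are missing for a given row/participant, the total score should also be treated as missing, but if fewer than 20% of the values in a row are missing, each missing value should be assigned the mean of the remaining values in that row.
I figured that to do this I would have to calculate the proportion of NAs for each participant, calculate the mean of all 15 variables excluding NAs for each participant, and then use a conditional mutate statement (or something similar) that checks whether the proportion of NAs is less than 20% and, if so, replaces the NAs in the relevant columns with the mean for that row, before finding the sum of all 15 variables for each row. The dataset also contains other columns besides these 15, so applying a function to all columns would not be useful.
To calculate the mean score without NAs, I did the following: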
mental$somatic_mean <- rowMeans(mental [, c("var1", "var2", "var3",
"var4", "var5", "var6", "var7", "var8", "var9", "var10", "var11",
"var12","var13", "var14", "var15")], na.rm=TRUE)
To calculate the proportion of NAs for each participant:
mental$somatic_na <- rowMeans(is.na(mental [, c("var1", "var2",
"var3", "var4", "var5", "var6", "var7", "var8", "var9", "var10", "var11",
"var12", "var13", "var14", "var15")]))
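However, when I try to use a mutate() statement to alter the rows where fewer than 20% of the values are NA, I cannot find any code that works. So far I have tried many permutations, including the following for each variable: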
mental_recode <- mental %>%
rowwise() %>%
mutate(var1 = if(somatic_na<0.2)
replace_na(list(var1= somatic_mean)))
This returns:
"no applicable method for 'replace_na' applied to an object of class "list""
I also tried doing them all together without mutate():
mental %>%
rowwise() %>%
if(somatic_na<0.2)
replace_na(list(var1 = somatic_mean, var2=
somatic_mean, var3 = somatic_mean, var4 = somatic_mean, var5 =
somatic_mean, var6 = somatic_mean, var7 = somatic_mean, var8 =
somatic_mean, var9 = somatic_mean, var10 = somatic_mean, var11 =
somatic_mean, var12 = somatic_mean, var13 = somatic_mean, var14 =
somatic_mean, var15 = somatic_mean ))
This returns:
Error in if (.) somatic_na < 0.2 else replace_na(mental, list(var1 = somatic_mean, :
argument is not interpretable as logical
In addition: Warning message:
In if (.) somatic_na < 0.2 else replace_na(mental, list(var1 = somatic_mean, :
the condition has length > 1 and only the first element will be used
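I have also tried using if_else() together with mutate(), setting the value to NA if the condition is not met, but could not get it to work after various permutations and error messages.
Edit: dummy data can be generated with the following: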
mental <- structure(list(id = 1:21, var1 = c(0L, 0L, 1L, 1L, 1L, 0L, 0L,
NA, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 0L, 0L), var2 = c(0L,
0L, 1L, 1L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
2L, 0L, 1L, 1L), var3 = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 2L, 1L,
1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 2L, 0L, 1L, 1L), var4 = c(1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 1L, 0L, 0L), var5 = c(0L, 0L, 0L, 1L, NA, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), var6 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), var7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, NA, 0L), var8 = c(0L,
0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), var9 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), var10 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 0L, NA, 0L), var11 = c(1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, NA, 0L), var12 = c(1L,
0L, 1L, 1L, NA, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 1L, 1L), var13 = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, NA, 0L), var14 = c(1L,
0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
2L, 0L, 1L, 0L), var15 = c(1L, 0L, 2L, NA, NA, 0L, NA, 0L, 0L,
0L, 0L, 0L, NA, NA, 0L, NA, NA, NA, NA, NA, 0L)), .Names = c("id",
"var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8",
"var9", "var10", "var11", "var12", "var13", "var14", "var15"), class =
"data.frame", row.names = c(NA,
-21L))
Does anyone know of code that would work for this situation?
Thanks in advance!
Answer 0 (score: 1)
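Here is a way to do everything in one chain with dplyr, using the data frame you provided.
First, create a vector of the names of all the columns of interest: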
name_col <- colnames(mental)[2:16]
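Selecting by position assumes the 15 items sit in columns 2 to 16. Since the question mentions that the real data set has other columns too, selecting by name may be more robust. A minimal sketch, assuming the items follow the var1 to var15 naming from the question; the toy data frame and the names `toy`, `wanted`, `name_col_by_name` and `name_col_by_pattern` are made up for illustration:

```r
# Toy data frame (hypothetical) with item columns and unrelated columns interleaved
toy <- data.frame(id = 1:2, var1 = c(0L, 1L), age = c(30, 41),
                  var2 = c(NA, 2L), var10 = c(3L, 0L))

# Option 1: build the expected names and keep only those present in the data
wanted <- paste0("var", 1:15)
name_col_by_name <- intersect(wanted, names(toy))

# Option 2: match the naming pattern directly
name_col_by_pattern <- grep("^var[0-9]+$", names(toy), value = TRUE)
```

Either vector can then be used in place of `name_col`, regardless of where the item columns sit in the data frame.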
Now use dplyr:
library(dplyr)
mental %>%
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>%
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>%
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2,
                                 "yes", "no")) %>%
  # Make the following replacement on a row basis
  rowwise() %>%
  mutate_at(vars(name_col), # Designate eligible columns to check for NAs
            funs(replace(.,
                         is.na(.) & somatic_na < 0.2, # Both conditions need to be met
                         somatic_mean))) %>% # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further
Now, if you only want to keep the entries with fewer than 20% NAs, you can pipe the above into the following:
filter(somatic_usable == "yes")
It is also worth noting that if you want the condition to be less than or equal to 20%, you need to replace somatic_na < 0.2 with somatic_na <= 0.2.
Hope this helps!
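In later dplyr releases, funs() is deprecated and mutate_at() has been superseded by across(). The sketch below is one possible rewrite of the same idea, not the original answer's code. It assumes dplyr 1.0 or newer, uses a vectorised ifelse() in place of rowwise(), and also adds the total score that the question asks for. The toy data and the names `toy`, `item_cols`, `item_mean`, `item_na`, `total` and `result` are made up for illustration:

```r
library(dplyr)

# Toy data (hypothetical): 6 items, so 1 NA is about 17% missing and 2 NAs about 33%
toy <- data.frame(
  id = 1:3,
  q1 = c(1, NA, NA),
  q2 = c(2, 2, NA),
  q3 = c(3, 2, 1),
  q4 = c(0, 2, 1),
  q5 = c(1, 2, 1),
  q6 = c(2, 2, 1)
)
item_cols <- paste0("q", 1:6)

result <- toy %>%
  # Row mean of the observed items and proportion of missing items
  mutate(
    item_mean = rowMeans(across(all_of(item_cols)), na.rm = TRUE),
    item_na   = rowMeans(is.na(across(all_of(item_cols))))
  ) %>%
  # Fill an NA with the row mean only when the row is below the 20% threshold
  mutate(across(all_of(item_cols),
                ~ ifelse(is.na(.x) & item_na < 0.2, item_mean, .x))) %>%
  # Total score; rows at or above the threshold are set to NA
  mutate(total = ifelse(item_na < 0.2,
                        rowSums(across(all_of(item_cols))),
                        NA_real_))
```

Because every step works on whole columns, no rowwise() or ungroup() is needed. The same `<` versus `<=` choice applies to both comparisons here.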
Answer 1 (score: 0)
Here is an approach that uses only base R expressions and relies on the mathematical properties of the sum:
# generate fake data
set.seed(123)
dat <- data.frame(
  ID = 1:10,
  matrix(sample(c(0:3, NA), 10 * 15, TRUE), nrow = 10, ncol = 15),
  'another_var' = 'foo',
  'second_var' = 'bar',
  stringsAsFactors = FALSE
)
var_names <- paste0('X', 1:15)
# add number of NAs to data
dat$na_num <- rowSums(is.na(dat[var_names]))
# add row sum
dat$row_sum <- rowSums(dat[var_names], na.rm = TRUE)
# add row mean
dat$row_mean <- rowMeans(dat[var_names], na.rm = TRUE)
# add final sum
dat$final_sum <- dat$row_sum + dat$row_mean * dat$na_num
# recode final sum to be NA if prop > .2
dat$final_sum <- ifelse(rowMeans(is.na(dat[var_names])) > .2,
                        NA,
                        dat$final_sum)
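The shortcut behind final_sum works because filling each missing item with the row mean of the observed items adds row_mean * na_num to the observed sum. A small check on made-up data that the shortcut agrees with explicit imputation followed by a plain sum; the names `scores`, `shortcut`, `imputed`, `na_pos` and `explicit` are illustrative only:

```r
# Made-up scores for three participants on five items
scores <- rbind(
  c(1, 2, NA, 3, 2),
  c(0, 0, 0, 0, 0),
  c(3, NA, NA, 1, 2)
)

# Shortcut: observed sum plus (row mean of observed values) x (number of NAs)
na_num   <- rowSums(is.na(scores))
row_mean <- rowMeans(scores, na.rm = TRUE)
shortcut <- rowSums(scores, na.rm = TRUE) + row_mean * na_num

# Explicit route: fill each NA with the mean of its own row, then sum
imputed <- scores
na_pos <- which(is.na(imputed), arr.ind = TRUE)
imputed[na_pos] <- row_mean[na_pos[, "row"]]
explicit <- rowSums(imputed)

# TRUE when both routes give the same totals
isTRUE(all.equal(shortcut, explicit))
```

This check deliberately leaves out the 20% rule, which is applied as a separate recoding step afterwards.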
Here is a function that does the same thing. You pass it data, followed by a character vector of the variable names.
total_sum_calculation <- function(data, var_names){
  # add number of NAs to data
  na_num <- rowSums(is.na(data[var_names]))
  # add row sum
  row_sum <- rowSums(data[var_names], na.rm = TRUE)
  # add row mean
  row_mean <- rowMeans(data[var_names], na.rm = TRUE)
  # add final sum
  final_sum <- row_sum + row_mean * na_num
  # recode final sum to be NA if prop > .2
  ifelse(rowMeans(is.na(data[var_names])) > .2,
         NA,
         final_sum)
}
v_names <- paste0('var', 1:15)
total_sum_calculation(data = mental, var_names = v_names)
[1] 6.000000 0.000000 8.000000 7.500000 NA 0.000000 3.214286 9.230769 6.000000 2.000000 1.000000 0.000000 4.285714
[14] NA 5.357143 5.357143 5.357143 9.642857 1.071429 NA 3.000000