I have a table that looks like this:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2))
For each model, I want to count the variables other than the intercept, the month terms and RateDiff. The output I want is:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2), variables = c(3,3,3,3,3,3,3,3,3,3,3,3))
I tried to get a flag with the following:
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))
But the grep("month") part does not work.
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))
This works, but it does not catch the month terms that have a number after "month" (month1, month2).
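What I want is the equivalent of SQL's ILIKE for the terms intercept, month and RateDiff: the match should be case-insensitive and should allow any prefix or suffix around those names. How can I do this?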
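For context on why the two attempts behave as they do: grep() returns the positions of the matching elements rather than the matching strings, and %in% tests exact equality, so "month" never equals "month1". grepl() does a substring (regular expression) match and returns one TRUE/FALSE per element, which is what a filter needs. A small illustration on a few of the question's terms:

```r
terms <- c("(Intercept)", "month1", "month2", "RateDiff", "var1")

# grep() returns the positions of the matches, not the matching strings
pos <- grep("month", terms)                 # 2 3
# value = TRUE returns the matching strings themselves
vals <- grep("month", terms, value = TRUE)  # "month1" "month2"
# %in% tests exact equality, so "month" never equals "month1"
exact <- terms %in% c("month", "RateDiff")  # FALSE FALSE FALSE TRUE FALSE
# grepl() does a substring (regex) match, one TRUE/FALSE per element
partial <- grepl("month|RateDiff", terms)   # FALSE TRUE TRUE TRUE FALSE
```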
Answer:
Here is one way to do it with dplyr:
library(dplyr)

modelsummary %>%
  mutate(
    # count distinct terms that are not intercept/month*/RateDiff (case-insensitive)
    variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>%
      n_distinct()
  )
term mod_id variables
1 (Intercept) 1 3
2 month1 1 3
3 month2 1 3
4 RateDiff 1 3
5 var1 1 3
6 var2 1 3
7 var3 1 3
8 (Intercept) 2 3
9 month1 2 3
10 var1 2 3
11 var2 2 3
12 var3 2 3
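The grepl() pattern above is what stands in for SQL's ILIKE here. Regular expressions in grepl() are unanchored, so "month" matches anywhere in the string (like '%month%'), and lower-casing the input, or passing ignore.case = TRUE as in this sketch, makes the match case-insensitive. The anchors ^ and $ narrow a match to a prefix or suffix. One difference to keep in mind: regex metacharacters such as ( or . need escaping, whereas ILIKE only treats % and _ specially. The strings below are illustrative:

```r
terms <- c("(Intercept)", "month1", "Month2", "RateDiff", "var1")

# like: term ILIKE '%intercept%' OR term ILIKE '%month%' OR term ILIKE '%ratediff%'
excluded <- grepl("intercept|month|ratediff", terms, ignore.case = TRUE)
# like: term ILIKE 'month%'  (^ anchors the match to the start of the string)
starts_month <- grepl("^month", terms, ignore.case = TRUE)
# like: term ILIKE '%diff'   ($ anchors the match to the end of the string)
ends_diff <- grepl("diff$", terms, ignore.case = TRUE)
```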
Or with dplyr and stringr:
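library(dplyr)
library(stringr)

modelsummary %>%
  mutate(
    # negate = TRUE keeps the strings that do NOT match the pattern
    variables = str_subset(tolower(term), "intercept|month|ratediff", negate = TRUE) %>%
      n_distinct()
  )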
If you want the number of variables per model, add group_by(mod_id) before mutate.
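A minimal sketch of that grouped version, assuming the dplyr package is installed. It uses a hypothetical variant of the question's data in which model 2 has no var3, so the per-model counts actually differ (with the question's own data both models give 3 either way). Inside a grouped mutate(), term refers only to the current model's rows:

```r
library(dplyr)

# Hypothetical variant of the question's data: model 2 has no var3
ms <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

per_model <- ms %>%
  group_by(mod_id) %>%
  mutate(
    # distinct non-excluded terms within this model; recycled to every row of the group
    variables = n_distinct(term[!grepl("intercept|month|ratediff", term, ignore.case = TRUE)])
  ) %>%
  ungroup()
```

Here rows of model 1 get 3 and rows of model 2 get 2; without group_by() every row would get 3.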
In base R:
# base R only: no %>% pipe, so nest unique() inside length()
modelsummary$variables <- with(modelsummary,
  length(unique(term[!grepl(pattern = "intercept|month|ratediff", tolower(term))]))
)
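For a per-model count without dplyr, one option is tapply() over the filtered terms, followed by a lookup by mod_id. This is a sketch on the same hypothetical data as above (model 2 without var3); note that a model whose terms are all excluded would not appear in counts and would get NA rather than 0.

```r
# Hypothetical variant of the question's data: model 2 has no var3
ms <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# TRUE for terms that are not intercept/month*/RateDiff, ignoring case
keep <- !grepl("intercept|month|ratediff", ms$term, ignore.case = TRUE)

# distinct remaining terms per model, named by mod_id ("1", "2")
counts <- tapply(ms$term[keep], ms$mod_id[keep], function(x) length(unique(x)))

# look each row's model up by name and store the count on every row
ms$variables <- as.vector(counts[as.character(ms$mod_id)])
```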