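Extracting variables from a formula when subscripts are present

Time: 2015-06-11 03:16:05

Tags: r string parsing

There are several posts about obtaining the list of variables in a regression formula in R. The basic answer is to use all.vars. For example,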

> all.vars(log(resp) ~ treat + factor(dose))
[1] "resp"  "treat" "dose"

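This is nice because it strips out all functions and operators (as well as duplicates, not shown). However, it runs into trouble when the formula contains the $ operator or subscripts, for example:
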
> form = log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
> all.vars(form)
[1] "cows"   "weight" "bulls"  "herd"   "breed"

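Here, the data frame names cows, bulls, and herd are identified as variables, and the names of the actual variables are decoupled or lost. What I really want instead is this result: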

> mystery.fcn(form)
[1] "cows$weight" "bulls[[3]]"  "herd$breed"

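What is the most elegant way to do this? I have a proposal that I will post as an answer, but maybe someone has a more elegant solution and will earn more votes!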
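Some background on why all.vars splits these names, offered as a small sketch rather than as part of the original question: R parses cows$weight as a call to the function $ with the two symbols cows and weight as arguments, and bulls[[3]] as a call to [[. all.vars drops the function names and keeps the symbols, so the pieces come back separately. The check below uses only base R and the formula from the question.

```r
form <- log(cows$weight) ~ factor(bulls[[3]]) * herd$breed

# all.names() lists functions and operators as well as variables,
# so `$` and `[[` show up next to log and factor
nm <- all.names(form)
print(nm)

# A `$` expression is an ordinary call: the function `$` applied to two symbols
parts <- as.list(quote(cows$weight))
pieces <- vapply(parts, as.character, character(1))
print(pieces)
```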

2 answers:

Answer 0 (score: 2)

One approach that works, though it is a bit tedious, is to replace operators such as $ with characters that are legal in variable names, convert the string back to a formula, apply all.vars, and un-mangle the results:

All.vars = function(expr, retain = c("\\$", "\\[\\[", "\\]\\]"), ...) {
    # replace operators with unlikely patterns _Av1_, _Av2_, ...
    repl = paste("_Av", seq_along(retain), "_", sep = "")
    for (i in seq_along(retain))
        expr = gsub(retain[i], repl[i], expr)
    # piece things back together in the right order, and call all.vars
    subs = switch(length(expr), 1, c(1,2), c(2,1,3))
    vars = all.vars(as.formula(paste(expr[subs], collapse = "")), ...)
    # reverse the mangling of names
    retain = gsub("\\\\", "", retain)   # un-escape the patterns
    for (i in seq_along(retain))
        vars = gsub(repl[i], retain[i], vars)
    vars
}

Use the retain argument to specify the patterns we want to keep rather than treat as operators. The defaults are $, [[, and ]] (all duly escaped). Here are some results:

> form = log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
> All.vars(form)
[1] "cows$weight" "bulls[[3]]"  "herd$breed"

Changing retain to also include ( and ):

> All.vars(form, retain = c("\\$", "\\(", "\\)", "\\[\\[", "\\]\\]"))
[1] "log(cows$weight)"   "factor(bulls[[3]])" "herd$breed"

The dots are passed to all.vars, which is really the same as all.names but with different defaults. So we can also obtain the functions and operators that are not in retain.
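To make the mangling steps in this answer easier to follow, here is a minimal walk-through done by hand, assuming the same formula and the same _Av1_, _Av2_, _Av3_ placeholders as the function's defaults. It is a sketch of the idea, not a replacement for the answer's function. It also passes functions = TRUE, a documented argument of all.vars, to illustrate what the dots make available.

```r
form <- log(cows$weight) ~ factor(bulls[[3]]) * herd$breed

# Coercing a two-sided formula to character gives operator, LHS, RHS in that
# order, which is why the pieces are reordered with c(2, 1, 3) below
parts <- as.character(form)
print(parts)

# Mangle `$`, `[[` and `]]` into text that is legal inside a variable name
mangled <- gsub("\\$", "_Av1_", parts)
mangled <- gsub("\\[\\[", "_Av2_", mangled)
mangled <- gsub("\\]\\]", "_Av3_", mangled)

# Rebuild the formula so that all.vars() sees each mangled term as one name
rebuilt <- as.formula(paste(mangled[c(2, 1, 3)], collapse = ""))
vars <- all.vars(rebuilt)
print(vars)

# With functions = TRUE the remaining functions and operators are returned too
with_funs <- all.vars(rebuilt, functions = TRUE)
print(with_funs)
```

Reversing the three substitutions on vars restores the names cows$weight, bulls[[3]], and herd$breed, which is what the final loop of the function does.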

Answer 1 (score: 2)

This is not robust enough for general use cases, but just for fun I thought I'd take a crack at it:

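mystery.fcn = function(string) {
  # gsub() coerces a formula to a character vector (operator, LHS, RHS)
  string = gsub(":", " ", string)   # treat interaction colons as separators
  # strip function-call prefixes, parentheses and the operators * ~ + -, then split on spaces
  string = unlist(strsplit(gsub("\\b.*\\b\\(|\\(|\\)|[*~+-]", "", string), split=" "))
  string = string[nchar(string) > 0]   # drop the empty pieces left behind
  return(string)
}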

form = log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
mystery.fcn(form)
[1] "cows$weight" "bulls[[3]]"  "herd$breed" 

form1 = ~x[[y]]
mystery.fcn(form1)
[1] "x[[y]]"

form2 = z$three ~ z$one + z$two - z$x_y
mystery.fcn(form2)
[1] "z$three" "z$one"   "z$two"   "z$x_y"  

form3 = z$three ~ z$one:z$two
mystery.fcn(form3)
[1] "z$three" "z$one"   "z$two"
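
Both answers work on the deparsed text of the formula. As an alternative sketch, not taken from either answer, the formula's parse tree can be walked directly so that any $ or [[ call is kept as one piece and every other call is descended into. The name extract_terms and its keep argument are hypothetical, chosen only for this illustration. The sketch assumes no empty subscripts such as x[, 1] appear in the formula.

```r
# Hypothetical helper: collect variable names from a formula or expression,
# keeping calls to the functions listed in `keep` as single deparsed strings
extract_terms <- function(expr, keep = c("$", "[[")) {
  if (is.name(expr)) {
    return(as.character(expr))               # a bare variable such as treat
  }
  if (!is.call(expr)) {
    return(character(0))                     # constants such as 3 or "a"
  }
  head <- expr[[1]]
  if (is.name(head) && as.character(head) %in% keep) {
    # keep cows$weight or bulls[[3]] whole instead of splitting it
    return(paste(deparse(expr), collapse = ""))
  }
  # otherwise skip the function name and recurse into each argument
  found <- lapply(seq_along(expr)[-1],
                  function(i) extract_terms(expr[[i]], keep))
  unique(as.character(unlist(found)))
}

form <- log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
result <- extract_terms(form)
print(result)
# [1] "cows$weight" "bulls[[3]]"  "herd$breed"
```

Because this never edits the formula as a string, nested function calls and repeated functions on one side of the formula need no special regular expressions.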