Question

word <- c('abc noboby@stat.berkeley.edu','text with no email','first me@mything.com also you@yourspace.com')
pattern <- '[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+'


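getmail <- function(pattern, word) {
  mail <<- c()  # creates (or overwrites) a variable called mail in the global environment
  sapply(word, function(x) {
    out <- gregexpr(pattern, x)
    for (i in 1:length(out[[1]])) {
      if (out[[1]][i] > 0)  # gregexpr returns -1 when the string has no match
        # add this match to mail; union() also drops duplicates
        mail <<- union(mail, substr(x, start = out[[1]][i],
                                    stop = out[[1]][i] + attr(out[[1]], "match.length")[i] - 1))
    }
  })
  return(mail)
}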

getmail(pattern,word)

[1] "noboby@stat.berkeley.edu" "me@mything.com"           "you@yourspace.com"       
ls()
[1] "getmail" "mail"    "pattern" "word"
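mail shows up in ls() because <<- does not create a variable in the function that uses it: it searches the enclosing environments for an existing binding and, if it finds none, assigns in the global environment. Since getmail is defined at top level and mail is never created locally, both <<- calls end up in the global environment. The small demo below illustrates the rule; leaky and contained are hypothetical helpers made up for this sketch, not part of the question.

```r
# Hypothetical demo functions illustrating how <<- picks its target.
leaky <- function() {
  # leaky's frame has no 'total', so <<- falls through to the global environment
  helper <- function() total <<- 1
  helper()
  invisible(NULL)
}

contained <- function() {
  total <- 0  # ordinary local variable in contained's frame
  # <<- finds 'total' in the enclosing (contained) frame and updates it there
  helper <- function() total <<- total + 1
  helper()
  helper()
  total
}

leaky()
exists("total", envir = globalenv(), inherits = FALSE)  # TRUE: leaky() created a global
rm(total)
contained()                                             # 2
exists("total", envir = globalenv(), inherits = FALSE)  # FALSE: nothing leaked
```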

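The function gets the result, but I think it would be better if there were no global variable mail left in the namespace after running getmail(pattern, word). How can I modify it? I don't want to remove the sapply function; I want to do it my way, just without leaving mail in the namespace.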
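One way to keep the sapply structure, sketched under the assumption that only the scoping needs to change: create mail with an ordinary local assignment at the start of getmail. The <<- inside the anonymous function then finds mail in getmail's frame and updates it there, so nothing is written to the global environment.

```r
word <- c('abc noboby@stat.berkeley.edu', 'text with no email',
          'first me@mything.com also you@yourspace.com')
pattern <- '[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+'

getmail <- function(pattern, word) {
  mail <- c()  # local to this call of getmail, not global
  sapply(word, function(x) {
    out <- gregexpr(pattern, x)
    for (i in seq_along(out[[1]])) {
      if (out[[1]][i] > 0) {
        # <<- finds 'mail' in getmail's frame, so the update stays local
        mail <<- union(mail, substr(x, start = out[[1]][i],
                                    stop = out[[1]][i] + attr(out[[1]], "match.length")[i] - 1))
      }
    }
  })
  mail
}

getmail(pattern, word)
```

In a fresh session, ls() after this call should list only getmail, pattern and word.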

I know I can get the result more simply, but I want to learn more about how functions work.

mail <- c()
out <- gregexpr(pattern, word)
for (i in 1:length(out)) {
  for (j in 1:length(out[[i]])) {
    if (out[[i]][j] > 0)
      mail <- union(mail, substr(word[i], start = out[[i]][j],
                                 stop = out[[i]][j] + attr(out[[i]], "match.length")[j] - 1))
  }
}
mail
[1] "noboby@stat.berkeley.edu" "me@mything.com"           "you@yourspace.com"
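Since the goal is to learn about functions, another option keeps sapply but avoids <<- entirely (a different style from the asker's, named plainly here): each call returns its own matches and the results are combined once at the end. extract_one and getmail2 are hypothetical names used only for this sketch.

```r
word <- c('abc noboby@stat.berkeley.edu', 'text with no email',
          'first me@mything.com also you@yourspace.com')
pattern <- '[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+'

# Hypothetical helper: all matches in a single string, or character(0) if none.
extract_one <- function(x, pattern) {
  m <- gregexpr(pattern, x)[[1]]
  if (m[1] == -1) return(character(0))
  substring(x, m, m + attr(m, "match.length") - 1)
}

# Hypothetical wrapper: no side effects, the result is built from return values.
getmail2 <- function(pattern, word) {
  per_string <- sapply(word, extract_one, pattern = pattern,
                       simplify = FALSE, USE.NAMES = FALSE)
  unique(unlist(per_string))
}

getmail2(pattern, word)
```

simplify = FALSE makes sapply always return a list, however many matches each string has, and unique() plays the role that union() played in the loop.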

Answer 1

I would probably take advantage of vectorization and skip most of the loops:

> m <- gregexpr(pattern,word)
> lapply(seq_along(word),
+        function(i){substring(word[i],m[[i]],m[[i]] + attr(m[[i]],"match.length"))})
[[1]]
[1] "noboby@stat.berkeley.edu"

[[2]]
[1] ""

[[3]]
[1] "me@mything.com "   "you@yourspace.com"

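Just two lines get you most of the way there. Yes, you still need to filter out the empty string and deal with the trailing space in "me@mything.com ", but I find this cleaner. The trailing space comes from the stop position being one character too far; using m[[i]] + attr(m[[i]],"match.length") - 1, as the question's code does, avoids it without any trimming.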
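Finishing the cleanup might look like the sketch below (an illustration of this answer's approach, not code posted by the answerer). Instead of trimming, it subtracts 1 from the stop position, since a match that starts at s with length n ends at s + n - 1, and it then drops the empty string produced for the element with no match.

```r
word <- c('abc noboby@stat.berkeley.edu', 'text with no email',
          'first me@mything.com also you@yourspace.com')
pattern <- '[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+'

m <- gregexpr(pattern, word)
res <- lapply(seq_along(word), function(i) {
  # stop = start + length - 1, so no trailing character is included
  substring(word[i], m[[i]], m[[i]] + attr(m[[i]], "match.length") - 1)
})

emails <- unique(unlist(res))
emails <- emails[emails != ""]  # the no-match element yields ""
emails
```

Base R's regmatches(word, m) performs the same per-string extraction from gregexpr output, returning character(0) instead of "" for strings without a match.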

How can I modify a function so it does not leave a global variable in R?
