
Extract words containing "@" from a string in R

Time: 2019-04-24 17:47:11

Tags: r

My data looks like this:

emails
my email id: xxx.x@gmail.com
email to: bb_b@yahool.com
mailto: hj-hk@grk.co
you can send email to ghhd@test.co
gggh@gh.tom

I only want to extract the words containing "@", like this:

email
xxxx@gmail.com
bbb@yahool.com
hjhk@grk.co
ghhd@test.co
gggh@gh.tom

So far I have been using

substring(data[1,1], 14)

But obviously, when the data size is as large as 900k, this is the worst way to do it. Any help would be appreciated. TIA.

1 answer:

Answer 0: (score: 1)

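You can use regexpr together with regmatches. regexpr finds the first match of the pattern in each string, and regmatches extracts the matched text. Both are vectorized, so no loop over rows is needed.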

regmatches(d$emails, regexpr("(\\S*\\@\\S+\\.\\S*)", d$emails))
# [1] "xxxx@gmail.com" "bbb@yahool.com" "hjhk@grk.co"    "ghhd@test.co"  
# [5] "gggh@gh.tom"
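
One caveat with the call above: regmatches silently drops elements that have no match, so the result can be shorter than the input column. If the result has to stay aligned with the rows, a minimal sketch is to fill only the matching positions. The data frame d2, its "no address here" row, and the new column name email are made up for illustration; the pattern is the same one without the optional escape and group.

```r
# Hypothetical input: the second row contains no address at all.
d2 <- data.frame(
  emails = c("my email id: xxxx@gmail.com", "no address here", "gggh@gh.tom"),
  stringsAsFactors = FALSE
)

# regexpr returns the start of the first match per element, or -1 if none.
m <- regexpr("\\S*@\\S+\\.\\S*", d2$emails)

# Start with NA everywhere, then fill in only the rows that matched,
# because regmatches returns values for matching elements only.
d2$email <- NA_character_
d2$email[m != -1] <- regmatches(d2$emails, m)

d2$email
```

Here the second element of d2$email stays NA, and the other two hold the extracted addresses. If a single string can hold more than one address, gregexpr can be used in place of regexpr; regmatches then returns a list with one character vector per input string.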

Data

d <- structure(list(emails = c("my email id: xxxx@gmail.com", "email to: bbb@yahool.com", 
"mailto: hjhk@grk.co", "you can send email to ghhd@test.co", 
"gggh@gh.tom")), row.names = c(NA, -5L), class = "data.frame")