I have the following pattern in my column:
xyz@gmail.com
abc@hotmail.com
Now I want to extract the text after "@" and before ".", i.e. gmail and hotmail. I am able to extract the text after "@" using the following code:
sub(".*@", "", email)
How can I modify the above to fit my use case?
Answer 0 (score: 6)
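You may have (now or eventually) addresses in which "@" appears more than once, or addresses with multi-level domains such as "someone@department.example.com" or "someone.else@yet.another.department.example.com". Naively assuming there is only a single domain label is likely to bite you at some point in this analysis. So, unless you are sure you have, and always will have, simple e-mail addresses, may I suggest: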
library(stringi)
library(urltools)
library(dplyr)
library(purrr)
emails <- c("xyz@gmail.com", "abc@hotmail.com",
            "someone@department.example.com",
            "someone.else@yet.another.department.com",
            "some.brit@froodyorg.co.uk")
stri_locate_last_fixed(emails, "@")[, "end"] %>%   # position of the last "@"
  map2_df(emails, function(x, y) {
    substr(y, x + 1, nchar(y)) %>%   # everything after that "@"
      suffix_extract()               # split into host / subdomain / domain / suffix
  })
## host subdomain domain suffix
## 1 gmail.com <NA> gmail com
## 2 hotmail.com <NA> hotmail com
## 3 department.example.com department example com
## 4 yet.another.department.com yet.another department com
## 5 froodyorg.co.uk <NA> froodyorg co.uk
Note the correct split into subdomain, domain and suffix, especially for the last address (co.uk).
Knowing that, we can change the code to:
stri_locate_last_fixed(emails, "@")[, "end"] %>%
  map2_chr(emails, function(x, y) {
    substr(y, x + 1, nchar(y)) %>%
      suffix_extract() %>%
      # re-attach any subdomain to the domain, dropping the public suffix
      mutate(full_domain = ifelse(is.na(subdomain), domain,
                                  sprintf("%s.%s", subdomain, domain))) %>%
      select(full_domain) %>%
      flatten_chr()
  })
## [1] "gmail" "hotmail"
## [3] "department.example" "yet.another.department"
## [5] "froodyorg"
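For comparison, here is a dependency-free sketch of the same idea in base R, under loud assumptions: the suffix list is a hypothetical, hard-coded stand-in for the Public Suffix List that suffix_extract() consults, and strip_suffix() is an illustrative helper, not part of any package. It also shows that the greedy ".*" in sub(".*@", "", x) already runs to the last "@", so the question's own code copes with a quoted local part that contains "@".

```r
# Hypothetical, hard-coded suffix list for illustration only; real code should
# rely on the Public Suffix List (which urltools::suffix_extract() uses).
known_suffixes <- c("com", "org", "co.uk")

# Hypothetical helper: drop the longest matching known suffix from each host.
strip_suffix <- function(host, suffixes = known_suffixes) {
  suffixes <- suffixes[order(-nchar(suffixes))]  # try "co.uk" before shorter ones
  vapply(host, function(h) {
    for (s in suffixes) {
      dot_suffix <- paste0(".", s)
      if (endsWith(h, dot_suffix)) {
        return(substr(h, 1, nchar(h) - nchar(dot_suffix)))
      }
    }
    NA_character_  # no known suffix matched
  }, character(1), USE.NAMES = FALSE)
}

emails <- c("xyz@gmail.com",
            "someone@department.example.com",
            "some.brit@froodyorg.co.uk",
            "\"odd@local\"@example.org")  # quoted local part containing "@"

# The greedy ".*" matches up to the LAST "@", so only the domain part remains.
hosts <- sub(".*@", "", emails)
strip_suffix(hosts)
## [1] "gmail" "department.example" "froodyorg" "example"
```

A hard-coded list goes stale quickly and misses most real suffixes, which is why the answer above delegates this step to urltools.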
Answer 1 (score: 2)
We can use gsub:
gsub(".*@|\\..*", "", email)
#[1] "gmail" "hotmail"
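A quick check of how this one-liner behaves on the multi-level domains that Answer 0 warns about (the addresses are illustrative): it keeps only the first label after "@", so "someone@department.example.com" yields "department" rather than "department.example".

```r
emails <- c("xyz@gmail.com",
            "someone@department.example.com",
            "some.brit@froodyorg.co.uk")
# Remove everything up to "@", then everything from the first remaining "." on.
first_label <- gsub(".*@|\\..*", "", emails)
first_label
## [1] "gmail" "department" "froodyorg"
```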
Answer 2 (score: 2)
You can use:
emails <- c("xyz@gmail.com", "abc@hotmail.com")
emails_new <- gsub("^.*@(.+)$", "\\1", emails)  # keep only the part after the "@"
emails_new
# [1] "gmail.com" "hotmail.com"
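If you want gmail rather than gmail.com, one possible variant (a sketch, not part of the original answer) narrows the capture group to the run of non-dot characters right after the "@":

```r
emails <- c("xyz@gmail.com", "abc@hotmail.com")
# Capture the non-dot characters after the last "@" and drop the rest.
# Strings with no "." after the "@" do not match and are returned unchanged.
providers <- sub("^.*@([^.]+)\\..*$", "\\1", emails)
providers
## [1] "gmail" "hotmail"
```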
Answer 3 (score: 1)
Here is a stringr version of @hrbrmstr's function:
library(magrittr)  # provides %>%
stringr::str_locate_all(email, "@") %>%
  purrr::map_int(~ .x[nrow(.x), "end"]) %>%  # end position of the last "@"
  purrr::map2_df(email, ~ {
    stringr::str_sub(.y, .x + 1, nchar(.y)) %>%
      urltools::suffix_extract()
  })
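For readers without stringr, a base-R sketch of the same "position of the last @" step: gregexpr() returns every match start, and we keep the last one. The helper name last_at() is hypothetical, and because "@" is a single character, the start position equals the end position here.

```r
# Hypothetical helper: position of the LAST "@" in each string.
# gregexpr() returns -1 when there is no match, so such strings come back whole.
last_at <- function(x) {
  vapply(gregexpr("@", x, fixed = TRUE),
         function(m) as.integer(m[length(m)]),  # keep the final match start
         integer(1))
}

emails <- c("xyz@gmail.com", "\"odd@local\"@example.org")
pos <- last_at(emails)
substr(emails, pos + 1, nchar(emails))
## [1] "gmail.com" "example.org"
```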