I can't seem to get the email address out of the following phrase:
"mailto:fwwrp-3492801490@yahoo.com?"
So far I have tried:
regexpr(":([^\\?*]?)", phrase)
The logic of the code is as follows:
I'm not sure where my regular expression is going wrong.
Answer 0 (score: 9)
Let's look at your regex and we'll see where you went wrong. We'll take it apart so it's easier to discuss:
: Just a literal colon, no worries here.
( Open a capture group.
[ Open a character class, this will match one character.
^ The leading ^ means "negate this class"
\\ This ends up as a single \ when the regex engine sees it and that will
escape the next character.
? This has no special meaning inside a character class; sometimes a
question mark is just a question mark, and this is one of those
times. Escaping a simple character doesn't do anything interesting.
* Again, we're in a character class so * has no special meaning.
] Close the character class.
? Zero or one of the preceding pattern.
) Close the capture group.
Removing the noise gives us: ([^?*]?)
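So your regex actually matches:
a colon followed by zero or one character that is neither a question mark nor an asterisk; that single character (if any) ends up in the first capture group.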
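To see this concretely in R, the sketch below runs the original pattern and then pulls out the matched text with `regmatches()`. Note that `regexpr()` on its own returns only the match position (with the length as an attribute), not the text. The input string assumes the trailing-`?` form used in the answers.

```r
phrase <- "mailto:fwwrp-3492801490@yahoo.com?"

# regexpr() returns the start of the first match; the length is an attribute
pos <- regexpr(":([^\\?*]?)", phrase)
pos_start <- as.integer(pos)            # the colon is the 7th character
pos_len   <- attr(pos, "match.length")  # the colon plus at most one character

# regmatches() converts that position into the matched substring
matched <- regmatches(phrase, pos)
print(matched)  # ":f"
```

Only the colon and the single character after it are matched, which is exactly what the breakdown predicts.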
That's quite different from what you want. A few tweaks should sort you out:
:([^?]*)
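That matches:
a colon followed by any number of characters that are not question marks; those characters end up in the first capture group.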
A * outside a character class is special: there it means "zero or more". Inside a character class it is just a literal *.
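On the R side, one way to apply the corrected pattern (a sketch, not the only option) is `regexec()` combined with `regmatches()`. Together they return the whole match followed by each capture group, so the address is the second element.

```r
phrase <- "mailto:fwwrp-3492801490@yahoo.com?"

# regexec() records positions for the full match and for each capture group
m <- regmatches(phrase, regexec(":([^?]*)", phrase))

# m[[1]][1] is the full match (colon included); m[[1]][2] is capture group 1
email <- m[[1]][2]
print(email)  # "fwwrp-3492801490@yahoo.com"
```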
I'll leave it to others to help you with the R side of things; I just wanted you to understand what was going on with the regex.
Answer 1 (score: 3)
Here is a pretty simple approach with gsub:
gsub("([a-z]+:)(.*)([?]$)", "\\2", "mailto:fwwrp-3492801490@yahoo.com?")
## Or, if you expect things other than lowercase letters before the colon
gsub("(.*:)(.*)([?]$)", "\\2", "mailto:fwwrp-3492801490@yahoo.com?")
## Or, discarding the first and third groups since they aren't very useful
gsub(".*:(.*)[?]$", "\\1", "mailto:fwwrp-3492801490@yahoo.com?")
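All three patterns require the string to end in `?`. If it doesn't, nothing matches and `gsub()` returns the input unchanged. A hedged variant, assuming the trailing `?` may or may not be present, makes it optional. `sub()` is enough here because the pattern can match at most once per string.

```r
x <- c("mailto:fwwrp-3492801490@yahoo.com?",
       "mailto:fwwrp-3492801490@yahoo.com")

# The pattern above leaves a string without a trailing "?" untouched
unchanged <- gsub(".*:(.*)[?]$", "\\1", x)

# Variant: stop the capture at any "?" and make the trailing "?" optional
emails <- sub(".*:([^?]*)[?]?$", "\\1", x)
print(emails)
```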
Building on where @TylerRinker started, you could also use strsplit as follows (to avoid the problems with gsub):
strsplit("mailto:fwwrp-3492801490@yahoo.com?", ":|\\?", fixed=FALSE)[[1]][2]
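The `[[1]][2]` indexing works because `strsplit()` returns a list with one character vector per input string. Splitting on either `:` or `?` leaves the scheme and the address, and `strsplit()` does not produce an empty string for a match at the very end of the input. (`fixed = FALSE` is the default, so it is omitted here.)

```r
parts <- strsplit("mailto:fwwrp-3492801490@yahoo.com?", ":|\\?")
str(parts)

# parts[[1]] is c("mailto", "fwwrp-3492801490@yahoo.com"); element 2 is the address
email <- parts[[1]][2]
```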
What if you have a vector of strings like this?
phrase <- c("mailto:fwwrp-3492801490@yahoo.com?",
"mailto:somefunk.y-address@Sqmpalm.net?")
phrase
# [1] "mailto:fwwrp-3492801490@yahoo.com?"
# [2] "mailto:somefunk.y-address@Sqmpalm.net?"
## Using gsub
gsub("(.*:)(.*)([?]$)", "\\2", phrase)
# [1] "fwwrp-3492801490@yahoo.com" "somefunk.y-address@Sqmpalm.net"
## Using strsplit
sapply(phrase,
function(x) strsplit(x, ":|\\?", fixed=FALSE)[[1]][2],
USE.NAMES=FALSE)
# [1] "fwwrp-3492801490@yahoo.com" "somefunk.y-address@Sqmpalm.net"
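Because `strsplit()` already accepts a whole character vector, the `sapply()` wrapper is not strictly needed. A sketch that splits once and then takes the second piece of each result with `vapply()`, which also guarantees a character result:

```r
phrase <- c("mailto:fwwrp-3492801490@yahoo.com?",
            "mailto:somefunk.y-address@Sqmpalm.net?")

# One strsplit() call over the whole vector: one vector of pieces per string
pieces <- strsplit(phrase, ":|\\?")

# Take the second piece of each; character(1) enforces exactly one string each
emails <- vapply(pieces, function(p) p[2], character(1))
print(emails)
```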
I prefer the conciseness of the gsub approach.