Question

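I have a bunch of strings in the format below, and I want to parse each one and extract only the email and the string that follows the delimiter: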

email[delimiter]string

In other words: [email containing any ASCII characters][delimiter][string containing any ASCII characters]

The delimiter can be any of , ; : | or ||, e.g.

abc@xyz.com,blah
abc@xyz.au;blah1
abc@xyz.ru:blah2
abc@xyz.ru|blah,2
abc@xyz.ru||blah2

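So far I have a regex that matches the strings above. How do I modify it so that it forms proper groups, letting me extract just the email and the string that follows the delimiter in Java / Scala?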

.+@.+([:;,|])+.+$

The Java code looks like this:

// Create a Pattern object
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);

if (m.find()) {
    System.out.println("Email: " + m.group(0));
    System.out.println("Value: " + m.group(1));
} else {
    System.out.println("NO MATCH");
}

Answer 1

You seem to have worked out the regex part yourself. I have a suggestion for extracting the results: use kantan.regex.

This lets you write:

import kantan.regex.implicits._

// Declare your regular expression, validated at compile time.
val regex = rx"(.+@[A-Za-z0-9.]+)(?:[:;,|]+)(.*)"

// Sample input
val input = "abc@xyz.com,blah"

// Returns an Iterator[(String, String)] on all matches, where
// ._1 is the email and ._2 the string
input.evalRegex[(String, String)](regex)

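Note that you may want a better type for the values, for example a case class rather than (String, String). That is also possible: you can provide the decoders yourself, or have them derived automatically through shapeless: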

import kantan.regex.generic._

// Case class in which to store results.
case class MailMatch(mail: String, value: String)

// Returns an Iterator[MailMatch]
input.evalRegex[MailMatch](regex)

Full disclosure: I'm the author.

Answer 2

So, answering my own question with what worked for me. Regex experts, can you spot any holes in this?

Pattern COMPILE = Pattern.compile("(.+@[A-Za-z0-9.\"]+)(?:[:;,|]+)(.*)");
Matcher m = COMPILE.matcher(next);

if (m.find()) {
    System.out.println(m.group(1));
    System.out.println(m.group(2));
} else {
    System.out.println("NO MATCH");
}

Edit: switched to a non-capturing group, per MYGz's answer.
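For anyone who wants to try this against the sample lines from the question, here is a minimal, self-contained sketch. The pattern is copied verbatim from the snippet above; the class name EmailSplitSketch, the split helper, and the extra hyphenated sample line are my own assumptions for illustration and do not come from the original post.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailSplitSketch {
    // Pattern copied verbatim from the answer above.
    private static final Pattern LINE =
            Pattern.compile("(.+@[A-Za-z0-9.\"]+)(?:[:;,|]+)(.*)");

    // Hypothetical helper: returns {email, value}, or empty when the line does not match.
    public static Optional<String[]> split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc@xyz.com,blah",
            "abc@xyz.au;blah1",
            "abc@xyz.ru:blah2",
            "abc@xyz.ru|blah,2",
            "abc@xyz.ru||blah2",
            "abc@my-site.com,blah" // assumed extra case: hyphen is not in the domain class
        };
        for (String s : samples) {
            String shown = split(s)
                    .map(p -> "email=" + p[0] + " value=" + p[1])
                    .orElse("NO MATCH");
            System.out.println(s + " -> " + shown);
        }
    }
}
```

The five sample lines from the question each split into the email and the value, including the comma inside "blah,2" and the double "||" delimiter. One hole worth knowing about: the domain part only allows letters, digits, dots and a double quote, so a domain containing a hyphen (the last line above) is reported as NO MATCH.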

Answer 3

(\\w+@\\w+)[:;,\\|](.+)$

Then use Java to extract the groups from the match. Group 1 is the email, and group 2 is the string after the delimiter.
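One caveat when running this against the sample data: \w does not match a dot, so the pattern as written cannot get past "xyz.com", and it accepts only a single delimiter character. The sketch below compares it with one possible adjustment. The ADJUSTED pattern, the class name DelimiterPatternCheck and the extract helper are assumptions of mine, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterPatternCheck {
    // Pattern exactly as given in the answer above.
    public static final Pattern ORIGINAL =
            Pattern.compile("(\\w+@\\w+)[:;,\\|](.+)$");

    // Assumed variant: allow dots in the address and one or more delimiter characters.
    public static final Pattern ADJUSTED =
            Pattern.compile("([\\w.]+@[\\w.]+)[:;,|]+(.+)$");

    // Hypothetical helper: returns {email, value}, or null when nothing matches.
    public static String[] extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc@xyz.com,blah",
            "abc@xyz.ru|blah,2",
            "abc@xyz.ru||blah2"
        };
        for (String s : samples) {
            String[] before = extract(ORIGINAL, s);
            String[] after = extract(ADJUSTED, s);
            System.out.println(s
                    + " | original: " + (before == null ? "NO MATCH" : before[0] + " / " + before[1])
                    + " | adjusted: " + (after == null ? "NO MATCH" : after[0] + " / " + after[1]));
        }
    }
}
```

With the original pattern all three lines report NO MATCH because of the dot in the domain; the adjusted variant yields the email in group 1 and the trailing string in group 2. The variant is still a loose match rather than real email validation, since characters such as + or - in the address are not covered.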

Java / Scala: extract the email and the string from strings in the format email[delimiter]string
