How do I get the name of the named capture group that matched in a Java regex?

Asked: 2016-09-28 18:01:00

Tags: java regex scala

Suppose:

String text = "FACEBOOK is buying GOOGLE and FACE BOOK";

Pattern pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))");
Matcher matcher = pattern.matcher(text);

I would like to get something like this:

Group=FB matches substring="FACEBOOK" at position=[0, 8)
Group=GOOGL matches substring="GOOGLE" at position=[19, 25)
Group=FB matches substring="FACE BOOK" at position=[30, 39)

However, I cannot get the group name. Here is my attempt in Scala:

import java.util.regex.Pattern

val pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
val text = "FACEBOOK is buying GOOGLE and FACE BOOK"
val matcher = pattern.matcher(text)

while (matcher.find()) {
  println(s"Group=???? matches substring=${matcher.group()} at position=[${matcher.start},${matcher.end})")
}

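Edit: Someone marked this as a duplicate of "Get group names in java regex", but this is a different question. This one asks how to find the name of the group that matched, given a match. The other question asks how to get the group names as Strings (or indices) given a Pattern object.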

2 Answers:

Answer 0 (score: 1)

Here is my attempt in Scala:

import java.util.regex.{MatchResult, Pattern}

class GroupNamedRegex(pattern: Pattern, namedGroups: Set[String]) {
  // Compile the regex and collect every group name declared as "(?<name>" in its source text.
  def this(regex: String) = this(Pattern.compile(regex),
    "\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>".r.findAllMatchIn(regex).map(_.group(1)).toSet)

  // Lazily yields each match together with the name of a named group that took part in it.
  def findNamedMatches(s: String): Iterator[GroupNamedRegex.Match] = new Iterator[GroupNamedRegex.Match] {
    private[this] val m = pattern.matcher(s)
    private[this] var _hasNext = m.find()

    override def hasNext = _hasNext

    override def next() = {
      // Snapshot the current match; a named group that did not participate returns null.
      val ans = GroupNamedRegex.Match(m.toMatchResult, namedGroups.find(group => m.group(group) != null))
      // Advance to the next match before returning the current one.
      _hasNext = m.find()
      ans
    }
  }
}

object GroupNamedRegex extends App {
  case class Match(result: MatchResult, groupName: Option[String])

  val r = new GroupNamedRegex("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
  // The trailing " FB" in the input is plain text and matches neither alternative.
  println(r.findNamedMatches("FACEBOOK is buying GOOGLE and FACE BOOK FB").map(s => s.groupName -> s.result.group()).toList)
}
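
For anyone who wants this in plain Java rather than Scala, the same idea can be sketched roughly as below. It is not a library API: the class name NamedGroupFinder and the methods groupNames and describeMatches are made up for illustration. It assumes the regex source contains no escaped or character-class text that merely looks like a "(?<name>" opener, and that at most one named group participates in each match (true for a top-level alternation like the one in the question). If several named groups participate, it reports the first one declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class NamedGroupFinder {
    // Matches "(?<name>" openers in a regex source string.
    // Lookbehinds "(?<=" and "(?<!" are skipped because a name must start with a letter.
    private static final Pattern GROUP_NAME =
        Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Collects the declared group names in declaration order.
    public static Set<String> groupNames(String regex) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns one description line per match, naming the first group that participated.
    public static List<String> describeMatches(String regex, String text) {
        Set<String> names = groupNames(regex);
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            String matched = "?"; // placeholder when no named group participated
            for (String name : names) {
                // group(name) returns null when that group did not take part in the match
                if (m.group(name) != null) {
                    matched = name;
                    break;
                }
            }
            lines.add(String.format(
                "Group=%s matches substring=\"%s\" at position=[%d, %d)",
                matched, m.group(), m.start(), m.end()));
        }
        return lines;
    }

    public static void main(String[] args) {
        String regex = "(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))";
        String text = "FACEBOOK is buying GOOGLE and FACE BOOK";
        describeMatches(regex, text).forEach(System.out::println);
    }
}
```

Run against the text from the question, this should print the three lines the question asks for. Scanning the regex source for names is a workaround; it is only as reliable as the assumption about the source text stated above.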

Answer 1 (score: 1)

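You can use the named-regexp Java library. It is a thin wrapper around java.util.regex that mainly exists to give pre-Java-7 users named capture group support, but it also includes methods for inspecting group names (something even Java 11 does not seem to offer):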