Question

Suppose I have the following pattern in some text:

def articleContent =  "<![CDATA[ Hellow World ]]>"

I want to extract the "Hellow World" part, so I use the following code to match it:

def contentRegex = "<![CDATA[ /(.)*/ ]]>"
def contentMatcher = ( articleContent =~ contentRegex )
println contentMatcher[0]

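But I keep getting a null pointer exception because the regular expression doesn't seem to work. What is the right regular expression for "any piece of text", and how do I collect it from a string?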

Answer 1

Try:

def result = (articleContent =~ /<!\[CDATA\[(.+)]]>/)[ 0 ][ 1 ]

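However, I worry that you are planning to parse XML with regular expressions. If this CDATA section is part of a larger valid XML document, it is better to use an XML parser.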
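As a rough sketch of the parser route, the snippet below reads the CDATA text with the JDK's DOM parser instead of a regex. The wrapping `<content>` element is an assumption made for illustration; the question does not show the surrounding document.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical wrapper document: the <content> element is assumed,
// only the CDATA section comes from the question.
def xml = '<content><![CDATA[ Hellow World ]]></content>'

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

// getTextContent() returns the element's character data, CDATA included.
def text = doc.documentElement.textContent.trim()
println text
```

The parser handles the CDATA delimiters itself, so no escaping of `[` or `]` is needed.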

Answer 2

The code below shows how to extract a substring with a regular expression in Groovy:

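class StringHelper {
    // @NonCPS is a Jenkins Pipeline annotation (com.cloudbees.groovy.cps.NonCPS);
    // remove it when running outside Jenkins.
    @NonCPS
    static String stripSshPrefix(String gitUrl) {
        def match = (gitUrl =~ /ssh:\/\/(.+)/)
        // find() advances to the first match; group(1) is the text after "ssh://"
        if (match.find()) {
            return match.group(1)
        }
        // No "ssh://" prefix: return the URL unchanged
        return gitUrl
    }

    static void main(String... args) {
        def gitUrl = "ssh://git@github.com:jiahut/boot.git"
        def gitUrl2 = "git@github.com:jiahut/boot.git"
        println(stripSshPrefix(gitUrl))   // prints git@github.com:jiahut/boot.git
        println(stripSshPrefix(gitUrl2))  // prints git@github.com:jiahut/boot.git
    }
}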

Answer 3

A little late to the party, but try using backslashes when defining your pattern, for example:

 def articleContent =  "real groovy"
 def matches = (articleContent =~ /gr\w{4}/) //grabs 'gr' and its following 4 chars
 def firstmatch = matches[0]  //firstmatch would be 'groovy'

You are on the right track; only the pattern definition needs to change.
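Applied to the string from the question, that advice might look like the sketch below. The exact pattern (with `\s*` and a lazy group to drop the surrounding spaces) is one possible choice, not the only correct one.

```groovy
def articleContent = "<![CDATA[ Hellow World ]]>"

// In a slashy string the backslashes reach the regex engine unchanged,
// so \[ matches a literal '[' instead of opening a character class.
def matcher = (articleContent =~ /<!\[CDATA\[\s*(.*?)\s*\]\]>/)

// With a capture group, matcher[0] is a list: [whole match, group 1].
def extracted = matcher[0][1]
println extracted

// Without capture groups, matcher[0] is simply the matched String.
def noGroups = ("real groovy" =~ /gr\w{4}/)[0]
println noGroups
```

Note the difference in shape: indexing returns a list as soon as the pattern contains a group, which is why the first answer uses `[0][1]`.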

References:

https://www.regular-expressions.info/groovy.html

http://mrhaki.blogspot.com/2009/09/groovy-goodness-matchers-for-regular.html

Answer 4

In addition to tim_yates's solution, there is also a one-liner:

def result = articleContent.replaceAll(/<!\[CDATA\[(.+)]]>/,/$1/)

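Note that if the regexp does not match, the result will be equal to the source string. This is unlike

def result = (articleContent =~ /<!\[CDATA\[(.+)]]>/)[0][1]

which throws an exception when there is no match.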
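The difference in failure behavior can be sketched as follows. The `extract` closure is a hypothetical helper added for illustration, and the exception type shown is what Groovy's matcher indexing throws in the versions I am aware of.

```groovy
def pattern = /<!\[CDATA\[(.+)]]>/
def noMatch = "plain text without a CDATA section"

// replaceAll leaves the input untouched when nothing matches.
def replaced = noMatch.replaceAll(pattern, '$1')

// Indexing a matcher that found nothing throws instead.
def threw = false
try {
    def unused = (noMatch =~ pattern)[0][1]
} catch (IndexOutOfBoundsException e) {
    threw = true
}

// Hypothetical guarded helper: returns null to signal "no match".
def extract = { String s ->
    def m = (s =~ pattern)
    m.find() ? m.group(1) : null
}

println replaced
println threw
println extract(noMatch)
println extract("<![CDATA[ Hellow World ]]>")
```

Checking `find()` first avoids both the silent pass-through of `replaceAll` and the exception from indexing, at the cost of a null check for the caller.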

Extracting a substring with a regular expression in Groovy
