Kotlin regex: matching at "start of input"

Asked: 2018-09-24 13:37:57

Tags: regex string kotlin compiler-construction tokenize

When using regex.find(input, pos), can I make Kotlin treat pos as the start of the line?

That is:

val s = "foo(2)"

/*let's say I already extracted "foo"
  and now want to extract tokens '(', '2' and ')'
*/

val r1a = "\\(".toRegex()
val r1b = "\\)".toRegex()

println(r1a.find(s,3)?.let{"found '${it.value}'"} ?: "Nothing found")
println(r1b.find(s,3)?.let{"found '${it.value}'"} ?: "Nothing found")
println()

//this finds both
//but I only want to find '(' because it's at the beginning of the remaining string

val r2a = "^\\(".toRegex()
val r2b = "^\\)".toRegex()

println(r2a.find(s,3)?.let{"found '${it.value}'"} ?: "Nothing found")
println(r2b.find(s,3)?.let{"found '${it.value}'"} ?: "Nothing found")
println()

//this finds neither.
//I want the following behaviour:

val ss = s.substring(3)
println(r2a.find(ss,0)?.let{"found '${it.value}'"} ?: "Nothing found")
println(r2b.find(ss,0)?.let{"found '${it.value}'"} ?: "Nothing found")
println()

/*which finds '(' but not ')',
  but without having to explicitly split the string
*/


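Is there a way to do this?

Edit

The goal is to match "foo(2)".

I want to be able to feed this string to a list of matchers, which would first match foo, then (, then 2, then ).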

fun tokenizeLine(line:String){
    var pos = 0
    while(pos < line.length){
        val result = nextToken(line,pos)
        pos += result.consumed
        result.token?.let { tokens.add(it) }
    }
    tokens.add(Token.EOL)
}

Each matcher returns one of these:

sealed class TokenizerResult(val consumed : Int, val token:Token?){
    class Something(consumed:Int, token:Token):TokenizerResult(consumed,token)
    class Skip(consumed:Int=0):TokenizerResult(consumed,null)
    object Nothing:TokenizerResult(0,null)
}

fun nextToken(input:String, pos:Int) : TokenizerResult iterates over the list of matchers until it either runs out of matchers to try or one of them returns something other than TokenizerResult.Nothing:

val matchers = listOf( skipWhitespace, number, parensOpen, parensClose, identifier, ... )

for(m in matchers){
    result = m(input,pos)
    if(result != TNothing) break
}

if(result == TNothing){
    ...
}

return result
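The loop described above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: the Token data class, the Matcher typealias, the tokenMatcher helper, and the choice to throw on unmatched input are all hypothetical, and it assumes Kotlin 1.7.20 or later, where Regex.matchAt(input, index) is available and only succeeds if a match begins exactly at index. The patterns are written without ^.

```kotlin
// Hypothetical token type; the post does not show its definition.
data class Token(val kind: String, val text: String)

sealed class TokenizerResult(val consumed: Int, val token: Token?) {
    class Something(consumed: Int, token: Token) : TokenizerResult(consumed, token)
    class Skip(consumed: Int = 0) : TokenizerResult(consumed, null)
    object Nothing : TokenizerResult(0, null)
}

typealias Matcher = (input: String, pos: Int) -> TokenizerResult

// Builds a matcher that succeeds only when the regex matches starting exactly at pos.
fun tokenMatcher(kind: String, pattern: String): Matcher {
    val regex = pattern.toRegex()
    return { input, pos ->
        val m = regex.matchAt(input, pos)
        if (m != null) TokenizerResult.Something(m.value.length, Token(kind, m.value))
        else TokenizerResult.Nothing
    }
}

val whitespace = "\\s+".toRegex()

// Consumes whitespace at pos without producing a token.
val skipWhitespace: Matcher = { input, pos ->
    val m = whitespace.matchAt(input, pos)
    if (m != null) TokenizerResult.Skip(m.value.length) else TokenizerResult.Nothing
}

val matchers: List<Matcher> = listOf(
    skipWhitespace,
    tokenMatcher("number", "\\d+"),
    tokenMatcher("parensOpen", "\\("),
    tokenMatcher("parensClose", "\\)"),
    tokenMatcher("identifier", "[A-Za-z_]\\w*")
)

// Tries each matcher in order and returns the first result that is not Nothing.
fun nextToken(input: String, pos: Int): TokenizerResult {
    for (m in matchers) {
        val result = m(input, pos)
        if (result !== TokenizerResult.Nothing) return result
    }
    return TokenizerResult.Nothing
}

fun tokenizeLine(line: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var pos = 0
    while (pos < line.length) {
        val result = nextToken(line, pos)
        // Assumption: input that no matcher consumes is an error; the post elides this branch.
        if (result.consumed == 0) error("Unexpected character '${line[pos]}' at $pos")
        pos += result.consumed
        result.token?.let { tokens.add(it) }
    }
    return tokens
}

fun main() {
    println(tokenizeLine("foo (2)").map { it.text }) // [foo, (, 2, )]
}
```

Failing fast when nothing is consumed also keeps the while loop from spinning forever on a zero-length result.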

Edit 2

A matcher typically looks like this:

class RawMatch(val regex:Regex) : Pattern{
    override fun match(input: String, pos: Int, createToken: (value: String) -> Token): TokenizerResult {
        return regex.find(input,pos)?.let { TSomething(it.value.length,createToken(it.value)) } ?: TNothing
    }
}
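On the JVM, one way a matcher like this could anchor at pos without splitting the string is to go through java.util.regex.Matcher directly: region(start, end) restricts matching to a slice of the input, and lookingAt() requires the match to begin at the start of that region. With the default anchoring bounds, ^ also matches at the region start. The sketch below is an illustration only; the helper name matchAtPos is hypothetical, and it assumes Kotlin/JVM (Regex.toPattern() is not available on other targets).

```kotlin
// JVM only: returns the text matched starting exactly at pos, or null if there is no such match.
fun matchAtPos(regex: Regex, input: String, pos: Int): String? {
    val matcher = regex.toPattern().matcher(input)
    // Restrict matching to input[pos, length); anchoring bounds are on by default,
    // so ^ in the pattern matches at pos.
    matcher.region(pos, input.length)
    // lookingAt() must match at the region start but need not consume the whole region.
    return if (matcher.lookingAt()) matcher.group() else null
}

fun main() {
    val s = "foo(2)"
    println(matchAtPos("^\\(".toRegex(), s, 3)) // (
    println(matchAtPos("^\\)".toRegex(), s, 3)) // null
    println(matchAtPos("\\)".toRegex(), s, 5))  // )
}
```

Inside RawMatch.match, a call such as matchAtPos(regex, input, pos) could stand in for regex.find(input, pos), with the returned string's length used as the consumed count.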

1 Answer:

Answer 0 (score: 0)

If you want to find the value of whatever is inside the parentheses, you can use a different regex together with find and groupValues:

val str = "foo(2)"
val regex = "\\s*\\((\\d*)\\)".toRegex()

println(regex.find(str)?.groupValues?.last())

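The regex skips any optional whitespace and then looks for digits inside parentheses. The digits are captured by a group (the inner pair of parentheses) and can be read back through the groupValues property. Without the string escaping, the regex looks like this: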

\s*\((\d*)\)
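To make the capture-group behaviour concrete, here is a small sketch using the same regex. The helper name innerDigits is hypothetical. Note that find is unanchored, so the match may begin anywhere in the input rather than at a given position.

```kotlin
val regex = "\\s*\\((\\d*)\\)".toRegex()

// Returns the digits captured by group 1, or null if the regex does not match at all.
fun innerDigits(input: String): String? = regex.find(input)?.groupValues?.get(1)

fun main() {
    val match = regex.find("foo(2)")
    println(match?.value)        // (2)
    println(match?.range)        // 3..5
    println(match?.groupValues)  // [(2), 2]

    println(innerDigits("foo(42)")) // 42
    println(innerDigits("foo"))     // null
}
```

groupValues[0] is always the whole match, and groupValues[1] is the first capture group. Because \d* also accepts zero digits, an input such as "foo()" matches with an empty group.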