Go regex - error parsing regexp: invalid escape sequence: `\K`

Time: 2014-07-19 04:53:15

Tags: regex go

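I am trying to compile a regular expression so that I can use Go to extract an 8-digit number from a string, where the digits may or may not be separated by spaces. For some reason, compilation fails. What am I doing wrong?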

validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
if err != nil {
    return
}

Play it here

More code, with sample data:

package main

import "strings"
import "regexp"
import "fmt"

func main() {

    msg := ` 12 34 56 78 //the number we need
 12 3455678 90123455 // the number we don't need`

    acc, err := accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc := "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

    msg = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need 
 12 3456 78 //this is the kind of number we need
 12 3455678 90123455 // the number we don't need`

    acc, err = accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc = "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

}

func accFromText(msg string) (accNumber string, err error) {
    validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
    if err != nil {
        return
    }
    accNumber = string(validAcc.Find([]byte(msg)))
    accNumber = strings.Replace(accNumber, " ", "", -1)
    return
}

3 answers:

Answer 0 (score: 3)

Go's regexp package is based on RE2, which supports neither lookbehind nor lookahead (nor `\K`), so you could start with a simpler expression:

c, err := regexp.Compile(`\b\d{8}\b`)

In your case (playground), this works:

(\d\d ){4}
validAcc, err := regexp.Compile(`(\d\d ){4}`)

Or:

(\d\d ?){4} # matches '33 1133 06 Oth'
validAcc, err := regexp.Compile(`(\d\d ?){4}`)

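Again, I would try a simple regex first before reaching for more complex options: which one is enough depends on the data you have to parse.

For more complex cases, the regex alone can only capture the data in a group; you then have to extract the number that was found (that is, you need to add post-processing after the regex):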
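To see which parts of the original pattern the regexp package objects to, a small check like the following can help. It is only a sketch: the helper name `compiles` and the shortened test patterns are made up for illustration, and each pattern isolates one construct from the question's regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, not part of the original question.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`[ ]\K\d`,     // \K is a Perl/PCRE feature
		`(?<!\d )\d`,  // negative lookbehind
		`\d(?= )`,     // lookahead
		`(\d\d ?){4}`, // plain RE2 syntax
	}
	for _, p := range patterns {
		fmt.Printf("%q compiles: %v\n", p, compiles(p))
	}
}
```

Only the last pattern should be accepted; the first three each use a construct that RE2 syntax does not provide.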

validAcc, err := regexp.Compile(`[^\d]((\d\d ?){4})[^\d]`)
if err != nil {
    return
}
accNumber = string(validAcc.Find([]byte(msg)))[1:]
accNumber = accNumber[:len(accNumber)-1]
accNumber = strings.Replace(accNumber, " ", "", -1)

See the playground.

Answer 1 (score: 1)

This will do the job (and faster: no regexp needed):
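As a variation on the snippet above, the capture group can be read directly with `FindStringSubmatch`, which avoids slicing off the surrounding characters by hand and also avoids a panic when nothing matches. This is a sketch under the same assumptions as the pattern above; the names `accRe` and `accFromGroup` are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as above: a non-digit, four digit pairs with optional
// spaces (captured in group 1), then another non-digit.
var accRe = regexp.MustCompile(`[^\d]((\d\d ?){4})[^\d]`)

// accFromGroup returns the first captured 8-digit number with spaces
// removed, or "" when the pattern does not match.
// (Hypothetical helper name.)
func accFromGroup(msg string) string {
	m := accRe.FindStringSubmatch(msg)
	if m == nil {
		return ""
	}
	return strings.ReplaceAll(m[1], " ", "")
}

func main() {
	fmt.Println(accFromGroup(" 12 34 56 78 //the number we need"))
}
```

Note that the pattern still requires a non-digit character on both sides, so a number at the very start or very end of the input would not be found.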

    package main

    import "fmt"
    import "unicode"
    import "strings"

    func main() {

        msg := ` 12 34 56 78 //the number we need
     12 3455678 90123455 // the number we don't need`

        acc, err := accFromText(msg)
        if err != nil {
            panic(err)
        }
        exAcc := "12345678"
        if acc != exAcc {
            fmt.Printf("expected %v, received %v", exAcc, acc)
        }

        msg = `
    More details here
    1234567 12345 123456789 asd
    12000000000 a number we don't need 
     12 3456 78 //this is the kind of number we need
     12 3455678 90123455 // the number we don't need`

        acc, err = accFromText(msg)
        if err != nil {
            panic(err)
        }
        exAcc = "12345678"
        if acc != exAcc {
            fmt.Printf("expected %v, received %v", exAcc, acc)
        }

    }

    func accFromText(msg string) (accNumber string, err error) {
        // split msg into lines
        lines := strings.FieldsFunc(msg, func(c rune) bool {
            return unicode.IsControl(c)
        })

        // filter numbers
        fn := func(ln string) (num string) {
            for _, c := range []rune(ln) {
                if unicode.IsNumber(c) {
                    num += string(c)
                    // fmt.Println(num)
                } else if !unicode.IsSpace(c) {
                    return num
                }
            }
            return num
        }

        for _, line := range lines {
            num := fn(line)
            if len(num) == 8 {  // exactly 8 digits on the line is the criterion for acceptance
                return num, nil
            }
        }
        return "eee", nil  // Note: Change this later; it's only needed to satisfy func calls above
    }

http://play.golang.org/p/yVDgDWO9hE

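Answer 2 (score: 0)

I suggest you do it in two steps:

1) Use a regex to find all matches: \d[\d ]+\d

2) Filter the matches, keeping the ones that contain exactly 8 digits

(I don't think you can do this with a single regex in golang)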
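The two steps could look roughly like this. It is a sketch rather than a tested solution for every input: the names `candidateRe` and `accTwoStep` are made up, and it assumes that "contains 8 digits" means exactly 8 digits once the spaces are removed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Step 1 pattern: a run of digits and spaces that starts and ends with a digit.
var candidateRe = regexp.MustCompile(`\d[\d ]+\d`)

// accTwoStep returns the first candidate that has exactly 8 digits
// after removing spaces. The bool is false when no candidate qualifies.
// (Hypothetical helper name.)
func accTwoStep(msg string) (string, bool) {
	for _, c := range candidateRe.FindAllString(msg, -1) {
		digits := strings.ReplaceAll(c, " ", "") // step 2: count digits only
		if len(digits) == 8 {
			return digits, true
		}
	}
	return "", false
}

func main() {
	msg := "1234567 12345 123456789 asd\n" +
		"12000000000 a number we don't need\n" +
		" 12 3456 78 //this is the kind of number we need\n" +
		" 12 3455678 90123455 // the number we don't need"
	acc, ok := accTwoStep(msg)
	fmt.Println(acc, ok)
}
```

Because the character class contains only digits and spaces, a candidate never spans a line break, so each line's run of digits is counted on its own.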