Getting xlsx cell data with Go regexp?

Time: 2015-01-19 09:01:12

Tags: regex go xlsx

I am using a regular expression to extract data from an .xlsx file, but I am new to regexp and not good at it. Can someone help me?

package main

import (
        "fmt"
        "regexp"
)

func main() {
        input := `
        <sheetData>
        <row r="2" spans="1:15">
        <c r="A2" s="5" ><v>{{range .txt}}</v></c>
        <c r="B2" s="5" t="s"><v>1</v></c>
        <c r="C2" s="5" t="s"><v>2</v></c>
        <c r="D2" s="5" t="s"><v>3</v></c>
        <c r="E2" s="5" />
        <c r="K2" s="6" t="s"><v>21</v></c>
    </row> 
    <row r="3" spans="1:15">
        <c r="A3" s="5" t="s"><v>0</v></c>
        <c r="B3" s="5" t="s"><v>1</v></c>
        <c r="C3" s="5" t="s"><v>2</v></c>
        <c r="D3" s="5" t="s"><v>3</v></c>
        <c r="E3" s="5" />
        <c r="K3" s="6" t="s"><v>21</v></c>
    </row> 
    </sheetData>`
        r := regexp.MustCompile(`<row[^>]*?r="(\d+)"[^>].*?>.*?[(<v>(.*?)<\/v>.*?)]<\/row>`)
        r2 := regexp.MustCompile(`<v>(.*?)</v>`)
        row := r.FindAllString(input, -1)
        for _, v := range row {
                fmt.Println(r.ReplaceAllStringFunc(v, func(m string) string {
                        match := r2.FindAllString(v, -1)
                        for kk, vv := range match {
                                fmt.Println(kk, vv)
                                fmt.Println(r2.ReplaceAllString(v, ""))
                        }
                        return m // the callback must return a string for this to compile
                }))
        }
}

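Questions:

  1. How do I get the string {{range .txt}} and discard the surrounding tags "..."?

  2. How do I get "3" from r="3", and from "

  3. How do I get "A3, B3, C3 ..."?

Thanks in advance!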

1 answer:

Answer 0 (score: 3)

I think regexp is the wrong tool for this job. Try encoding/xml instead:

import (
    "encoding/xml"
    "fmt"
)

// Could probably pick better names for these.
type C struct {
    XMLName xml.Name `xml:"c"`
    V       string   `xml:"v"`      // text of the <v> child, empty if absent
    R       string   `xml:"r,attr"` // cell reference, e.g. "A2"
}
type Row struct {
    XMLName xml.Name `xml:"row"`
    C       []C      `xml:"c"`
}
type Result struct {
    XMLName xml.Name `xml:"sheetData"`
    Row     []Row    `xml:"row"`
}

func main() {
    // input is the XML string from the question.
    v := Result{}

    err := xml.Unmarshal([]byte(input), &v)
    if err != nil {
        fmt.Printf("error: %v\n", err)
        return
    }
    for _, r := range v.Row {
        for _, c := range r.C {
            fmt.Printf("%v %v\n", c.V, c.R)
        }
    }
}

This prints:

{{range .txt}} A2
1 B2
2 C2
3 D2
...
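
The same approach can probably be extended to cover the row number from the second question as well, by mapping the `r` attribute of `<row>` in addition to the one on `<c>`. Below is a minimal, self-contained sketch under a few assumptions: the type names `Cell`, `Row`, `SheetData` and the helper `summarize` are made up for illustration, and the input is a shortened sample in the shape of the XML from the question, not the full sheet.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Cell, Row and SheetData are illustrative names, not part of any library.
type Cell struct {
	Ref   string `xml:"r,attr"` // cell reference, e.g. "A3"
	Value string `xml:"v"`      // text inside <v>...</v>; empty if there is no <v>
}

type Row struct {
	Num   string `xml:"r,attr"` // row number, e.g. "3"
	Cells []Cell `xml:"c"`
}

type SheetData struct {
	Rows []Row `xml:"row"`
}

// Shortened sample shaped like the question's input (an assumption for the demo).
const input = `<sheetData>
<row r="2" spans="1:15">
<c r="A2" s="5"><v>{{range .txt}}</v></c>
<c r="B2" s="5" t="s"><v>1</v></c>
<c r="E2" s="5" />
</row>
<row r="3" spans="1:15">
<c r="A3" s="5" t="s"><v>0</v></c>
<c r="B3" s="5" t="s"><v>1</v></c>
</row>
</sheetData>`

// summarize (hypothetical helper) returns one line per row with the row
// number, the comma-joined cell references, and the pipe-joined cell values.
func summarize(data string) ([]string, error) {
	var sheet SheetData
	if err := xml.Unmarshal([]byte(data), &sheet); err != nil {
		return nil, err
	}
	var lines []string
	for _, row := range sheet.Rows {
		var refs, vals []string
		for _, c := range row.Cells {
			refs = append(refs, c.Ref)
			vals = append(vals, c.Value)
		}
		lines = append(lines, fmt.Sprintf("row %s: refs=%s values=%s",
			row.Num, strings.Join(refs, ","), strings.Join(vals, "|")))
	}
	return lines, nil
}

func main() {
	lines, err := summarize(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

With this sample, the `<v>` tags are dropped by the decoder, so `{{range .txt}}` comes back as plain text, and a cell without a `<v>` child (such as E2) simply yields an empty value. Note that cells marked `t="s"` hold indexes into the workbook's shared strings table rather than the text itself, so resolving them would need that file too.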