Unmarshal a JSON stream (not newline-delimited)

Time: 2015-04-30 21:47:45

Tags: json go stream decode

I want to turn a JSON stream into a stream of objects. With newline-delimited JSON this is easy to do. From the Go documentation: https://golang.org/pkg/encoding/json/#Decoder.Buffered

However, I need to produce a stream from a JSON array like this:

        [{"Name": "Ed", "Text": "Knock knock."},
        {"Name": "Sam", "Text": "Who's there?"},
        {"Name": "Ed", "Text": "Go fmt."},
        {"Name": "Sam", "Text": "Go fmt who?"},
        {"Name": "Ed", "Text": "Go fmt yourself!"}]

What is an efficient way to do this?

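I considered this approach:

  1. Drop the outer square brackets.
  2. Whenever a top-level pair of curly braces is matched, unmarshal the string between the braces (inclusive) to get one top-level object at a time.

I would rather not do this, because scanning every part of the string twice hurts performance.

The best alternative I can see is to copy the decoder source from Go's encoding/json package and modify it so that it returns a Reader that emits one object at a time. But that seems like overkill for such a simple requirement.

Is there a better way to decode a stream that is a JSON array?

Edit

I need to parse JSON with nested objects and arbitrary structure.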

3 answers:

Answer 0 (score: 1)

You can use a streaming parser, for example megajson's scanner:

package main

import (
    "fmt"
    "strings"

    "github.com/benbjohnson/megajson/scanner"
)

func main() {
    // our incoming data
    rdr := strings.NewReader(`[
        {"Name": "Ed", "Text": "Knock knock."},
        {"Name": "Sam", "Text": "Who's there?"},
        {"Name": "Ed", "Text": "Go fmt."},
        {"Name": "Sam", "Text": "Go fmt who?"},
        {"Name": "Ed", "Text": "Go fmt yourself!"}
    ]`)

    // we want to create a list of these
    type Object struct {
        Name string
        Text string
    }
    objects := make([]Object, 0)

    // scan the JSON as we read
    s := scanner.NewScanner(rdr)

    // this is how we keep track of where we are parsing the JSON
    // if you needed to support nested objects you would need to
    // use a stack here ([]state{}) and push / pop each time you
    // see a brace
    var state struct {
        inKey   bool
        lastKey string
        object  Object
    }
    for {
        tok, data, err := s.Scan()
        if err != nil {
            break
        }

        switch tok {
        case scanner.TLBRACE:
            // just saw '{' so start a new object
            state.inKey = true
            state.lastKey = ""
            state.object = Object{}
        case scanner.TRBRACE:
            // just saw '}' so store the object
            objects = append(objects, state.object)
        case scanner.TSTRING:
            // for `key: value`, we just parsed 'key'
            if state.inKey {
                state.lastKey = string(data)
            } else {
                // now we are on `value`
                if state.lastKey == "Name" {
                    state.object.Name = string(data)
                } else {
                    state.object.Text = string(data)
                }
            }
            state.inKey = !state.inKey
        }
    }
    fmt.Println(objects)
}

This is probably about as efficient as you can get, but it does require a lot of manual handling.

Answer 1 (score: 0)

Suppose the JSON stream looks like this:

{"Name": "Ed", "Text": "Knock knock."}{"Name": "Sam", "Text": "Who's there?"}{"Name": "Ed", "Text": "Go fmt."}

As I see it, the pseudocode would be:

1: skip prefix whitespace
2: if first char not {, throw error
3: load some chars, and find the first "}"
    4: if found, try json.Unmarshal()
        5: if unmarshal fail, load more chars, and find second "}"
             6: redo STEP 4

Answer 2 (score: 0)

Here is an implementation that is already in use in my project:

package json

import (
    "bytes"
    j "encoding/json"
    "errors"
    "io"
    "strings"
)

// Stream represents a stream of concatenated JSON objects
type Stream struct {
    stream *bytes.Buffer
    object *bytes.Buffer
    scrap  *bytes.Buffer
}

// NewStream returns a Stream that reads from src
func NewStream(src []byte) *Stream {
    return &Stream{
        stream: bytes.NewBuffer(src),
        object: new(bytes.Buffer),
        scrap:  new(bytes.Buffer),
    }
}

// Read returns the next JSON object, or io.EOF when the stream is exhausted
func (s *Stream) Read() ([]byte, error) {
    var obj []byte

    for {
        // read a rune from stream
        r, _, err := s.stream.ReadRune()
        switch err {
        case nil:
        case io.EOF:
            if strings.TrimSpace(s.object.String()) != "" {
                return nil, errors.New("invalid JSON")
            }

            fallthrough
        default:
            return nil, err
        }

        // write the rune to object buffer
        if _, err := s.object.WriteRune(r); err != nil {
            return nil, err
        }

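        if r == '}' {
            // check whether the bytes buffered so far form valid JSON
            err := j.Compact(s.scrap, s.object.Bytes())
            s.scrap.Reset()
            if err != nil {
                // this '}' was nested or inside a string; keep reading
                continue
            }

            // copy before Reset: Bytes() aliases the buffer's storage,
            // which the next Read call would overwrite
            obj = append([]byte(nil), s.object.Bytes()...)
            s.object.Reset()

            break
        }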
    }

    return obj, nil
}

Usage:

func process(src []byte) error {
    s := json.NewStream(src)

    for {
        obj, err := s.Read()
        switch err {
        case nil:
        case io.EOF:
            return nil 
        default:
            return err 
        }   

        // now you can try to decode obj into a struct/map/...
        // mixed streams are also supported, e.g.:
        a := new(TypeOne)
        b := new(TypeTwo)
        if err := j.Unmarshal(obj, a); err == nil && a.Error != "" {
             // it is a TypeOne object
        } else if err := j.Unmarshal(obj, b); err == nil && b.ID != "" {
             // it is a TypeTwo object
        } else {
             // unknown type
        }
    }

    return nil
}