How do I use the PARSE dialect to read a line from a CSV?

Date: 2012-11-19 09:36:49

Tags: parsing csv rebol

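I'm trying to use PARSE to turn a CSV line into a Rebol block. It is easy enough to write in open code, but as with other questions, I am trying to learn what the dialect can do without that.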

So if a line says:

"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com

Then I want the block:

[{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]

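Issues to notice:

  • Embedded quotes in CSV strings are indicated with ""
  • Commas can appear inside quotes and are then part of the literal, not a column divider
  • Adjacent column-divider commas indicate an empty field
  • Strings that contain no quotes or commas can appear without quotes
  • For the moment, we can keep things like http://rebol.com as STRING! instead of LOADing them into types such as URL!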

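To make the input more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule that captures a single column terminated by a comma, which may or may not be quoted.

I know how many columns there should be from the header line, so the code then says: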

unless parse line compose [(column-count) column-rule] [
    print rejoin [{Expected } column-count { columns.}]
]

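But I'm a little stuck on writing column-rule. I need a way in the dialect to express "once you find a quote, keep skipping quote pairs until you find a quote standing on its own." What's a good way to do that?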

3 answers:

Answer 0: (score: 3)

As with most parsing problems, I try to build a grammar that best describes the elements of the input format.

In this case, we have nouns:

[comma ending value-chars qmark quoted-chars value header row]

Some verbs:

[row-feed emit-value]

And some operating nouns:

[current chunk current-row width]

I suppose I could break it down a bit more, but this is enough to work with. First, the foundation:

comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]
quoted-chars: complement charset reduce [qmark]

Now the value structure. A quoted value is built up from chunks of valid characters or escaped quotes as we find them:

current: chunk: none
quoted-value: [
    qmark (current: copy "")
    any [
        copy chunk some quoted-chars (append current chunk)
        |
        qmark qmark (append current qmark)
    ]
    qmark
]

value: [
    copy current some value-chars
    | quoted-value
]

emit-value: [
    (
        delimiter: comma
        append current-row current
    )
]

emit-none: [
    (
        delimiter: comma
        append current-row none
    )
]

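Note that delimiter is set to ending at the start of each row, then changed to comma as soon as a value has been passed. An input row is therefore defined as [ending value any [comma value]].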
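
Going back to the quoted-value rule above, it can be tried on a single field in isolation. The following is a minimal sketch, assuming Rebol 2 (it relies on parse/all); the helper name unquote-field is hypothetical and not part of this answer. It reuses the same rule shape: runs of non-quote characters, or a doubled quote that stands for one literal quote, until a lone closing quote.

```rebol
REBOL [Title: "Quoted CSV field (sketch)"]

qmark: {"}
quoted-chars: complement charset qmark

; Hypothetical helper (an assumption, not from the answer): parse one
; quoted field and return its unescaped text, or none if it does not match.
unquote-field: func [field [string!] /local current chunk] [
    current: copy ""
    if parse/all field [
        qmark
        any [
            ; A run of ordinary characters is copied as-is.
            copy chunk some quoted-chars (append current chunk)
            |
            ; Two quotes in a row stand for one literal quote.
            qmark qmark (append current qmark)
        ]
        ; A quote that is not part of a pair closes the field.
        qmark
    ] [current]
]

print unquote-field {"Look, that's ""MR. Fork"" to you!"}
```

With the sample field from the question, this should print Look, that's "MR. Fork" to you! because the alternation only consumes quotes in pairs, leaving the final lone quote for the closing qmark.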

All that remains is to define the document structure:

current-row: none
row-feed: [
    (
        delimiter: ending
        append/only out current-row: copy []
    )
]

width: none
header: [
    (out: copy [])
    row-feed any [
        value comma
        emit-value
    ]
    value body: ending :body
    emit-value
    (width: length? current-row)
]

row: [
    row-feed width [
        delimiter [
            value emit-value
            | emit-none
        ]
    ]
]

if parse/all stream [header some row opt ending][out]
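
The delimiter switch used by row-feed and emit-value works because PARSE looks up a word's value each time it meets the word, so a paren can re-point the rule mid-parse. Here is a minimal sketch of just that technique, assuming Rebol 2 and unquoted fields only; split-rows and field-chars are hypothetical names introduced for illustration, and unlike the answer's rules the input here starts with a line ending.

```rebol
REBOL [Title: "Switching a delimiter rule mid-parse (sketch)"]

comma: ","
ending: "^/"
field-chars: complement charset ",^/"

; Hypothetical helper (an assumption, not from the answer): every row
; is introduced by a line ending, and fields are separated by commas.
split-rows: func [text [string!] /local delimiter rows row field] [
    rows: copy []
    parse/all text [
        some [
            ; At the start of a row, the expected delimiter is a line ending.
            (delimiter: ending)
            some [
                delimiter copy field some field-chars (
                    ; The first field of a row opens a new row block.
                    if delimiter = ending [append/only rows row: copy []]
                    append row field
                    ; From now on, expect commas until the row is over.
                    delimiter: comma
                )
            ]
        ]
    ]
    rows
]

probe split-rows "^/a,b^/c,d"
```

The result should be a block of two rows, each holding two strings. Creating the row block only after a field has matched avoids leaving an empty row behind when the outer loop makes its final, failing attempt.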

Wrap it all up to protect those words, and you have:

REBOL [
    Title: "CSV Parser"
    Date: 19-Nov-2012
    Author: "Christopher Ross-Gill"
]

parse-csv: use [
    comma ending delimiter value-chars qmark quoted-chars
    value quoted-value header row
    row-feed emit-value emit-none
    out current current-row width
][
    comma: ","
    ending: "^/"
    qmark: {"}
    value-chars: complement charset reduce [qmark comma ending]
    quoted-chars: complement charset reduce [qmark]

    current: none
    quoted-value: use [chunk][
        [
            qmark (current: copy "")
            any [
                copy chunk some quoted-chars (append current chunk)
                |
                qmark qmark (append current qmark)
            ]
            qmark
        ]
    ]

    value: [
        copy current some value-chars
        | quoted-value
    ]

    current-row: none
    row-feed: [
        (
            delimiter: ending
            append/only out current-row: copy []
        )
    ]
    emit-value: [
        (
            delimiter: comma
            append current-row current
        )
    ]
    emit-none: [
        (
            delimiter: comma
            append current-row none
        )
    ]

    width: none
    header: [
        (out: copy [])
        row-feed any [
            value comma
            emit-value
        ]
        value body: ending :body
        emit-value
        (width: length? current-row)
    ]

    row: [
        opt ending end break
        |
        row-feed width [
            delimiter [
                value emit-value
                | emit-none
            ]
        ]
    ]

    func [stream [string!]][
        if parse/all stream [header some row][out]
    ]
]

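Answer 1: (score: 2)

I had to do this a few years ago. I have since updated my functions to handle all the cases I ran into afterwards. I hope it is more solid now.

Note that it can handle strings with newlines inside, BUT:

  1. Newlines inside strings must be LF only ...
  2. Newlines between records must be CRLF ...
  3. You must load the file with read/binary so that Rebol does not convert newlines automatically.

(Points 1 and 2 are what Excel produces, for example.)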
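
As a small illustration of point 3, the sketch below (assuming Rebol 2) writes a scratch file and reads it back with read/binary, so the CR of the CRLF record separator is still there when the data reaches the parser. The file name is hypothetical and chosen only for this example.

```rebol
REBOL [Title: "read/binary keeps CRLF (sketch)"]

; Hypothetical scratch file, created only for this demonstration.
file: %csv-newline-demo.csv

; One record whose quoted field contains a bare LF, terminated by CRLF.
write/binary file to binary! {a,"two^/lines"^M^/}

; read/binary returns the bytes as stored; convert them to a string.
raw: to string! read/binary file

print ["CR still present:" found? find raw cr]

delete file
```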

    ; Conversion function from CSV format
    csv-to-block: func [
        "Convert a string of CSV-formatted data to a Rebol block. First line is header."
        csv-data [string!] "CSV data."
        /separator separ [char!] "Separator to use if other than a comma (,)."
        /without-header "Do not include header in the result."
        /local out line start end this-string header record value data chars spaces chars-but-space
        ; CSV format information http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
    ] [
        out: copy []
        separ: any [separ #","]
    
        ; This function replaces each doubled double quote with a single one while copying a substring
        this-string: func [s e] [replace/all copy/part s e {""} {"}]
        ; CSV parsing rules
        header: [(line: copy []) value any [separ value | separ (append line none)] (if not without-header [append/only out line])]
        record: [(line: copy []) value any [separ value | separ (append line none)] (append/only out line)]
        value: [any spaces data any spaces (append line this-string start end)]
        data: [start: some chars-but-space any [some spaces some chars-but-space] end: | #"^"" start: any [some chars | {""} | separ | newline] end: #"^""]
        chars: complement charset rejoin [ {"} separ newline]
        spaces: charset exclude { ^-} form separ
        chars-but-space: exclude chars spaces
    
        parse/all csv-data [header any [newline record] any newline end]
        out
    ]
    

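If needed, I have the counterpart, block-to-csv.

[Edit] OK, here is the counterpart (note: every string! value is enclosed in double quotes, and the header must be the first row of the block if you want it in the result):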

    block-to-csv: func [
        "Convert a block of blocks to a CSV-formatted string."
        blk-data [block!] "Block of data to convert."
        /separator separ "Separator to use if other than a comma (,)."
        /local out csv-string record value v
    ] [
        out: copy ""
        separ: any [separ #","]
        ; This function converts a string to a CSV-formatted one
        csv-string: func [val] [head insert next copy {""} replace/all replace/all copy val {"} {""} newline #{0A} ]
        record: [into [some [value (append out separ)]]]
        value: [set v string! (append out csv-string v) | set v any-type! (append out form v)]
    
        parse/all blk-data [any [record (remove back tail out append out crlf)]]
        out
    ]
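
The quoting step on its own can be sketched as follows. This is an illustrative reduction, not the answer's csv-string: the helper name quote-field is hypothetical, and it only doubles embedded quotes and wraps the result, without touching newlines.

```rebol
REBOL [Title: "Quoting one CSV field (sketch)"]

; Hypothetical helper (an assumption, not from the answer): wrap a string
; in double quotes, doubling any double quote it already contains.
quote-field: func [val [string!]] [
    rejoin [{"} replace/all copy val {"} {""} {"}]
]

print quote-field {Look, that's "MR. Fork" to you!}
```

This should print "Look, that's ""MR. Fork"" to you!", which is the first field of the sample line in the question. Working on a copy keeps the caller's string unchanged.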
    

Answer 2: (score: 2)

Also, see the %csv-tools.r script by BrianH on rebol.org:

http://www.rebol.org/view-script.r?script=csv-tools.r

Great code. Works in R2 and R3.