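I'm trying to use PARSE to turn a CSV line into a Rebol block. It's easy enough to write in open code, but as with my other questions, I'm trying to learn what the dialect can do without that.
So if a line says: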
"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com
then I want the block:
[{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]
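Issues to notice:
- Embedded quotes inside a quoted CSV string are written as ""
- For now, values like http://rebol.com can stay as STRING! rather than being LOADed into types such as URL!
To make things more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule that captures a single column terminated by a comma, which may or may not be in quotes.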
I know how many columns there should be from the header line, so the code then says:
unless parse line compose [(column-count) column-rule] [
    print rejoin [{Expected } column-count { columns.}]
]
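As a small illustration of what the COMPOSE call above produces, here is a minimal sketch for Rebol 2. The column-rule in it is only a hypothetical stand-in (a run of non-comma characters followed by a comma, with no quote handling), and the sample strings are made up:

```rebol
REBOL []

; Hypothetical stand-in for column-rule: no quote handling at all,
; just "anything up to and including the next comma".
non-comma: complement charset ","
column-rule: [any non-comma ","]

column-count: 3

; COMPOSE evaluates the paren, so the integer becomes a repeat count.
rule: compose [(column-count) column-rule]
probe rule                       ; [3 column-rule]

print parse/all "a,b,c," rule    ; true
print parse/all "a,b," rule      ; false
```

The integer in front of column-rule makes PARSE require exactly that many matches, which is why the trailing comma appended to the line keeps every column uniform.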
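But I'm a bit stuck on writing column-rule. I need a way in the dialect to express "once you find a quote, keep skipping quote pairs until you find a quote standing on its own." What's a good way to do that?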
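Answer 0 (score: 3)
As with most parsing problems, I try to build a grammar that best describes the elements of the input format.
In this case, we have the nouns: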
[comma ending value-chars qmark quoted-chars value header row]
Some verbs:
[row-feed emit-value]
And the operative nouns:
[current chunk current-row width]
I suppose I could break it down a little further, but this is enough to work with. First, the foundation:
comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]
quoted-chars: complement charset reduce [qmark]
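To get a feel for the two character sets, they can be tried directly in PARSE. This is only a sketch assuming Rebol 2 (PARSE/ALL); the test strings are made up for illustration:

```rebol
REBOL []

comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]
quoted-chars: complement charset reduce [qmark]

; An unquoted value may contain anything except quote, comma, newline.
print parse/all "Hostile Fork" [some value-chars]   ; true
print parse/all "a,b" [some value-chars]            ; false

; Inside quotes, a comma is just another character.
print parse/all "a,b" [some quoted-chars]           ; true
```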
Now the value structure. Quoted values are built up from chunks of valid characters or quotes as we find them:
current: chunk: none
quoted-value: [
    qmark (current: copy "")
    any [
        copy chunk some quoted-chars (append current chunk)
        |
        qmark qmark (append current qmark)
    ]
    qmark
]
value: [
    copy current some value-chars
    | quoted-value
]
emit-value: [
    (
        delimiter: comma
        append current-row current
    )
]
emit-none: [
    (
        delimiter: comma
        append current-row none
    )
]
Note that delimiter is set to ending at the start of each row and is changed to comma as soon as a value has been passed. An input row is therefore defined as [ending value any [comma value]].
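The quoted-value rule can be tried in isolation. The sketch below assumes Rebol 2 and repeats only the definitions that rule needs; the sample string is the first field of the question's example line:

```rebol
REBOL []

qmark: {"}
quoted-chars: complement charset reduce [qmark]

current: chunk: none
quoted-value: [
    qmark (current: copy "")
    any [
        copy chunk some quoted-chars (append current chunk)
        |
        qmark qmark (append current qmark)
    ]
    qmark
]

; Sample input: one quoted field containing doubled quotes and a comma.
sample: {"Look, that's ""MR. Fork"" to you!"}

if parse/all sample quoted-value [
    print current   ; Look, that's "MR. Fork" to you!
]
```

The second alternative only succeeds on a quote pair, so a lone quote ends the ANY loop and is consumed by the final qmark. That is the "keep skipping quote pairs until a quote stands on its own" behaviour the question asks for.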
All that remains is to define the document structure:
current-row: none
row-feed: [
    (
        delimiter: ending
        append/only out current-row: copy []
    )
]
width: none
header: [
    (out: copy [])
    row-feed any [
        value comma
        emit-value
    ]
    value body: ending :body
    emit-value
    (width: length? current-row)
]
row: [
    row-feed width [
        delimiter [
            value emit-value
            | emit-none
        ]
    ]
]
if parse/all stream [header some row opt ending][out]
Wrap it all up to protect those words, and you have:
REBOL [
    Title: "CSV Parser"
    Date: 19-Nov-2012
    Author: "Christopher Ross-Gill"
]
parse-csv: use [
    comma ending delimiter value-chars qmark quoted-chars
    value quoted-value header row
    row-feed emit-value emit-none
    out current current-row width
][
    comma: ","
    ending: "^/"
    qmark: {"}
    value-chars: complement charset reduce [qmark comma ending]
    quoted-chars: complement charset reduce [qmark]
    current: none
    quoted-value: use [chunk][
        [
            qmark (current: copy "")
            any [
                copy chunk some quoted-chars (append current chunk)
                |
                qmark qmark (append current qmark)
            ]
            qmark
        ]
    ]
    value: [
        copy current some value-chars
        | quoted-value
    ]
    current-row: none
    row-feed: [
        (
            delimiter: ending
            append/only out current-row: copy []
        )
    ]
    emit-value: [
        (
            delimiter: comma
            append current-row current
        )
    ]
    emit-none: [
        (
            delimiter: comma
            append current-row none
        )
    ]
    width: none
    header: [
        (out: copy [])
        row-feed any [
            value comma
            emit-value
        ]
        value body: ending :body
        emit-value
        (width: length? current-row)
    ]
    row: [
        opt ending end break
        |
        row-feed width [
            delimiter [
                value emit-value
                | emit-none
            ]
        ]
    ]
    func [stream [string!]][
        if parse/all stream [header some row][out]
    ]
]
Answer 1 (score: 2)
Note that it can handle strings with newlines inside, BUT with some restrictions (1. and 2. are what Excel gives, for example):
; Conversion function from CSV format
csv-to-block: func [
    "Convert a string of CSV formatted data to a Rebol block. First line is header."
    csv-data [string!] "CSV data."
    /separator separ [char!] "Separator to use if different from comma (,)."
    /without-header "Do not include header in the result."
    /local out line start end this-string header record value data chars spaces chars-but-space
    ; CSV format information http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
] [
    out: copy []
    separ: any [separ #","]
    ; This function replaces each doubled double-quote with a single quote while copying the substring
    this-string: func [s e] [replace/all copy/part s e {""} {"}]
    ; CSV parsing rules
    header: [(line: copy []) value any [separ value | separ (append line none)] (if not without-header [append/only out line])]
    record: [(line: copy []) value any [separ value | separ (append line none)] (append/only out line)]
    value: [any spaces data any spaces (append line this-string start end)]
    data: [start: some chars-but-space any [some spaces some chars-but-space] end: | #"^"" start: any [some chars | {""} | separ | newline] end: #"^""]
    chars: complement charset rejoin [ {"} separ newline]
    spaces: charset exclude { ^-} form separ
    chars-but-space: exclude chars spaces
    parse/all csv-data [header any [newline record] any newline end]
    out
]
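The this-string helper inside csv-to-block can be tried on its own. This is a minimal sketch assuming Rebol 2; the field text is a made-up example of what sits between the outer quotes of a CSV field:

```rebol
REBOL []

; Same helper as inside csv-to-block: copy the text between two
; positions and collapse each doubled quote into a single one.
this-string: func [s e] [replace/all copy/part s e {""} {"}]

; Hypothetical field content, still carrying the CSV quote escapes.
field: {that's ""MR. Fork"" to you}

print this-string field tail field   ; that's "MR. Fork" to you
```

Because the helper works on a copy, the original CSV data is left untouched while the parse positions start and end stay valid.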
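If needed, I also have the counterpart, block-to-csv.
[Edit] OK, here is the counterpart (note: every string! will be enclosed in double quotes, and if you want a header in the result it must be the first row of the block):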
block-to-csv: func [
    "Convert a block of blocks to a CSV formatted string."
    blk-data [block!] "block of data to convert"
    /separator separ "Separator to use if different from comma (,)."
    /local out csv-string record value v
] [
    out: copy ""
    separ: any [separ #","]
    ; This function converts a string to a CSV formatted one
    csv-string: func [val] [head insert next copy {""} replace/all replace/all copy val {"} {""} newline #{0A} ]
    record: [into [some [value (append out separ)]]]
    value: [set v string! (append out csv-string v) | set v any-type! (append out form v)]
    parse/all blk-data [any [record (remove back tail out append out crlf)]]
    out
]
Answer 2 (score: 2)
Also, have a look at the %csv-tools.r script by BrianH on rebol.org.
http://www.rebol.org/view-script.r?script=csv-tools.r
Great piece of code. Works with R2 and R3.