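I am trying to parse the "Summary" region of a batch of computer-generated reports, where the report names and their associated variables change from file to file. I have made up an example in the same format below: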
Summary Report
Bath Tub
Temperature: 30 °C
Water ready
volume: 200000 cm³
Bath Room
Floor Area: 40 ft²
Door Height: 9 ± 0.1 ft
Full Report Set
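It is hard to tell from the above what the whitespace looks like, so here is a screenshot from my text editor with whitespace made visible.
The region of interest begins with Summary Report and ends with Full Report Set. A property may span two lines. Property names are aligned so that the colon (:) stays at the same character position within each sub-report.
Judging by the diagnostic output, my attempt to exploit this fact does not work.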
txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 11 vs. k)
txr: (src/generic-micrometrics-report.txr:36) variable k binding mismatch (13 vs. 12)
txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 12 vs. k)
txr: (src/generic-micrometrics-report.txr:36) string matched, position 13-18 (data/dummy-generic-report.txt:6)
txr: (src/generic-micrometrics-report.txr:36) Temperature: 30 °C
txr: (src/generic-micrometrics-report.txr:36) ^ ^
txr: (src/generic-micrometrics-report.txr:23) spec ran out of data
txr: (source location n/a) function (capture (nil (k . 13) (report . "Bath Tub"))) failed
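I have included the code below. Can you explain why this code does not work? Am I doing what I think I am doing with the colon_position function? If so, why does it fail? How would you write the capture function? Is this the general approach you would take? Is there a better way? Thank you very much for your help and advice.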
@; This output format always starts with or ends with atleast 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)
@/[ ]+/@(eol)
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@(trailer)
@(gather :vars (column))
@(skip)@(chr column):@(skip)
@(until)

@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))
@(cases)@value@\ ±@\ @error@\ @units@/[ ]+/@(eol)@\
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)
@(end)
Summary Report
@(collect :vars (report property value error units))
@report
@(forget k)
@(colon_position k)
@(cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@(ord)
@; Properties can span two lines. I have not seen any that span more.
@property_head@(chr k) @(blank_spaces)
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@(merge property property_head property_tail)
@(cat property " ")
@(end)
@(blank_spaces)
@(end)
Full Report Set
@(output)
report,property,value,error,units
@(repeat)
@report,@property,@value,@error,@units
@(end)
@(end)
Answer 0 (score: 1)
After making a few changes here and there, I now get this output:
report,property,value,error,units
Bath Tub,Temperature,30,,°C
Bath Tub,Water ready volume,200000,,cm³
Bath Room,Floor Area,40,,ft²
Bath Room,Door Height,9,0.1,ft
Code:
@; This output format always starts with or ends with atleast 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)@\
@/[ ]*/@(eol)@\
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@ (trailer)
@ (gather :vars (column))
@ (skip)@(chr column):@(skip)
@(until)

@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))@\
@(cases)@value@\ ±@\ @error@\ @units @(eol)@\
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)@\
@(end)
Summary Report
@(collect :vars (report property value error units))
@report
@ (colon_position k)
@ (collect)
@ (cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@ (or)
@; Properties can span two lines. I have not seen any that span more.
@property_head@(chr k) @(blank_spaces)
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@ (merge property property_head property_tail)
@ (cat property " ")
@ (end)
@ (until)


@ (end)
@(until)
Full Report Set
@(end)
@(output)
report,property,value,error,units
@ (repeat)
@ (repeat)
@report,@property,@value,@error,@units
@ (end)
@ (end)
@(end)
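The colon trick actually works (a nice application of trailer and chr). Where the code trips up is in various small details: @(or) misspelled as @(orf); pattern functions that are meant to be horizontal but lack proper @\ line continuations; a mistake in @(blank_spaces) that made it unconditionally consume some whitespace (it required at least one space with [ ]+ where [ ]* is needed); a spurious space before @(merge); and so on.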
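For anyone who wants to sanity-check the colon column outside TXR, here is a rough Python sketch of the same look-ahead idea. It is not TXR and not part of the solution; the function name is borrowed from the question, and the sample padding (two leading spaces, colons at index 13) is my assumption, since the original whitespace is not visible in the post.

```python
def colon_position(lines, start):
    """Return the colon column shared by the lines from `start` up to the
    next empty line, or None if the colons disagree or none is found.
    This only looks ahead: `start` itself is never advanced, which is
    roughly the role @(trailer) plays in the TXR code."""
    column = None
    for line in lines[start:]:
        if line == "":            # an empty line ends the sub-report
            break
        pos = line.find(":")
        if pos == -1:             # e.g. first half of a two-line property
            continue
        if column is None:
            column = pos
        elif pos != column:       # misaligned colon: nothing reliable to capture
            return None
    return column


# Assumed layout, reconstructed for illustration only.
sample = [
    "  Temperature: 30 °C",
    "  Water ready",
    "       volume: 200000 cm³",
    "",
    "   Floor Area: 40 ft²",
]
print(colon_position(sample, 0))
```

With this assumed padding the result agrees with the (k . 13) binding shown in the diagnostic output of the question.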
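Beyond that, the main issue is that the data is doubly nested, so we need a collect inside a collect. We also need appropriate @(until) termination patterns. For the inner collect, I chose two blank lines; that seems to be what terminates a section (it works for the data sample). The outer collect terminates on Full Report Set, though that is not strictly necessary.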
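To consume the nested collection, we use nested repeats in the output.
I applied some indentation. Horizontal functions can be indented with whitespace, because leading whitespace after a line continuation is ignored.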
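If the nesting is hard to picture, the following Python sketch shows the same shape: one loop per level of collection, flattened into CSV rows. The data structure and the to_csv helper are hypothetical and only mirror the output shown above; this is an illustration, not a replacement for the TXR @(output) block.

```python
import csv
import io

# Hypothetical in-memory form of what the two nested collects gather:
# one entry per report, each holding its own list of properties.
reports = [
    ("Bath Tub", [("Temperature", "30", "", "°C"),
                  ("Water ready volume", "200000", "", "cm³")]),
    ("Bath Room", [("Floor Area", "40", "", "ft²"),
                   ("Door Height", "9", "0.1", "ft")]),
]


def to_csv(reports):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["report", "property", "value", "error", "units"])
    for report, properties in reports:      # outer repeat: one pass per report
        for prop in properties:             # inner repeat: one pass per property
            writer.writerow([report, *prop])
    return out.getvalue()


print(to_csv(reports), end="")
```

The report name is repeated on every row because it belongs to the outer level, just as @report is visible inside the inner @(repeat).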
@(forget k) is gone; there is no k in scope at that point. Each iteration of the enclosing collect binds k afresh in an environment that has no k.
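Addendum: here is a diff against the code that makes it more robust against unexpected data. As written, the inner @(collect) silently skips non-matching elements, which means that if the file contains elements that do not fit the expected cases, they are ignored. This behavior is already being exploited: it is why the blank lines between data items get skipped. We can tighten this up with :gap 0 (the collected regions must be contiguous) and handle blank lines as an explicit case. A fallback case can then diagnose the input line as unrecognized: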
diff --git a/extract.txr b/extract.txr
index 8c93d89..3d1fac6 100644
--- a/extract.txr
+++ b/extract.txr
@@ -24,6 +24,7 @@
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)@\
@(end)
+@(name file)
Summary Report
@(collect :vars (report property value error units))
@@ -31,7 +32,7 @@
@report
@ (colon_position k)
-@ (collect)
+@ (collect :gap 0)
@ (cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@ (or)
@@ -40,6 +41,12 @@
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@ (merge property property_head property_tail)
@ (cat property " ")
+@ (or)
+
+@ (or)
+@ (line ln)
+@ badline
+@ (throw error `@file:@ln unrecognized syntax: @badline`)
@ (end)
@ (until)
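To make the intent of the stricter matching concrete, here is a rough Python sketch of the same policy: every line must be a property, the first half of a two-line property, or blank, and anything else raises an error naming the file and line. The names parse_properties and VALUE, and the sample padding, are my own assumptions for illustration; this is not TXR and is a bit looser than the TXR pattern (for instance, it tolerates a blank line between the two halves of a property).

```python
import re

# value, optional "± error", then units; mirrors the two cases in capture
VALUE = re.compile(r"^(\S+)(?: ± (\S+))? (\S+)\s*$")


def parse_properties(lines, column, filename="<input>", first_lineno=1):
    rows, head = [], None
    for lineno, line in enumerate(lines, first_lineno):
        if not line.strip():                  # blank line: an accepted case
            continue
        if len(line) > column and line[column] == ":":
            name = line[:column].strip()
            m = VALUE.match(line[column + 1:].strip())
            if m:
                if head is not None:          # join the two-line property name
                    name = f"{head} {name}"
                    head = None
                rows.append((name, m.group(1), m.group(2) or "", m.group(3)))
                continue
        elif head is None and not line[column:].strip():
            head = line.strip()               # first half of a two-line property
            continue
        # fallback case: nothing matched, so report the line instead of skipping it
        raise ValueError(f"{filename}:{lineno} unrecognized syntax: {line}")
    return rows


good = [
    "  Temperature: 30 °C",
    "   ",
    "  Water ready",
    "       volume: 200000 cm³",
    "  Door Height: 9 ± 0.1 ft",
]
for row in parse_properties(good, 13):
    print(row)
```

Feeding it a line that fits none of the cases raises a ValueError whose message follows the same file:line layout as the @(throw error ...) in the diff.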