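TXR: Using functions

Date: 2017-02-10 18:16:21

Tags: text-processing txr

I am trying to parse the "summary" area of a bunch of computer reports, where the report names and their associated variables change from file to file. Here is a made-up example in the same format: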

 Summary Report


       Bath Tub

  Temperature:    30 °C       

  Water ready                 
       volume:    200000 cm³  


    Bath Room

   Floor Area:    40 ft²      

  Door Height:    9 ± 0.1 ft  



Full Report Set

It is hard to tell from the above what the whitespace looks like, so here is a screenshot from my text editor with whitespace made visible.

dummy report summary file screenshot

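The region of interest starts with Summary Report and ends with Full Report Set. A property name may span two lines. Property names are aligned so that the colon : stays at the same character position within each sub-report.

Judging by the diagnostic output, my attempt to exploit this fact does not work.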

  

txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 11 vs. k)
txr: (src/generic-micrometrics-report.txr:36) variable k binding mismatch (13 vs. 12)
txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 12 vs. k)
txr: (src/generic-micrometrics-report.txr:36) string matched, position 13-18 (data/dummy-generic-report.txt:6)
txr: (src/generic-micrometrics-report.txr:36)   Temperature:    30 °C
txr: (src/generic-micrometrics-report.txr:36)              ^    ^
txr: (src/generic-micrometrics-report.txr:23) spec ran out of data
txr: (source location n/a) function (capture (nil (k . 13) (report . "Bath Tub"))) failed

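I have included the code below. Can you explain why it does not work? Am I doing what I think I am doing with the colon_position function? If so, why does it fail? How would you write the capture function? Is this the general approach you would take, or is there a better way? Thank you very much for your help and advice.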

@; This output format always starts with or ends with at least 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)
  @/[ ]+/@(eol)
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@(trailer)
@(gather :vars (column))
@(skip)@(chr column):@(skip)
@(until)

@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))
@(cases)@value@\ ±@\ @error@\ @units@/[ ]+/@(eol)@\
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)
@(end)
 Summary Report

@(collect :vars (report property value error units))

 @report

@(forget k)
@(colon_position k)
@(cases)
 @property@(chr k):    @(capture value error units)@(blank_spaces)
@(ord)
@; Properties can span two lines. I have not seen any that span more.
 @property_head@(chr k)     @(blank_spaces)
 @property_tail@(chr k):    @(capture value error units)@(blank_spaces)
 @(merge property property_head property_tail)
 @(cat property " ")
@(end)
@(blank_spaces)
@(end)


Full Report Set
@(output)
report,property,value,error,units
@(repeat)
@report,@property,@value,@error,@units
@(end)
@(end)

1 Answer:

Answer 0 (score: 1)

After making a few changes here and there, I now get this output:

report,property,value,error,units
Bath Tub,Temperature,30,,°C
Bath Tub,Water ready volume,200000,,cm³
Bath Room,Floor Area,40,,ft²
Bath Room,Door Height,9,0.1,ft

Code:

@; This output format always starts with or ends with at least 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)@\
@/[ ]*/@(eol)@\
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@  (trailer)
@  (gather :vars (column))
@  (skip)@(chr column):@(skip)
@(until)

@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))@\
  @(cases)@value@\ ±@\ @error@\ @units @(eol)@\
  @(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
  @(end)@\
@(end)
 Summary Report

@(collect :vars (report property value error units))

 @report

@  (colon_position k)
@  (collect)
@    (cases)
 @property@(chr k):    @(capture value error units)@(blank_spaces)
@    (or)
@; Properties can span two lines. I have not seen any that span more.
 @property_head@(chr k)     @(blank_spaces)
 @property_tail@(chr k):    @(capture value error units)@(blank_spaces)
@      (merge property property_head property_tail)
@      (cat property " ")
@    (end)
@  (until)


@  (end)
@(until)
Full Report Set
@(end)
@(output)
report,property,value,error,units
@  (repeat)
@    (repeat)
@report,@property,@value,@error,@units
@    (end)
@  (end)
@(end)

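The trick with the colon actually works (a nice application of trailer and chr). Where the code gets tripped up is in various small details: @(or) misspelled as @(orf); pattern functions that should be horizontal but do not use proper @\ line continuations; a flaw in @(blank_spaces) that makes it unconditionally consume some spaces; spurious whitespace before @(merge); and so on.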

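Beyond that, the main problem is that the data is doubly nested, so we need a collect within a collect. We also need appropriate @(until) termination patterns. For the inner collect, I chose two blank lines; that appears to be what terminates a section (it works for the data sample). The outer collect terminates on Full Report Set, although that is not strictly necessary.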

To go with the nested collect, we use a nested repeat in the output.
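For readers less familiar with TXR, the relationship between the nested collect and the nested repeat can be pictured as two nested loops. The following is a rough Python sketch, not TXR; the `reports` structure and the `to_csv_lines` name are assumptions made up for illustration, with the values taken from the output shown earlier.

```python
# Hypothetical in-memory shape of the nested collection: one entry per report,
# each holding its own list of (property, value, error, units) tuples.
reports = [
    ("Bath Tub", [("Temperature", "30", "", "°C"),
                  ("Water ready volume", "200000", "", "cm³")]),
    ("Bath Room", [("Floor Area", "40", "", "ft²"),
                   ("Door Height", "9", "0.1", "ft")]),
]


def to_csv_lines(reports):
    """Flatten the two-level structure into CSV lines, one per property."""
    lines = ["report,property,value,error,units"]
    for report, properties in reports:                  # plays the role of the outer repeat
        for prop, value, error, units in properties:    # plays the role of the inner repeat
            lines.append(",".join([report, prop, value, error, units]))
    return lines


print("\n".join(to_csv_lines(reports)))
```

The report name varies only in the outer loop, while the property fields vary in the inner one, which is why a single flat repeat is not enough.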

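I applied some indentation. Horizontal functions can be indented with whitespace, because leading whitespace after a line continuation is ignored.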

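The @(forget k) is gone; there is no k in scope at that point. Each iteration of the surrounding collect binds k afresh, in an environment that has no k.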
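The alignment assumption behind colon_position can also be checked independently of TXR. Below is a small Python sketch, not TXR, that looks for the single column shared by the colons of a sub-report; the function name `colon_column` is hypothetical, and the lines are copied from the sample data in the question.

```python
def colon_column(lines):
    """Return the column shared by the first colon of every line that has one.

    Returns None when no line contains a colon or when the columns disagree.
    """
    columns = {line.index(":") for line in lines if ":" in line}
    return columns.pop() if len(columns) == 1 else None


# Sub-report bodies copied from the sample data in the question.
bath_tub = [
    "  Temperature:    30 °C       ",
    "",
    "  Water ready                 ",
    "       volume:    200000 cm³  ",
]
bath_room = [
    "   Floor Area:    40 ft²      ",
    "",
    "  Door Height:    9 ± 0.1 ft  ",
]

print(colon_column(bath_tub), colon_column(bath_room))
```

For the sample data both sub-reports yield the same zero-based column, which agrees with the binding of k to 13 reported in the diagnostics.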

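Addendum: here is a diff against the code that makes it more robust against unexpected data. As it stands, the inner @(collect) silently skips non-matching elements, which means that if the file contains elements that do not conform to the expected cases, they are ignored. This behavior is in fact being exploited already: it is why the blank lines between data items are ignored. We can tighten this up with :gap 0 (the collected regions must be contiguous) and handle blank lines as a case of their own. A fallback case can then diagnose the input line as unrecognized: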

diff --git a/extract.txr b/extract.txr
index 8c93d89..3d1fac6 100644
--- a/extract.txr
+++ b/extract.txr
@@ -24,6 +24,7 @@
   @(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
   @(end)@\
 @(end)
+@(name file)
  Summary Report

 @(collect :vars (report property value error units))
@@ -31,7 +32,7 @@
  @report

 @  (colon_position k)
-@  (collect)
+@  (collect :gap 0)
 @    (cases)
  @property@(chr k):    @(capture value error units)@(blank_spaces)
 @    (or)
@@ -40,6 +41,12 @@
  @property_tail@(chr k):    @(capture value error units)@(blank_spaces)
 @      (merge property property_head property_tail)
 @      (cat property " ")
+@    (or)
+
+@    (or)
+@      (line ln)
+@      badline
+@      (throw error `@file:@ln unrecognized syntax: @badline`)
 @    (end)
 @  (until)
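
The idea in the diff above, namely that every line must fall into a known case or be reported, can be sketched outside TXR as well. This is a loose Python illustration under simplifying assumptions, not a translation of the TXR code: the `PROPERTY` regex, the `parse_strict` name, and the default file name are all hypothetical, and the value is kept as one string instead of being split into value, error, and units.

```python
import re

# A property line: optional indent, a name, a colon, spaces, then a value.
PROPERTY = re.compile(r"^\s*(?P<name>[^:]+):\s+(?P<value>\S.*?)\s*$")


def parse_strict(lines, filename="report.txt"):
    """Classify every line explicitly; raise instead of silently skipping."""
    properties = []
    pending = ""  # first half of a property name that spans two lines
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank line: an accepted, explicit case
        match = PROPERTY.match(line)
        if match:
            name = (pending + " " + match["name"].strip()).strip()
            properties.append((name, match["value"]))
            pending = ""
        elif ":" not in line and not pending:
            pending = line.strip()  # head of a two-line property name
        else:
            # Fallback case: report the offending line with its location.
            raise ValueError(f"{filename}:{number} unrecognized syntax: {line}")
    return properties


sample = [
    "  Temperature:    30 °C       ",
    "",
    "  Water ready                 ",
    "       volume:    200000 cm³  ",
]
print(parse_strict(sample))
```

As in the diff, the point of the fallback branch is that malformed input produces a located error message rather than quietly missing rows in the output.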