pyparsing - parsing a simple line

Date: 2011-02-28 17:07:08

Tags: pyparsing


I'm trying to figure out how to fully parse this line. I'm having trouble with the '( 4801)' part; all the other elements are captured.

# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00

Here is what I have so far:

from pyparsing import nums, Word, Optional, Suppress, OneOrMore, Group, Combine, ParseException, Literal, White

unparsed_log_data = "# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00.007    Type:  Periodic"

binary_name = "# MAIN_PROG"
pid = Literal("(" + nums + ")")
report_id = Combine(Suppress(binary_name) + pid)

year = Word(nums, max=4)
month = Word(nums, max=2)
day = Word(nums, max=2)
yearly_day = Combine(year + "-" + month + "-" + day)

clock24h = Combine(Word(nums, max=2) + ":" + Word(nums, max=2) + ":" + Word(nums, max=2) + Suppress("."))
timestamp = Combine(yearly_day + White(' ') + clock24h).setResultsName("timestamp")

time_bnf = report_id + Suppress("Generated at") + timestamp

time_bnf.searchString(unparsed_log_data)
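For reference, a minimal sketch of the whole first line parsing end to end. It applies the fix from the answer below to `pid`, and additionally suppresses the parentheses and names the number "pid" (both choices made for this sketch, not part of the question). It also drops the `Combine` around `report_id`: `Combine` requires its pieces to be adjacent in the input, and the sample has whitespace around the "(" in `# MAIN_PROG ( 4801)`. The date/time pieces are the question's own definitions; the camelCase method names assume a pyparsing version that still accepts them.

```python
from pyparsing import Combine, Suppress, White, Word, nums

unparsed_log_data = "# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00.007    Type:  Periodic"

# Parenthesised PID: suppress "(" and ")" and keep only the digits as "pid"
pid = Suppress("(") + Word(nums).setResultsName("pid") + Suppress(")")

# No Combine here: Combine would require "# MAIN_PROG", "(" and the digits
# to be adjacent, but the input has spaces between them
report_id = Suppress("# MAIN_PROG") + pid

# Date pieces, as in the question
year = Word(nums, max=4)
month = Word(nums, max=2)
day = Word(nums, max=2)
yearly_day = Combine(year + "-" + month + "-" + day)

# HH:MM:SS, swallowing the "." that precedes the milliseconds
clock24h = Combine(Word(nums, max=2) + ":" + Word(nums, max=2) + ":"
                   + Word(nums, max=2) + Suppress("."))
timestamp = Combine(yearly_day + White(" ") + clock24h).setResultsName("timestamp")

time_bnf = report_id + Suppress("Generated at") + timestamp

matches = time_bnf.searchString(unparsed_log_data)
print(matches[0].asList())  # ['4801', '2010-01-25 06:55:00']
```

Note that `clock24h` requires the "." after the seconds, so this works on the single-line sample with ".007" but not on the multi-line sample, whose timestamp ends at "06:55:00".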

EDIT: Paul, if you have the patience: how would I filter this?

unparsed_log_data = """
# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00
bla bla bla
multi line garbage
bla bla
Efficiency       |       38       38 100 |   3497061    3497081  99 |
more garbage
"""

from pyparsing import SkipTo

integer = Word(nums)  # assumed definition; the original never shows how integer is defined
time_bnf = report_id + Suppress("Generated at") + timestamp
partial_report_ignore = Suppress(SkipTo("Efficiency"))

efficiency_bnf = Suppress("|") + integer.setResultsName("Efficiency") + Suppress(integer) + integer.setResultsName("EfficiencyPercent")
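A minimal sketch of `efficiency_bnf` run on its own against the Efficiency line from the sample. The question never shows how `integer` is defined, so `integer = Word(nums)` is an assumption here. Because `searchString` keeps scanning after each match, it also picks up the second `|`-delimited group on the same line.

```python
from pyparsing import Suppress, Word, nums

# Assumption: integer is a plain run of digits
integer = Word(nums)

efficiency_bnf = (Suppress("|") + integer.setResultsName("Efficiency")
                  + Suppress(integer) + integer.setResultsName("EfficiencyPercent"))

line = "Efficiency       |       38       38 100 |   3497061    3497081  99 |"
matches = efficiency_bnf.searchString(line)
for m in matches:
    print(m["Efficiency"], m["EfficiencyPercent"])
# 38 100
# 3497061 99
```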

Both efficiency_bnf.searchString(unparsed_log_data) and report_bnf.searchString(unparsed_log_data) return the expected data on their own, but if I try

report_and_effic = report_bnf + partial_report_ignore + efficiency_bnf

report_and_effic.searchString(unparsed_log_data) returns ([], {})

EDIT2: One should read the code:
partial_report_ignore = Suppress(SkipTo("Efficiency", include=True))
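A minimal sketch of why `include=True` matters. Without it, `SkipTo("Efficiency")` stops just before the word and leaves it unconsumed, so the following `Suppress("|")` is tried against "Efficiency" and fails at every scan position. With `include=True` the target word is consumed too (the skipped text and the matched word are both returned, and `Suppress` discards them). As before, `integer = Word(nums)` is an assumption, and only the tail of the sample log is used to isolate the behaviour.

```python
from pyparsing import SkipTo, Suppress, Word, nums

# Assumption: integer is a plain run of digits
integer = Word(nums)
efficiency_bnf = (Suppress("|") + integer.setResultsName("Efficiency")
                  + Suppress(integer) + integer.setResultsName("EfficiencyPercent"))

log_tail = """bla bla bla
multi line garbage
bla bla
Efficiency       |       38       38 100 |   3497061    3497081  99 |
more garbage
"""

# Stops *before* "Efficiency": "|" is then tried against "Efficiency" and fails
stops_before = Suppress(SkipTo("Efficiency")) + efficiency_bnf
# Also consumes "Efficiency": the next thing to match is the "|"
consumes_target = Suppress(SkipTo("Efficiency", include=True)) + efficiency_bnf

print(len(stops_before.searchString(log_tail)))  # 0
found = consumes_target.searchString(log_tail)
print(found[0].asList())  # ['38', '100']
```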

1 answer:

Answer 0 (score: 2)

pid = Literal("(" + nums + ")")

should be

pid = "(" + Word(nums) + ")"
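A quick check of the corrected `pid` on the fragment from the question. Default whitespace skipping handles the space after the opening parenthesis, and the parentheses stay in the results because nothing suppresses them.

```python
from pyparsing import Word, nums

# The corrected definition: the plain strings "(" and ")" are promoted to Literals
pid = "(" + Word(nums) + ")"

tokens = pid.parseString("( 4801)")
print(tokens.asList())  # ['(', '4801', ')']
```

If only the number is wanted, wrapping the parentheses in `Suppress(...)` drops them from the results.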

Pyparsing lets you use "+" to add a string to an expression object, as in:

expr + "some string"

and interprets it as:

expr + Literal("some string")
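A small sketch of that equivalence; `Word(alphas)` and the input text are arbitrary stand-ins chosen for illustration. The same promotion applies when the string is on the left of the `+`, which is what makes `"(" + Word(nums) + ")"` work.

```python
from pyparsing import Literal, Word, alphas

expr = Word(alphas)

implicit = expr + "some string"           # plain str, promoted to a Literal
explicit = expr + Literal("some string")  # the spelled-out form

text = "hello some string"
print(implicit.parseString(text).asList())  # ['hello', 'some string']
print(explicit.parseString(text).asList())  # ['hello', 'some string']
```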

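You wrote Literal("(" + nums + ")"). nums is the string "0123456789", meant to be used as part of creating a Word, as in Word(nums). So instead of matching "a left paren, followed by a word made of digits, followed by a right paren", you were trying to match the literal string "(0123456789)".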
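A minimal sketch contrasting the two definitions. The helper `matches` is a hypothetical name introduced only for this illustration; it reports whether `parseString` succeeds on a given text.

```python
from pyparsing import Literal, ParseException, Word, nums

bad_pid = Literal("(" + nums + ")")  # one fixed string: "(0123456789)"
good_pid = "(" + Word(nums) + ")"    # "(", any run of digits, ")"


def matches(expr, text):
    """Hypothetical helper: True if expr parses the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


print(matches(bad_pid, "(0123456789)"))  # True
print(matches(bad_pid, "( 4801)"))       # False
print(matches(good_pid, "( 4801)"))      # True
```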