I am using the pyparsing API to extract the content of input text that generally has the following structure:
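Key1 : Value1 \n
Key2 : Value2 \n
. : . \n
. : . \n
. : . \n
Keyn : . \n
In some cases the value for a given key is long, so it is written across multiple lines:
Key_k : Value_k value_k value_k
value_k value_k value_k
With the BNF I defined for pyparsing, whenever a key has a long value like the one above, I only ever get the content of the first line. When the key and its value are on the same line, I get good results.
Any help is welcome, and thank you in advance.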
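Answer 0: (score: 1)
Following Paul's comments, I know my answer is certainly not optimal. Still, I enjoy trying to solve this kind of problem with pyparsing.
One way to do it is to say that a 'value' is a token containing no whitespace and no colon, and that it must not be followed by the ending of a key (optional spaces and a colon). I therefore defined a grammar element called key_ending.
I want each key grouped with its values in the result, so I use Group.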
>>> import pyparsing as pp
>>> key_name = pp.Word(pp.alphanums+'_')
>>> key_ending = pp.ZeroOrMore(' ') + ':'
>>> key = key_name + key_ending
>>> value = pp.Word(pp.alphanums) + pp.NotAny(key_ending)
>>> values = pp.OneOrMore(value)
>>> param = pp.Group(key + values)
>>> param_stream = pp.OneOrMore(param)
>>> lines = '''\
... key1 : value1
... key2 : value1 value2
... key3: value1
... key4 : value2
... value3 value4
... '''
This line shows that the parse succeeds.
>>> param_stream.parseString(lines)
([(['key1', ':', 'value1'], {}), (['key2', ':', 'value1', 'value2'], {}), (['key3', ':', 'value1'], {}), (['key4', ':', 'value2', 'value3', 'value4'], {})], {})
I run the parse again, this time capturing the result so that I can show its individual pieces.
>>> r = param_stream.parseString(lines)
>>> for param in r.asList():
... param[0], param[2:]
...
('key1', ['value1'])
('key2', ['value1', 'value2'])
('key3', ['value1'])
('key4', ['value2', 'value3', 'value4'])
After doing this, I found that I could have used pyparsing's FollowedBy, and that pyparsing has built-in helpers for finding key-value pairs.
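As a rough sketch of that idea, the grammar above can be restated with the `~` negative lookahead (shorthand for NotAny) and the dictOf helper, so that the result reads like a dictionary. The names `value_word`, `params`, and `parsed` are made up for this illustration, and joining the value words with a single space is an assumption about the desired output. The camelCase method names match the session above; pyparsing 3 also offers snake_case equivalents such as parse_string.

```python
import pyparsing as pp

# A key is a word followed by a colon; the colon is dropped from the results.
key = pp.Word(pp.alphanums + "_") + pp.Suppress(":")

# A value word is any word that is NOT the start of the next key.
# "~key" is a negative lookahead: it consumes no input.
value_word = ~key + pp.Word(pp.alphanums + "_")

# Collect all value words of one key, across line breaks, into one string.
values = pp.OneOrMore(value_word).setParseAction(lambda toks: " ".join(toks))

# dictOf builds Dict(OneOrMore(Group(key + values))).
params = pp.dictOf(key, values)

text = """\
key1 : value1
key2 : value1 value2
key3: value1
key4 : value2
       value3 value4
"""

parsed = params.parseString(text).asDict()
for name, value in parsed.items():
    print(name, "->", value)
```

Because pyparsing skips whitespace (including newlines) between tokens by default, the continuation line of key4 is picked up without any special handling; the lookahead alone decides where one value ends and the next key begins.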
Answer 1: (score: 1)
When I try it with the following example, I do not get any results:
From: Blabla bhlkf <Blabla.bhlkf@atghg.tele2.com>
Sent: 2014-01-22 14:21:31
To: "support@atghg.com" <support@atghg.com>
Subject: Blablablabla bla bla Subject
Case request: STL Cas Hours (JKLM, KJH 1 24x7 EPLi
Loc)
Request Type: Support (HTL)
Product: TGHLKI HS+ / BLOS V. 9.9.x.x
Product Instance State: In Service
DSLAM address: HGJLKM
Problem Description: All usershoud be in nkowns
that the first line should be also extracted
Ticket Priority: 3 = Very import request
Contact Name: Blabla
Contact phone: +0187 87652 99883
Alternate phone: +012 7890 877343 1 9009 35
Tele2UTA Ticket ID: HGFDL5666
Service Agreement: 7543864
Contact Company: FAX2
xlfswott01> users -l | grep 001.14.06
616804042 001.14.060/0001 001:14:060/ 2044K/ 252K ATM 0.0.0.0
1:14:60-0.100-T-066048 0:07:31 0:00:00
616804043 001.14.060/0001 001:14:060/ 2044K/ 252K ATM 0.0.0.0
1:14:60-8.35-T-066048 0:07:32 0:00:00
616804044 001.14.060/0001 001:14:060/ 2044K/ 252K ATM 0.0.0.0
1:14:60-8.40-T-066048 0:07:32 0:00:00
616804054 001.14.064/0001 001:14:064/ 2044K/ 252K ATM 0.0.0.0
1:14:64-0.100-T-066050 0:07:20 0:00:00
616804055 001.14.064/0001 001:14:064/ 2044K/ 252K ATM 0.0.0.0
1:14:64-8.35-T-066050 0:07:20 0:00:00
616804056 001.14.064/0001 001:14:064/ 2044K/ 252K ATM 0.0.0.0
1:14:64-8.40-T-066050 0:07:21 0:00:00
616804057 001.14.065/0001 001:14:065/ 2044K/ 252K ATM 0.0.0.0
1:14:65-0.100-T-067398 0:07:22 0:00:00
616804058 001.14.065/0001 001:14:065/ 2044K/ 252K ATM 0.0.0.0
1:14:65-8.35-T-067398 0:07:25 0:00:00
616804059 001.14.065/0001 001:14:065/ 2044K/ 252K ATM 0.0.0.0
1:14:65-8.40-T-067398 0:07:26 0:00:00
<end user list> 3053 active user(s)
<grep> Found 9 line(s) matching search criteria
xlfswott01> users -l | grep 001.14.05
616804031 001.14.054/0001 001:14:054/ 6997K/ 903K ATM 0.0.0.0
1:14:54-0.100-T-004048 0:08:14 0:00:00
616804032 001.14.054/0001 001:14:054/ 6997K/ 903K ATM 0.0.0.0
1:14:54-8.35-T-004048 0:08:15 0:00:00
616804033 001.14.054/0001 001:14:054/ 6997K/ 903K ATM 0.0.0.0
1:14:54-8.40-T-004048 0:08:16 0:00:00616804034 001.14.055/0001
001:14:055/ 7997K/ 903K ATM 0.0.0.0 1:14:55-0.100-T-065997 0:08:17
0:00:00
616804035 001.14.055/0001 001:14:055/ 7997K/ 903K ATM 0.0.0.0
1:14:55-8.35-T-065997 0:08:17 0:00:00
616804036 001.14.055/0001 001:14:055/ 7997K/ 903K ATM 0.0.0.0
1:14:55-8.40-T-065997 0:08:20 0:00:00
616804037 001.14.057/0001 001:14:057/ 2044K/ 252K ATM 0.0.0.0
1:14:57-0.100-T-071069 0:08:20 0:00:00
616804038 001.14.057/0001 001:14:057/ 2044K/ 252K ATM 0.0.0.0
1:14:57-8.35-T-071069 0:08:22 0:00:00
616804039 001.14.057/0001 001:14:057/ 2044K/ 252K ATM 0.0.0.0
1:14:57-8.40-T-071069 0:08:23 0:00:00
616804040 001.14.059/0001 001:14:059/ 2044K/ 252K ATM 0.0.0.0
1:14:59-0.100-T-155435 0:08:23 0:00:00
616804041 001.14.059/0001 001:14:059/ 2044K/ 252K ATM 0.0.0.0
1:14:59-8.40-T-155435 0:08:24 0:00:00
616804048 001.14.050/0001 001:14:050/ 2044K/ 252K ATM 0.0.0.0
1:14:50-0.100-T-064163 0:08:09 0:00:00
616804049 001.14.050/0001 001:14:050/ 2044K/ 252K ATM 0.0.0.0
1:14:50-8.35-T-064163 0:08:08 0:00:00
616804050 001.14.050/0001 001:14:050/ 2044K/ 252K ATM 0.0.0.0
1:14:50-8.40-T-064163 0:08:10 0:00:00
616804051 001.14.051/0001 001:14:051/ 13M/1047K ATM 0.0.0.0
1:14:51-0.100-T-080123 0:08:10 0:00:00
616804052 001.14.051/0001 001:14:051/ 13M/1047K ATM 0.0.0.0
1:14:51-8.35-T-080123 0:08:10 0:00:00
616804053 001.14.051/0001 001:14:051/ 13M/1047K ATM 0.0.0.0
1:14:51-8.40-T-080123 0:08:13 0:00:00
<end user list> 3050 active user(s)
mit HFDSKJKJR LIKLSS
BLAB HGFDO
COMPANY Telecom
DESEARCH DEVELOPEMENT Network Operation Center (NOC)
Donau-City-Strasse 11, 1220 Wien
service@atgljkfyh.com
******** WICHTIGER HINWEIS ********
balblablablbalbalnbabTele2bmlablablalablaba.
blablablablaba.
******** IMPORTANT NOTICE ********
blablablbalbablabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.
lablablbalablblb.
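It is worth mentioning that I am only interested in extracting the information that matches the structure I described in my first post, that is, data of the following form: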
Key : value
key_n: value_n1...
value_mn
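For text like this mail, where keys can contain spaces, one possible approach is a line-by-line filter. The sketch below uses the standard `re` module rather than pyparsing, so it is a different technique from the grammar discussed above. It rests on several assumptions that are not in the original post: a key starts at the beginning of a line with a letter and contains only letters, digits, underscores, and spaces; any other non-blank line continues the previous value; and a blank line ends the key/value block. The sample adds such a blank line, which the real mail may not contain. `KEY_LINE` and `extract_pairs` are hypothetical names.

```python
import re

# Assumed key shape: starts with a letter, then word characters or spaces,
# followed by a colon and either whitespace plus a value or nothing at all.
KEY_LINE = re.compile(r"^([A-Za-z][\w ]*?)\s*:(?:\s+(.*))?$")


def extract_pairs(text):
    """Return a dict of key -> value, joining continuation lines with a space."""
    pairs = {}
    current = None  # key whose value may still continue on the next line
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            current = match.group(1)
            pairs[current] = (match.group(2) or "").strip()
        elif not line.strip():
            # Assumption: a blank line ends the key/value block.
            current = None
        elif current is not None:
            # Any other non-blank line is treated as a continuation.
            pairs[current] = (pairs[current] + " " + line.strip()).strip()
    return pairs


sample = """\
Subject: Blablablabla bla bla Subject
Case request: STL Cas Hours (JKLM, KJH 1 24x7 EPLi
Loc)
Request Type: Support (HTL)
Contact Company: FAX2

xlfswott01> users -l | grep 001.14.06
1:14:60-0.100-T-066048 0:07:31 0:00:00
"""

for name, value in extract_pairs(sample).items():
    print(f"{name}: {value}")
```

Without a separator such as the blank line, the command output that follows the last key would be appended to its value, so real input would need a stricter rule for where a value stops.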