I am trying to parse a number of COBOL copybooks using Python.
I have this regex, which I modified from the one provided in cobol.py:
^(?P<level>\d{2})\s+(?P<name>\S+).*?
(\s+INDEXED BY\s+(?P<indexed_by>\S+))?.*?
(\s+REDEFINES\s+(?P<redefines>\S+))?.*?
(\s+PIC(TURE)?\s+(?P<pic>\S+))?.*?
(\s+OCCURS\s+(?P<occurs>\d+).?( TIMES)?)?.*?
((?P<comp>)\s+COMP\S+)?.*?
(\s+VALUE\s+(?P<value>\S+).*)?
\.$
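Below is sample text that works for every line except the second-to-last. That line fails to find a match for the pic group because the occurs group has already (ahem) occurred earlier in the string.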
03 AMOUNT-BREAKDOWN PICTURE 9(8)V99 VALUE ZEROES.
03 AMOUNT-BREAKDOWN-X REDEFINES AMOUNT-BREAKDOWN.
05 FILLER PICTURE X(3) VALUE "DEC".
03 MONTH REDEFINES MONTH-TAB PICTURE X(3) OCCURS 12 TIMES.
03 SUB PICTURE 99 VALUE 0.
03 NUMBER-HOLD.
05 NUMB-HOLD PICTURE X OCCURS 11 TIMES.
05 FILLER PICTURE X(5) VALUE "TEN".
03 DIGIT-TAB2 REDEFINES DIGIT-TAB1.
05 DIGIT-TABLE OCCURS 10 PICTURE X(5).
03 WK-TEN-MILLION PICTURE X(5) VALUE SPACES.
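I struggle with regular expressions at the best of times, but I think I risk confusing myself here because I am missing something basic.
To be clear: every line with a PICTURE clause is captured by the pic group, except the second-to-last one, because there the PICTURE clause comes after the OCCURS capture group.
Any help is appreciated.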
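Answer 0: (score: 1)
PyParsing (https://github.com/pyparsing/pyparsing) is a good module for building grammars easily. You can build a basic Copybook grammar and then parse with PyParsing. You will then have to post-process the result to preserve the tree structure indicated by the two-digit level field.
Also have a look at the Copybook package (https://github.com/zalmane/copybook), which uses PyParsing.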
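The post-processing step can be sketched without any parsing library. The snippet below is only an illustration under simple assumptions: each line starts with a two-digit level followed by a name, and an item belongs to the nearest preceding item with a lower level number. `build_tree`, `dump` and the dict layout are hypothetical names chosen here, not part of PyParsing or the Copybook package, and special levels such as 66, 77 and 88 get no special treatment.

```python
import re

# A few lines borrowed from the question's sample copybook.
COPYBOOK = """\
03 NUMBER-HOLD.
05 NUMB-HOLD PICTURE X OCCURS 11 TIMES.
05 FILLER PICTURE X(5) VALUE "TEN".
03 DIGIT-TAB2 REDEFINES DIGIT-TAB1.
05 DIGIT-TABLE OCCURS 10 PICTURE X(5).
"""


def build_tree(lines):
    """Nest items by level: each item becomes a child of the nearest
    preceding item whose level number is strictly lower."""
    root = {"level": 0, "name": "ROOT", "children": []}
    stack = [root]
    for line in lines:
        m = re.match(r"\s*(\d{2})\s+([^\s.]+)", line)
        if not m:
            continue  # not a data description entry
        node = {"level": int(m.group(1)), "name": m.group(2), "children": []}
        # Pop siblings and deeper items until a valid parent is on top.
        while len(stack) > 1 and stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def dump(node, depth=0):
    """Print the tree with two spaces of indentation per nesting depth."""
    for child in node["children"]:
        print("  " * depth + f'{child["level"]:02d} {child["name"]}')
        dump(child, depth + 1)


tree = build_tree(COPYBOOK.splitlines())
dump(tree)
```

A stack is used because copybook levels need not be consecutive (03, 05, 10, ...); only their relative order matters.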
Answer 1: (score: 0)
While an actual parser such as PLY or parsely would suit this best, if you must use a regex, can't you just add another OCCURS group with a different key? e.g.
"""
03 AMOUNT-BREAKDOWN PICTURE 9(8)V99 VALUE ZEROES.
03 AMOUNT-BREAKDOWN-X REDEFINES AMOUNT-BREAKDOWN.
05 FILLER PICTURE X(3) VALUE "DEC".
03 MONTH REDEFINES MONTH-TAB PICTURE X(3) OCCURS 12 TIMES.
03 SUB PICTURE 99 VALUE 0.
03 NUMBER-HOLD.
05 NUMB-HOLD PICTURE X OCCURS 11 TIMES.
05 FILLER PICTURE X(5) VALUE "TEN".
03 DIGIT-TAB2 REDEFINES DIGIT-TAB1.
05 DIGIT-TABLE OCCURS 10 PICTURE X(5).
03 WK-TEN-MILLION PICTURE X(5) VALUE SPACES.
"""
import re

# Compile once; raw strings keep the regex backslashes intact.
pattern = re.compile(
    r"^(?P<level>\d{2})\s+(?P<name>\S+).*?"
    r"(\s+INDEXED BY\s+(?P<indexed_by>\S+))?.*?"
    r"(\s+REDEFINES\s+(?P<redefines>\S+))?.*?"
    r"(\s+OCCURS\s+(?P<occurs1>\d+).?( TIMES)?)?.*?"  # <-- occurs1
    r"(\s+PIC(TURE)?\s+(?P<pic>\S+))?.*?"
    r"(\s+OCCURS\s+(?P<occurs>\d+).?( TIMES)?)?.*?"
    r"((?P<comp>)\s+COMP\S+)?.*?"
    r"(\s+VALUE\s+(?P<value>\S+).*)?"
    r"\.$"
)

for line in __doc__.split("\n"):
    if not line:
        continue  # skip the blank lines around the sample text
    m = pattern.match(line)
    if m:
        print(m.groups())
Sample output:
('03', 'AMOUNT-BREAKDOWN', None, None, None, None, None, None, None, ' PICTURE 9(8)V99', 'TURE', '9(8)V99', None, None, None, None, None, ' VALUE ZEROES', 'ZEROES')
('03', 'AMOUNT-BREAKDOWN-X', None, None, ' REDEFINES AMOUNT-BREAKDOWN', 'AMOUNT-BREAKDOWN', None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'FILLER', None, None, None, None, None, None, None, ' PICTURE X(3)', 'TURE', 'X(3)', None, None, None, None, None, ' VALUE "DEC"', '"DEC"')
('03', 'MONTH', None, None, ' REDEFINES MONTH-TAB', 'MONTH-TAB', None, None, None, ' PICTURE X(3)', 'TURE', 'X(3)', ' OCCURS 12 ', '12', None, None, None, None, None)
('03', 'SUB', None, None, None, None, None, None, None, ' PICTURE 99', 'TURE', '99', None, None, None, None, None, ' VALUE 0', '0')
('03', 'NUMBER-HOLD', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'NUMB-HOLD', None, None, None, None, None, None, None, ' PICTURE X', 'TURE', 'X', ' OCCURS 11 ', '11', None, None, None, None, None)
('05', 'FILLER', None, None, None, None, None, None, None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, ' VALUE "TEN"', '"TEN"')
('03', 'DIGIT-TAB2', None, None, ' REDEFINES DIGIT-TAB1', 'DIGIT-TAB1', None, None, None, None, None, None, None, None, None, None, None, None, None)
('05', 'DIGIT-TABLE', None, None, None, None, ' OCCURS 10 ', '10', None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, None, None)
('03', 'WK-TEN-MILLION', None, None, None, None, None, None, None, ' PICTURE X(5)', 'TURE', 'X(5)', None, None, None, None, None, ' VALUE SPACES', 'SPACES')
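If the clauses can appear in any order, another option is to stop encoding the order in one large regex and search for each clause separately. The following is a minimal sketch of that idea, not a full COBOL parser: `parse_line`, `CLAUSES` and `HEAD` are hypothetical names, one data description entry per line is assumed, and (as with the regex above) a VALUE literal containing spaces or keywords would be captured incorrectly. Searching per clause also avoids depending on how much whitespace separates the clauses; in the combined regex, the `.?` after the OCCURS count can consume a lone separating space that the following `\s+PIC` needs.

```python
import re

# One small pattern per clause, so clause order no longer matters.
# Every keyword must be preceded by whitespace to avoid matching inside names.
CLAUSES = {
    "indexed_by": re.compile(r"\sINDEXED\s+BY\s+(\S+)"),
    "redefines": re.compile(r"\sREDEFINES\s+(\S+)"),
    "pic": re.compile(r"\sPIC(?:TURE)?\s+(\S+)"),
    "occurs": re.compile(r"\sOCCURS\s+(\d+)"),
    "comp": re.compile(r"\s(COMP(?:UTATIONAL)?(?:-\w)?)(?=\s|$)"),
    "value": re.compile(r"\sVALUE\s+(\S+)"),
}
HEAD = re.compile(r"^\s*(?P<level>\d{2})\s+(?P<name>\S+)")


def parse_line(line):
    """Return a dict of clause values for one entry, or None if the
    line does not look like 'NN name ... .'."""
    line = line.strip()
    if not line.endswith("."):
        return None
    body = line[:-1]  # drop the terminating period
    head = HEAD.match(body)
    if not head:
        return None
    fields = head.groupdict()
    rest = body[head.end():]  # everything after the data name
    for key, pattern in CLAUSES.items():
        m = pattern.search(rest)
        fields[key] = m.group(1) if m else None
    return fields


for sample in (
    "03 MONTH REDEFINES MONTH-TAB PICTURE X(3) OCCURS 12 TIMES.",
    "05 DIGIT-TABLE OCCURS 10 PICTURE X(5).",
):
    print(parse_line(sample))
```

Because each clause is looked up by name, the result is a dict rather than a positional tuple, which also removes the need for separate `occurs1` and `occurs` keys.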
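Answer 2: (score: 0)
You should look at cb2xml. It will parse a Cobol Copybook and create an Xml file. You can then process the Xml in python or any other language. The cb2xml package has basic examples of processing the Xml in python and other languages.
Cobol: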
01 Ams-Vendor.
03 Brand Pic x(3).
03 Location-details.
05 Location-Number Pic 9(4).
05 Location-Type Pic XX.
05 Location-Name Pic X(35).
03 Address-Details.
05 actual-address.
10 Address-1 Pic X(40).
10 Address-2 Pic X(40).
10 Address-3 Pic X(35).
05 Postcode Pic 9(4).
05 Empty pic x(6).
05 State Pic XXX.
03 Location-Active Pic X.
Output from cb2xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<copybook filename="cbl2xml_Test110.cbl">
<item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
<item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
<item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
<item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
<item display-length="2" level="05" name="Location-Type" picture="XX" position="8" storage-length="2"/>
<item display-length="35" level="05" name="Location-Name" picture="X(35)" position="10" storage-length="35"/>
</item>
<item display-length="128" level="03" name="Address-Details" position="45" storage-length="128">
<item display-length="115" level="05" name="actual-address" position="45" storage-length="115">
<item display-length="40" level="10" name="Address-1" picture="X(40)" position="45" storage-length="40"/>
<item display-length="40" level="10" name="Address-2" picture="X(40)" position="85" storage-length="40"/>
<item display-length="35" level="10" name="Address-3" picture="X(35)" position="125" storage-length="35"/>
</item>
<item display-length="4" level="05" name="Postcode" numeric="true" picture="9(4)" position="160" storage-length="4"/>
<item display-length="6" level="05" name="Empty" picture="x(6)" position="164" storage-length="6"/>
<item display-length="3" level="05" name="State" picture="XXX" position="170" storage-length="3"/>
</item>
<item display-length="1" level="03" name="Location-Active" picture="X" position="173" storage-length="1"/>
</item>
</copybook>
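As a rough illustration of processing this Xml in python, here is a sketch using the standard library's `xml.etree.ElementTree`; it is not one of the examples shipped with cb2xml. It assumes that `position` is 1-based (the first item above has position 1) and that, for plain text fields, `storage-length` equals the number of characters. `leaf_fields` and the sample `record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the cb2xml output shown above (declaration omitted).
XML = """<copybook filename="cbl2xml_Test110.cbl">
  <item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
    <item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
    <item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
      <item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
      <item display-length="2" level="05" name="Location-Type" picture="XX" position="8" storage-length="2"/>
    </item>
  </item>
</copybook>"""


def leaf_fields(element):
    """Yield (name, position, length) for every elementary item,
    i.e. every <item> that carries a picture attribute."""
    for item in element.iter("item"):
        if "picture" in item.attrib:
            yield (
                item.get("name"),
                int(item.get("position")),
                int(item.get("storage-length")),
            )


root = ET.fromstring(XML)
fields = list(leaf_fields(root))
for name, start, length in fields:
    print(f"{name}: position {start}, length {length}")

# Slice a fixed-width text record using the 1-based positions.
record = "ABC1234XY"
values = {
    name: record[start - 1:start - 1 + length]
    for name, start, length in fields
}
print(values)
```

Group items such as Location-details have no picture attribute, so only the elementary fields are sliced out of the record.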
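An interesting application of cb2xml is described in "Dynamically Reading COBOL Redefines with C#".
The CobolToCsv package will convert a Cobol data file to a Csv file. Limitations:
Cobol2Csv should be able to handle text files (+ Comp-3). It might handle some of your files.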