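My office uses an in-house tool that generates COBOL formatter programs, COBOL file-to-DB2 loader programs, and other related objects from DDL. Normally the files we load are delimited with '|', but a new data source only sends comma-delimited files.
The problem is that some text fields contain commas. The first thing the formatter does after reading a record from the input data is run a check that counts the delimiters to verify the record has the right number of them. If it counts too many delimiters, the record is discarded. A comma inside a text field therefore fails the record, because the check cannot tell a comma in a field from a delimiter.
Fortunately, all text fields are enclosed in double quotes ("), so I plan to write some code that examines each character of a record one at a time and keeps a count of quotes. If it meets a comma while the quote count is odd, it ignores that comma and does not count it against the record.
Any suggestions on how to do this?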
Answer 0 (score: 2)
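This assumes that you know how much data is on the line (for variable-length records, up to the maximum) and that you replace the 1000 in OCCURS 1000 with the maximum length of the line.
The idea is to use an on-off switch. The first thing the EVALUATE does is check for a quote. If one is found, it flips the switch. The next test says that if the switch is on, the byte is ignored. Finally, if the byte is a comma, it is counted.
When the PERFORM completes, the count contains the total number of commas that are not enclosed in quotes.
I chose the data names to illustrate the technique. You can change them to names relevant to your task.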
       01  length-of-data-on-the-line
                                   COMP PIC 9(4).
       01  the-line.
           05  FILLER OCCURS 1000 TIMES.
               10  character-on-the-line PIC X.
      *> COMMA is not a figurative constant, so use a literal
                   88  cotl-is-comma     VALUE ",".
                   88  cotl-is-quote     VALUE QUOTE.
       01  FILLER.
           05  FILLER              PIC X.
               88  on-off-switch-on      VALUE "1".
               88  on-off-switch-off     VALUE "0".
       01  the-count               COMP PIC 9(4).
       01  data-on-line-sub        COMP PIC 9(4).

           MOVE ZERO TO the-count
                        data-on-line-sub
           SET on-off-switch-off TO TRUE
           PERFORM
             length-of-data-on-the-line TIMES
               ADD 1 TO data-on-line-sub
               EVALUATE TRUE
      *> a quote flips the switch
                   WHEN cotl-is-quote ( data-on-line-sub )
                       IF on-off-switch-off
                           SET on-off-switch-on
                             TO TRUE
                       ELSE
                           SET on-off-switch-off
                             TO TRUE
                       END-IF
      *> inside quotes: ignore the byte
                   WHEN on-off-switch-on
                       CONTINUE
      *> outside quotes: a comma is a delimiter
                   WHEN cotl-is-comma ( data-on-line-sub )
                       ADD 1 TO the-count
               END-EVALUATE
           END-PERFORM
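To connect the count back to the formatter's delimiter check, here is a minimal sketch of a complete program that accepts or rejects one sample record. The program name CHKDELIM, the paragraph names, the sample record, and expected-delimiters (6, for an assumed seven-field layout) are all made up for illustration and do not come from the in-house tool. The quote test uses the literal '"' rather than QUOTE, because on some compilers the value of QUOTE depends on a compile option.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHKDELIM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Assumed layout: 7 fields, so 6 delimiters are expected.
       01  expected-delimiters     COMP PIC 9(4) VALUE 6.
       01  length-of-data-on-the-line
                                   COMP PIC 9(4).
       01  the-line.
           05  FILLER OCCURS 80 TIMES.
               10  character-on-the-line PIC X.
                   88  cotl-is-comma     VALUE ",".
                   88  cotl-is-quote     VALUE '"'.
       01  FILLER.
           05  FILLER              PIC X.
               88  on-off-switch-on      VALUE "1".
               88  on-off-switch-off     VALUE "0".
       01  the-count               COMP PIC 9(4).
       01  data-on-line-sub        COMP PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> Sample record: the second field holds a quoted comma.
           MOVE '1,"Smith, John",NY,A,B,C,D' TO the-line
           MOVE 26 TO length-of-data-on-the-line
           PERFORM COUNT-DELIMITERS
           IF the-count = expected-delimiters
               DISPLAY "record accepted"
           ELSE
               DISPLAY "record rejected"
           END-IF
           GOBACK
           .
       COUNT-DELIMITERS.
      *> Count only the commas found while the switch is off.
           MOVE ZERO TO the-count
           SET on-off-switch-off TO TRUE
           PERFORM VARYING data-on-line-sub FROM 1 BY 1
                   UNTIL data-on-line-sub
                       > length-of-data-on-the-line
               EVALUATE TRUE
                   WHEN cotl-is-quote ( data-on-line-sub )
                       IF on-off-switch-off
                           SET on-off-switch-on TO TRUE
                       ELSE
                           SET on-off-switch-off TO TRUE
                       END-IF
                   WHEN on-off-switch-on
                       CONTINUE
                   WHEN cotl-is-comma ( data-on-line-sub )
                       ADD 1 TO the-count
               END-EVALUATE
           END-PERFORM
           .
```

The sample record has seven commas, but the one inside "Smith, John" is skipped, so the count is 6 and the program displays "record accepted".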
Answer 1 (score: 0)
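Although it does more than was asked, this program should handle most CSV records. It has not been tested with a tab delimiter. The program converts CSV text (stripping the added quotes) from being separated by the chosen delimiter to being separated by LOW-VALUES. The fields can then be split more easily with UNSTRING ... DELIMITED BY LOW-VALUES INTO ...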
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSV2STR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  I               COMP PIC 9(4).
       01  J               COMP PIC 9(4).
       01  FLD-START       COMP PIC 9(4).
       01  STATE           COMP PIC 9(4).
       01  FLD-SEP         PIC X VALUE LOW-VALUES.
       01  QUOT            PIC X VALUE """".
       01  APOS            PIC X VALUE "'".
       01  COMM            PIC X VALUE ",".
       LINKAGE SECTION.
       01  INPUT-REC       PIC X(2000).
       01  INPUT-LENGTH    COMP PIC 9(4).
       01  OUTPUT-REC      PIC X(2000).
       01  OUTPUT-LENGTH   COMP PIC 9(4).
       01  DELIM           PIC X.
       PROCEDURE DIVISION USING INPUT-REC INPUT-LENGTH
           OUTPUT-REC OUTPUT-LENGTH DELIM.
       BEGIN.
           IF INPUT-LENGTH = 0 OR > 2000
               MOVE 0 TO OUTPUT-LENGTH
               EXIT PROGRAM
           END-IF
      *> A SPACE IN DELIM SELECTS THE DEFAULT DELIMITER, A COMMA
           IF DELIM NOT = SPACE
               MOVE DELIM TO COMM
           ELSE
               MOVE "," TO COMM
           END-IF
           PERFORM CONVERT-RECORD
           SUBTRACT 1 FROM J GIVING OUTPUT-LENGTH
           EXIT PROGRAM
           .
       CONVERT-RECORD.
           MOVE 1 TO STATE I J FLD-START
           PERFORM CONVERT-RECORD-PROC
               UNTIL I > INPUT-LENGTH
           MOVE FLD-SEP TO OUTPUT-REC (J:1)
      *> FOR NO FIELD AFTER THE LAST DELIMITER
           IF INPUT-REC (I - 1:1) = COMM
               ADD 1 TO J
               MOVE FLD-SEP TO OUTPUT-REC (J:1)
           END-IF
           .
       CONVERT-RECORD-PROC.
      *> CSV-DT (STATE 1 = UNQUOTED, STATE 2 = INSIDE QUOTES)
           EVALUATE STATE
               ALSO I = FLD-START
               ALSO INPUT-REC (I:1)
               ALSO INPUT-REC (I + 1:1)
      *> RULE 1 FIELD BEGINS WITH QUOTE: DROP IT, ENTER STATE 2
           WHEN 1 ALSO TRUE ALSO QUOT ALSO ANY
               MOVE 2 TO STATE
               ADD 1 TO I
      *> RULE 2 SPECIAL CASE OF SPACE + APOSTROPHE AT FIELD START
           WHEN 1 ALSO TRUE ALSO SPACE ALSO APOS
               ADD 1 TO I
      *> RULE 3 COPIES ONE CHARACTER
           WHEN 1 ALSO ANY ALSO NOT COMM ALSO ANY
               MOVE INPUT-REC (I:1) TO OUTPUT-REC (J:1)
               ADD 1 TO I
               ADD 1 TO J
      *> RULE 4 ENDS A FIELD
           WHEN 1 ALSO ANY ALSO COMM ALSO ANY
               MOVE FLD-SEP TO OUTPUT-REC (J:1)
               ADD 1 TO I
               ADD 1 TO J
               MOVE I TO FLD-START
      *> RULE 5 FOR QUOTED FIELD COPIES ONE CHARACTER
           WHEN 2 ALSO ANY ALSO NOT QUOT ALSO ANY
               MOVE INPUT-REC (I:1) TO OUTPUT-REC (J:1)
               ADD 1 TO I
               ADD 1 TO J
      *> RULE 6 FOR QUOTED FIELD CONVERTS TWO QUOTES TO ONE
           WHEN 2 ALSO ANY ALSO QUOT ALSO QUOT
               MOVE QUOT TO OUTPUT-REC (J:1)
               ADD 2 TO I
               ADD 1 TO J
      *> RULE 7 FOR QUOTED FIELD DROPS QUOTE BEFORE DELIMITER
           WHEN 2 ALSO ANY ALSO QUOT ALSO COMM
               MOVE FLD-SEP TO OUTPUT-REC (J:1)
               ADD 2 TO I
               ADD 1 TO J
               MOVE I TO FLD-START
               MOVE 1 TO STATE
      *> RULE 8 FOR QUOTED FIELD DROPS FINAL QUOTE OF LAST FIELD
           WHEN 2 ALSO ANY ALSO QUOT ALSO SPACE
               MOVE FLD-SEP TO OUTPUT-REC (J:1)
               ADD 2 TO I
               ADD 1 TO J
               MOVE I TO FLD-START
               MOVE 1 TO STATE
      *> NOT ONE OF THE ORIGINAL RULES: SKIP A QUOTE THAT MATCHES
      *> NO RULE SO THE LOOP CANNOT STALL ON MALFORMED INPUT
           WHEN OTHER
               ADD 1 TO I
           END-EVALUATE
           .
Tested with the following CSV file:
120,ABC,123,"12"" RULER","""ABC"", ""DEF"", ""GHI""", 'ABC',"123,456"
"""789""",,,,"""mno""",,
Output of the test program:
1: 120
2: ABC
3: 123
4: 12" RULER
5: "ABC", "DEF", "GHI"
6: 'ABC'
7: 123,456
1: "789"
2:
3:
4:
5: "mno"
6:
7:
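The test program itself is not shown above. As a rough sketch of how a caller might split the converted record with UNSTRING ... DELIMITED BY LOW-VALUES, something like the following could work. The program name CSVTEST, all of its data names, and the sample record are made up for illustration. Filling OUT-REC with LOW-VALUES before the call is an extra precaution, not something the answer requires. The program must be compiled and linked together with CSV2STR.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> All names here are hypothetical; only the parameter
      *> order and pictures follow the CSV2STR linkage section.
       01  IN-REC          PIC X(2000).
       01  IN-LEN          COMP PIC 9(4).
       01  OUT-REC         PIC X(2000).
       01  OUT-LEN         COMP PIC 9(4).
       01  DELIM           PIC X VALUE ",".
       01  FLD-PTR         COMP PIC 9(4).
       01  FLD-NO          PIC 99.
       01  FLD             PIC X(40).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *> 17-byte sample record whose last field has a comma.
           MOVE '120,ABC,"123,456"' TO IN-REC
           MOVE 17 TO IN-LEN
      *> Pre-fill so unused output bytes act as delimiters.
           MOVE LOW-VALUES TO OUT-REC
           CALL "CSV2STR" USING IN-REC IN-LEN
                                OUT-REC OUT-LEN DELIM
           MOVE 1 TO FLD-PTR
           MOVE 0 TO FLD-NO
      *> Each UNSTRING takes one field and advances FLD-PTR
      *> past the LOW-VALUES byte that ended it.
           PERFORM UNTIL FLD-PTR > OUT-LEN
               MOVE SPACES TO FLD
               UNSTRING OUT-REC
                   DELIMITED BY LOW-VALUES
                   INTO FLD
                   WITH POINTER FLD-PTR
               END-UNSTRING
               ADD 1 TO FLD-NO
               DISPLAY FLD-NO ": " FLD
           END-PERFORM
           GOBACK
           .
```

With the sample record this should list three fields: 120, ABC, and 123,456. A trailing delimiter in the input yields one more, empty, field, which matches the seven fields shown for the second test record.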