Ignoring commas inside quotes in a COBOL formatter program

Asked: 2015-04-22 12:08:38

Tags: csv cobol zos

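My office uses an in-house tool that generates COBOL formatter programs, COBOL file-to-DB2 load programs, and other related objects from DDL. Normally the files we load are delimited with '|', but a new data source only sends comma-delimited files.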

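The problem is that some text fields contain commas. The first thing the formatter does after reading a record from the input data is count the delimiters to verify that the record has the correct number of them. If it counts too many delimiters, the record is discarded. A comma inside a text field therefore fails the record, because the check cannot tell a comma in a field from a delimiter.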

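Fortunately, all text fields are enclosed in double quotes ("), so I plan to write some code that examines a record one character at a time while keeping a count of the quotes seen so far. If it meets a comma while the quote count is odd, it ignores that comma and does not count it toward the record's delimiter total.

Any suggestions on how to do this?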

2 answers:

Answer 0: (score: 2)

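This assumes you know how much data there is on the line (the maximum for variable-length records), and that you will replace the 1000 in OCCURS 1000 with the maximum length of a line.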

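The idea is to use an on/off switch. The first thing the EVALUATE does is check for a quote. If one is found, it flips the switch. The next WHEN says that if the switch is on (we are inside quotes), the byte is ignored. Finally, if the byte is a comma, it is counted.

When the PERFORM completes, the count contains the total number of commas that are not enclosed in quotes.

I chose the data-names to illustrate the technique. You can change them to something relevant to your task.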

01  length-of-data-on-the-line 
                              COMP PIC 9(4).

01  the-line.
    05  FILLER OCCURS 1000 TIMES.
        10  character-on-the-line  PIC X.
            88  cotl-is-comma      VALUE ",".
            88  cotl-is-quote      VALUE QUOTE.

01  FILLER.
    05  FILLER                     PIC X.
        88  on-off-switch-on       VALUE "1".
        88  on-off-switch-off      VALUE "0".

01  the-count                 COMP PIC 9(4).
01  data-on-line-sub          COMP PIC 9(4).


MOVE ZERO                   TO the-count
                               data-on-line-sub
SET on-off-switch-off       TO TRUE
PERFORM 
  length-of-data-on-the-line TIMES
    ADD 1                   TO data-on-line-sub
    EVALUATE TRUE
      WHEN cotl-is-quote ( data-on-line-sub )
        IF on-off-switch-off
            SET on-off-switch-on
                             TO TRUE
        ELSE
            SET on-off-switch-off
                             TO TRUE
        END-IF
      WHEN on-off-switch-on
        CONTINUE
      WHEN cotl-is-comma ( data-on-line-sub )
        ADD 1                TO the-count
    END-EVALUATE
END-PERFORM
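
To tie the count back to the check described in the question, here is a minimal sketch of a complete program that applies the same switch technique to one hard-coded record and compares the result with an expected delimiter count. The program name, the data-names, the 40-byte line length, the sample record and the expected count of 2 are all assumptions made for illustration; they are not taken from the in-house tool.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTDELIM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical 40-byte line, viewed one character at a time.
       01  THE-LINE.
           05  FILLER OCCURS 40 TIMES.
               10  LINE-CHAR           PIC X.
                   88  CHAR-IS-COMMA   VALUE ",".
                   88  CHAR-IS-QUOTE   VALUE '"'.
       01  IN-QUOTES-SW                PIC X.
           88  IN-QUOTES               VALUE "1".
           88  NOT-IN-QUOTES           VALUE "0".
       01  DELIM-COUNT                 PIC 9(4) COMP.
      * Assumed layout: three fields, so two real delimiters.
       01  EXPECTED-DELIMS             PIC 9(4) COMP VALUE 2.
       01  CHAR-SUB                    PIC 9(4) COMP.
       PROCEDURE DIVISION.
      * Sample record; the middle field holds an embedded comma.
           MOVE '120,"SMITH, JOHN",ABC' TO THE-LINE
           MOVE ZERO TO DELIM-COUNT
           SET NOT-IN-QUOTES TO TRUE
           PERFORM VARYING CHAR-SUB FROM 1 BY 1
                   UNTIL CHAR-SUB > 40
               EVALUATE TRUE
      * A quote flips the switch and is never counted.
                   WHEN CHAR-IS-QUOTE (CHAR-SUB)
                       IF IN-QUOTES
                           SET NOT-IN-QUOTES TO TRUE
                       ELSE
                           SET IN-QUOTES TO TRUE
                       END-IF
      * Inside quotes every byte, commas included, is skipped.
                   WHEN IN-QUOTES
                       CONTINUE
      * Outside quotes a comma is a real delimiter.
                   WHEN CHAR-IS-COMMA (CHAR-SUB)
                       ADD 1 TO DELIM-COUNT
               END-EVALUATE
           END-PERFORM
           IF DELIM-COUNT = EXPECTED-DELIMS
               DISPLAY "RECORD ACCEPTED, DELIMITERS: " DELIM-COUNT
           ELSE
               DISPLAY "RECORD REJECTED, DELIMITERS: " DELIM-COUNT
           END-IF
           GOBACK.
```

For this sample the comma inside "SMITH, JOHN" is skipped, so the record should take the accepted branch. In a real formatter the expected count would presumably come from the generated record layout rather than a constant.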

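Answer 1: (score: 0)

Although it goes beyond what was asked, this program should handle most CSV records. It was not tested with tab delimiters. The program converts CSV text from being separated by the chosen delimiter to being separated by LOW-VALUES, removing the added quotes along the way. The fields can then be split more easily with UNSTRING ... DELIMITED LOW-VALUES INTO ...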

   IDENTIFICATION DIVISION.
   PROGRAM-ID. CSV2STR.
   DATA DIVISION.
   WORKING-STORAGE SECTION.
   01  I COMP PIC 9(4).
   01  J COMP PIC 9(4).
   01  FLD-START COMP PIC 9(4).
   01  STATE COMP PIC 9(4).
   01  FLD-SEP PIC X VALUE LOW-VALUES.
   01  QUOT PIC X VALUE """".
   01  APOS PIC X VALUE "'".
   01  COMM PIC X VALUE ",".
   LINKAGE SECTION.
   01  INPUT-REC PIC X(2000).
   01  INPUT-LENGTH COMP PIC 9(4).
   01  OUTPUT-REC PIC X(2000).
   01  OUTPUT-LENGTH COMP PIC 9(4).
   01  DELIM PIC X.
   PROCEDURE DIVISION USING INPUT-REC INPUT-LENGTH
       OUTPUT-REC OUTPUT-LENGTH DELIM.
   BEGIN.
       IF INPUT-LENGTH = 0 OR > 2000
           MOVE 0 TO OUTPUT-LENGTH
           EXIT PROGRAM
       END-IF
       IF DELIM NOT = SPACE
           MOVE DELIM TO COMM
       ELSE
           MOVE "," TO COMM
       END-IF
       PERFORM CONVERT-RECORD
       SUBTRACT 1 FROM J GIVING OUTPUT-LENGTH
       EXIT PROGRAM
       .

   CONVERT-RECORD.
       MOVE 1 TO STATE I J FLD-START
       PERFORM CONVERT-RECORD-PROC
           UNTIL I > INPUT-LENGTH
       MOVE FLD-SEP TO OUTPUT-REC (J:1)
  *> FOR NO FIELD AFTER THE LAST DELIMITER
       IF INPUT-REC (I - 1:1) = COMM
           ADD 1 TO J
           MOVE FLD-SEP TO OUTPUT-REC (J:1)
       END-IF
       .

   CONVERT-RECORD-PROC.
  *> CSV-DT
       EVALUATE STATE
           ALSO I = FLD-START
           ALSO INPUT-REC (I:1)
           ALSO INPUT-REC (I + 1:1)
  *> RULE   1 DETERMINES IF FIELD BEGINS WITH QUOTE
       WHEN 1 ALSO TRUE ALSO QUOT ALSO ANY
           MOVE 2 TO STATE
           ADD 1 TO I
  *> RULE   2 SPECIAL CASE OF SPACE + APOSTROPHE AT FIELD START
       WHEN 1 ALSO TRUE ALSO SPACE ALSO APOS
           ADD 1 TO I
  *> RULE   3 COPIES ONE CHARACTER
       WHEN 1 ALSO ANY ALSO NOT COMM ALSO ANY
           MOVE INPUT-REC (I:1) TO OUTPUT-REC (J:1)
           ADD 1 TO I
           ADD 1 TO J
  *> RULE   4 ENDS A FIELD
       WHEN 1 ALSO ANY ALSO COMM ALSO ANY
           MOVE FLD-SEP TO OUTPUT-REC (J:1)
           ADD 1 TO I
           ADD 1 TO J
           MOVE I TO FLD-START
  *> RULE   5 FOR QUOTED FIELD DROPS INITIAL QUOTE
       WHEN 2 ALSO ANY ALSO NOT QUOT ALSO ANY
           MOVE INPUT-REC (I:1) TO OUTPUT-REC (J:1)
           ADD 1 TO I
           ADD 1 TO J
  *> RULE   6 FOR QUOTED FIELD CONVERTS TWO QUOTES TO ONE
       WHEN 2 ALSO ANY ALSO QUOT ALSO QUOT
           MOVE QUOT TO OUTPUT-REC (J:1)
           ADD 2 TO I
           ADD 1 TO J
  *> RULE   7 FOR QUOTED FIELD DROPS QUOTE BEFORE DELIMITER
       WHEN 2 ALSO ANY ALSO QUOT ALSO COMM
           MOVE FLD-SEP TO OUTPUT-REC (J:1)
           ADD 2 TO I
           ADD 1 TO J
           MOVE I TO FLD-START
           MOVE 1 TO STATE
  *> RULE   8 FOR QUOTED FIELD DROPS FINAL QUOTE OF LAST FIELD
  *> NOTHING IS WRITTEN HERE, SO J IS NOT ADVANCED
       WHEN 2 ALSO ANY ALSO QUOT ALSO SPACE
           ADD 2 TO I
           MOVE I TO FLD-START
           MOVE 1 TO STATE
       END-EVALUATE
       .

Tested with the following CSV file:

120,ABC,123,"12"" RULER","""ABC"", ""DEF"", ""GHI""", 'ABC',"123,456"
"""789""",,,,"""mno""",,

Test program output:

1: 120
2: ABC
3: 123
4: 12" RULER
5: "ABC", "DEF", "GHI"
6: 'ABC'
7: 123,456
1: "789"
2:
3:
4:
5: "mno"
6:
7:
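
The UNSTRING step mentioned in this answer could look roughly like the sketch below. It does not call CSV2STR; instead it splits a small hand-built record that is already separated by LOW-VALUES, so the program name, data-names, field sizes and sample contents are all assumptions for illustration. With real output from CSV2STR the sending field would presumably be OUTPUT-REC limited to OUTPUT-LENGTH bytes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SPLITLV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical converted record: 120, an empty field, 123,456.
       01  CONVERTED-REC.
           05  FILLER              PIC X(3) VALUE "120".
           05  FILLER              PIC X    VALUE LOW-VALUES.
           05  FILLER              PIC X    VALUE LOW-VALUES.
           05  FILLER              PIC X(7) VALUE "123,456".
       01  FIELD-1                 PIC X(20).
       01  FIELD-2                 PIC X(20).
       01  FIELD-3                 PIC X(20).
       01  FIELD-COUNT             PIC 9(4) COMP.
       PROCEDURE DIVISION.
      * TALLYING adds to the counter, so clear it first.
           MOVE ZERO TO FIELD-COUNT
      * No ALL phrase: adjacent LOW-VALUES give an empty field.
           UNSTRING CONVERTED-REC DELIMITED BY LOW-VALUES
               INTO FIELD-1 FIELD-2 FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           DISPLAY "1: " FIELD-1
           DISPLAY "2: " FIELD-2
           DISPLAY "3: " FIELD-3
           GOBACK.
```

Because the comma in 123,456 is no longer a delimiter at this point, it stays inside the third field, and the two adjacent LOW-VALUES leave the second field filled with spaces.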