How do I get the index of the differing character when comparing two strings in COBOL?

Asked: 2013-09-17 12:37:59

Tags: string compare cobol

Suppose there are two strings, PR-ACT-SOURCE-DETAIL-1 and PR-ACT-SOURCE-DETAIL-2. I want to compare them and find the position at which they differ.

I tried to approach it like this:

 PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000                                                                  
    IF PR-ACT-SOURCE-DETAIL-1 OF TRANSACTION-RECORD-1(N:1)   
        IS NOT EQUAL TO                                    
       PR-ACT-SOURCE-DETAIL-2 OF TRANSACTION-RECORD-2(N:1)  

        MOVE 'Y' TO WS-DIFF-FOUND   
        DISPLAY 'DIFFERENCE FOUND AT POSITION' N
    END-IF
 END-PERFORM

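The problem with the code above is that the loop executes 5000 times, and if I need to compare 10,000 such strings, the execution time becomes too high.

Is there another way to do the same thing that takes less execution time?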

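4 answers:

Answer 0 (score: 1)

Here are three ideas that could reduce the overall run time of your program.

The first is to terminate the loop as soon as the first difference is found. Your current code keeps running through the entire variable even after it has determined that the variables contain a difference. If you only need to know that there is a difference and where it starts, you could try something like the following: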

 MOVE 'N' TO WS-DIFF-FOUND
 PERFORM VARYING N FROM 1 BY 1
           UNTIL N > LENGTH OF PR-ACT-SOURCE-DETAIL-1
              OR WS-DIFF-FOUND = 'Y'
     IF PR-ACT-SOURCE-DETAIL-1 (N:1)
           <> PR-ACT-SOURCE-DETAIL-2 (N:1)
        MOVE 'Y' TO WS-DIFF-FOUND
     END-IF
 END-PERFORM

 IF WS-DIFF-FOUND = 'Y'
 *> N is incremented once more before the UNTIL test ends
 *> the loop, so the first difference is at position N - 1.
 *> Do whatever processing you need here.
     CONTINUE
 END-IF

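Note that above I changed the hard-coded variable length (5000) to use the actual declared length of the variable, via the LENGTH OF special register. That way the loop limit adjusts "automatically" if you change the variable's length during future maintenance (one less thing to go wrong).

If most of the data you are comparing is actually equal, so that differences are the rare exception, you might try a straight equality comparison of the data items first and only run the character-by-character test when a difference is found. This may give some improvement, but you need to benchmark it to verify that it actually is one. Some compilers generate very efficient code for this type of comparison; others do not. Try it...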

 IF PR-ACT-SOURCE-DETAIL-1 = PR-ACT-SOURCE-DETAIL-2
    MOVE 'N' TO WS-DIFF-FOUND
 ELSE
 *> run the PERFORM VARYING loop shown above here
 END-IF
 IF WS-DIFF-FOUND = 'Y'
 ...

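The last idea is to look at the declaration of N and make sure it uses the most efficient data type for your compiler. For example, if N is declared as: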

 01 N      PIC 9(7).

the compiler may not generate very efficient code for incrementing it and computing the appropriate offsets. On the other hand, something like:

 01 N      PIC 9(9) BINARY.

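may produce a more efficient loop. This depends heavily on the compiler you are using and the options you give it. Sometimes these small differences have a significant impact on program performance.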
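
As a minimal sketch of how the three ideas above might fit together (whole-field comparison first, early exit, binary index), the program below uses hypothetical names: WS-DETAIL-1, WS-DETAIL-2, WS-POS and WS-DIFF-POS are not from the question, and the 'X' planted at position 1234 is only test data. It assumes a compiler that supports the LENGTH OF special register; whether it is actually faster is something to benchmark on your own compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIRSTDIFF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical stand-ins for the two fields being compared.
       01  WS-DETAIL-1        PIC X(5000) VALUE SPACES.
       01  WS-DETAIL-2        PIC X(5000) VALUE SPACES.
      * Binary index, as suggested for N above.
       01  WS-POS             PIC 9(9) BINARY.
      * Stays 0 when no difference is found.
       01  WS-DIFF-POS        PIC 9(9) BINARY VALUE 0.
       01  WS-DIFF-FOUND      PIC X VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      * Test data only: plant one difference.
           MOVE 'X' TO WS-DETAIL-2 (1234:1)
      * Whole-field test first; scan only when the fields differ.
           IF WS-DETAIL-1 NOT = WS-DETAIL-2
               PERFORM VARYING WS-POS FROM 1 BY 1
                       UNTIL WS-POS > LENGTH OF WS-DETAIL-1
                          OR WS-DIFF-FOUND = 'Y'
                   IF WS-DETAIL-1 (WS-POS:1)
                          NOT = WS-DETAIL-2 (WS-POS:1)
                       MOVE 'Y' TO WS-DIFF-FOUND
      * Save the position now: WS-POS is incremented once
      * more before the UNTIL test ends the loop.
                       MOVE WS-POS TO WS-DIFF-POS
                   END-IF
               END-PERFORM
           END-IF
           DISPLAY 'FIRST DIFFERENCE AT: ' WS-DIFF-POS
           STOP RUN.
```

Saving the position inside the IF avoids having to remember the extra increment afterwards. With the planted test data this should report position 1234.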

Answer 1 (score: 0)

PERFORM
  VARYING N
   FROM 1
   BY 1
     UNTIL ( N GREATER THAN 5000 )
      OR ( byte-field-1 ( N : 1 )
          NOT EQUAL TO byte-field-2 ( N : 1 ) )
  CONTINUE
END-PERFORM

EVALUATE TRUE
  WHEN N GREATER THAN 5000
*>  match: all 5000 bytes are equal
    CONTINUE
  WHEN OTHER
*>  no match: the first difference is at position N
    CONTINUE
END-EVALUATE

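This stops searching as soon as a mismatch is found.

It will only really help your performance if there are many mismatches among your 10,000 strings.

10,000 * 5,000 is only 50,000,000. Why is that such a big problem?

If you described your data fully, there may be other solutions.

You should remove the silly qualification, give N a good name, and replace the 5000 with a field whose value is the length of the field containing the string.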
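
A rough sketch of what that advice could look like, under my own assumptions: SOURCE-DETAIL-OLD, SOURCE-DETAIL-NEW, DETAIL-LENGTH and BYTE-POSITION are hypothetical names, and the 'X' at position 42 is test data. It also assumes the compiler supports the LENGTH OF special register and EXIT PERFORM.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NAMEDCMP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields; the names are illustrative only.
       01  SOURCE-DETAIL-OLD  PIC X(5000) VALUE SPACES.
       01  SOURCE-DETAIL-NEW  PIC X(5000) VALUE SPACES.
      * Holds the field length, so the loop has no literal 5000.
       01  DETAIL-LENGTH      PIC 9(9) BINARY.
      * A descriptive name instead of N.
       01  BYTE-POSITION      PIC 9(9) BINARY.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           MOVE LENGTH OF SOURCE-DETAIL-OLD TO DETAIL-LENGTH
      * Test data only: plant one difference.
           MOVE 'X' TO SOURCE-DETAIL-NEW (42:1)
           PERFORM VARYING BYTE-POSITION FROM 1 BY 1
                   UNTIL BYTE-POSITION > DETAIL-LENGTH
               IF SOURCE-DETAIL-OLD (BYTE-POSITION:1) NOT =
                  SOURCE-DETAIL-NEW (BYTE-POSITION:1)
      * Leave at once; BYTE-POSITION is not incremented.
                   EXIT PERFORM
               END-IF
           END-PERFORM
           IF BYTE-POSITION > DETAIL-LENGTH
               DISPLAY 'FIELDS MATCH'
           ELSE
               DISPLAY 'FIRST DIFFERENCE AT: ' BYTE-POSITION
           END-IF
           STOP RUN.
```

Because both fields are level-01 items with unique names, no OF qualification is needed, and changing the PIC X(5000) later does not require touching the loop.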

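Do you really have a "string", or just a piece of data? Strings, as they exist in other languages, do not exist in COBOL.

Know your data, describe your data, and explain why there is a performance problem. Which compiler and hardware are you using?

I'm not sure how much the tags string and compare will help you.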

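Answer 2 (score: 0)

Assuming your COBOL compiler generates tight code, what you have is the way to do a string comparison: byte by byte. As the other answers ask, it really depends on whether you want all the differences, or just whether the strings differ and where the first difference is.

Myself, I would probably let COBOL compare the full strings first, and only go byte by byte if they are not equal. Chances are the compiler's full-string comparison code is tighter than anything produced by the manual approach.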

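Answer 3 (score: 0)

The following program uses two different approaches, each in two variants:

  1. Sequential, using PIC 9 as the index
  2. Sequential, using PIC S9 COMP-5 as the index
  3. Binary search
  4. Binary search with fewer loops

In the worst case (the difference is in the last byte):

    • Approach 2 is about 1.5 times faster than approach 1, because it uses a native data item
    • Approach 3 is about 11 times faster than approach 1

    Notes:

    • Binary search is slower than sequential search when the difference is in the first 250 bytes.
    • Approaches 3 and 4 are equivalent.
    • Performance may be affected by the COBOL runtime implementation.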

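    Compatibility:

    • The EXIT PERFORM statement is not defined in ANSI-85 (it was added in the COBOL 2002 standard).
    • COMP-5 is not ANSI (but almost all compilers support it); it can be replaced by the equivalent usage BINARY-LONG.

    Code: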

       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMPSTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 N PIC S9(9) COMP-5.
       01 N1 PIC 9(9).
       01 STRLEN PIC S9(9) COMP-5.
       01 CMPLEN PIC S9(9) COMP-5.
       01 CHUNK-BASE-OFFSET PIC S9(9) COMP-5.
       01 CHUNK-BASE-LENGTH PIC S9(9) COMP-5.
       01 CHUNK-OFFSET PIC S9(9) COMP-5.
       01 CHUNK-LENGTH PIC S9(9) COMP-5.
       01 STR1 PIC X(5000).
       01 STR2 PIC X(5000).
       01 WS-DIFF-FOUND PIC X.
    
       01 DIFF-TIME PIC 9(7)V99 COMP-5.
       01 EMPTY-PERFORM-TIME PIC 9(7)V99 COMP-5.
       78 LOOPS VALUE 10000.
       01 START-TIME.
          03 START-H PIC 99.
          03 START-M PIC 99.
          03 START-S PIC 99.
          03 START-T PIC 99.
       01 END-TIME.
          03 END-H PIC 99.
          03 END-M PIC 99.
          03 END-S PIC 99.
          03 END-T PIC 99.
       01 X PIC X.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           MOVE 5000 TO STRLEN
    
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM EMPTY-PERFORM
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           MOVE DIFF-TIME TO EMPTY-PERFORM-TIME
           DISPLAY "EMPTY-PERFORM: " EMPTY-PERFORM-TIME
    
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(5000:1)
           PERFORM TEST-ALL
    
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(1:1)
           PERFORM TEST-ALL
    
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(2500:1)
           PERFORM TEST-ALL
    
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(250:1)
           PERFORM TEST-ALL
    
           ACCEPT X
           EXIT PROGRAM
           STOP RUN
           .
    
       TEST-ALL.
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-1
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-1: " DIFF-TIME " DIFFERENCE AT: " N1
    
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-2
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-2: " DIFF-TIME " DIFFERENCE AT: " N
    
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-3
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-3: " DIFF-TIME " DIFFERENCE AT: " N
    
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-4
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-4: " DIFF-TIME " DIFFERENCE AT: " N
           .
    
       EMPTY-PERFORM.
           .
    
       COMPARE-1.
           PERFORM VARYING N1 FROM 1 BY 1 UNTIL N1 > 5000                                                                  
               IF STR1(N1:1) IS NOT EQUAL TO STR2(N1:1)
                   MOVE 'Y' TO WS-DIFF-FOUND
                   EXIT PERFORM
               END-IF
           END-PERFORM
           .
    
       COMPARE-2.
           PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000                                                                  
               IF STR1(N:1) IS NOT EQUAL TO STR2(N:1)
                   MOVE 'Y' TO WS-DIFF-FOUND
                   EXIT PERFORM
               END-IF
           END-PERFORM
           .
    
       COMPARE-3.
           IF STR1 = STR2
               MOVE 0 TO N
           ELSE
               MOVE 1 TO CMPLEN
               PERFORM UNTIL CMPLEN >= STRLEN
                  COMPUTE CMPLEN = CMPLEN * 2
               END-PERFORM
               MOVE 1 TO CHUNK-BASE-OFFSET
               COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
               PERFORM UNTIL 1 = 2
                   MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
                   MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
                   PERFORM 2 TIMES
                       IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                           COMPUTE CHUNK-LENGTH =
                                   STRLEN - CHUNK-OFFSET + 1
                       END-IF
                       IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                            IS NOT EQUAL TO
                            STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                           MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                           COMPUTE CHUNK-BASE-LENGTH =
                                   CHUNK-BASE-LENGTH / 2
                           EXIT PERFORM
                       ELSE
                           ADD CHUNK-LENGTH TO CHUNK-OFFSET
                       END-IF
                   END-PERFORM
                   IF CHUNK-BASE-LENGTH = 0
                       EXIT PERFORM
                   END-IF
               END-PERFORM
               MOVE CHUNK-OFFSET TO N
           END-IF
           .
    
       COMPARE-4.
           IF STR1 = STR2
               MOVE 0 TO N
           ELSE
               MOVE 1 TO CMPLEN
               PERFORM UNTIL CMPLEN >= STRLEN
                  COMPUTE CMPLEN = CMPLEN * 2
               END-PERFORM
               MOVE 1 TO CHUNK-BASE-OFFSET
               COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
               PERFORM UNTIL 1 = 2
                   MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
                   MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
                   PERFORM 2 TIMES
                       IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                           COMPUTE CHUNK-LENGTH =
                                   STRLEN - CHUNK-OFFSET + 1
                       END-IF
                       IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                            IS NOT EQUAL TO
                            STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                           MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                           PERFORM UNTIL CHUNK-BASE-LENGTH <
                                         CHUNK-LENGTH
                               COMPUTE CHUNK-BASE-LENGTH =
                                       CHUNK-BASE-LENGTH / 2
                           END-PERFORM
                           EXIT PERFORM
                       ELSE
                           ADD CHUNK-LENGTH TO CHUNK-OFFSET
                       END-IF
                   END-PERFORM
                   IF CHUNK-BASE-LENGTH = 0
                       EXIT PERFORM
                   END-IF
               END-PERFORM
               MOVE CHUNK-OFFSET TO N
           END-IF
           .
    
       TIME-DIFF.
           COMPUTE DIFF-TIME = (END-H - START-H) * 3600 +
                               (END-M - START-M) * 60 +
                               (END-S - START-S) +
                               (END-T - START-T) / 100
           .