Suppose there are two strings, PR-ACT-SOURCE-DETAIL-1 and PR-ACT-SOURCE-DETAIL-2. I want to compare them and find the positions at which they differ.
I tried to approach it like this:
PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000
    IF PR-ACT-SOURCE-DETAIL-1 OF TRANSACTION-RECORD-1(N:1)
            IS NOT EQUAL TO
            PR-ACT-SOURCE-DETAIL-2 OF TRANSACTION-RECORD-2(N:1)
        MOVE 'Y' TO WS-DIFF-FOUND
        DISPLAY 'DIFFERENCE FOUND AT POSITION' N
    END-IF
END-PERFORM
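The problem with the code above is that the loop runs 5000 times, so if I need to compare 10,000 such strings, the execution time becomes too high.
Is there another way to do the same thing that takes less execution time?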
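Answer 0 (score: 1)
Here are three ideas that may reduce the overall run time of your program.
The first is to terminate the loop as soon as the first difference is found. Your current code keeps scanning the whole variable even after it has established that the variables differ. If you only need to know that there is a difference, and where it starts, you could try the following: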
MOVE 'N' TO WS-DIFF-FOUND
PERFORM VARYING N FROM 1 BY 1
        UNTIL N > LENGTH OF PR-ACT-SOURCE-DETAIL-1
           OR WS-DIFF-FOUND = 'Y'
    IF PR-ACT-SOURCE-DETAIL-1 (N:1) <> PR-ACT-SOURCE-DETAIL-2 (N:1)
        MOVE 'Y' TO WS-DIFF-FOUND
    END-IF
END-PERFORM
IF WS-DIFF-FOUND = 'Y'
*>  N is incremented once more before the UNTIL test ends the loop,
*>  so the first difference is at position N - 1.
*>  Do whatever processing you need here.
    CONTINUE
END-IF
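Note that above I replaced the hard-coded variable length (5000) with the declared length of the variable, using the LENGTH OF special register. This way the loop bound adjusts "automatically" if you change the variable's length during future maintenance (one less thing to go wrong).
If most of the data you are comparing is in fact equal, so that differences are the rare exception, you might first do a direct equality comparison of the data items and only perform the character-by-character test if a difference is found. This might give some improvement, but you need to benchmark it to verify that it actually is one. Some compilers may generate very efficient code for this kind of comparison, others won't. Give it a try...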
IF PR-ACT-SOURCE-DETAIL-1 = PR-ACT-SOURCE-DETAIL-2
    MOVE 'N' TO WS-DIFF-FOUND
ELSE
    use the PERFORM VARYING loop shown above
END-IF
IF WS-DIFF-FOUND = 'Y'
    ...
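The last idea is to look at the declaration of N and make sure it uses the most efficient data type for your compiler. For example, if N is declared as: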
01 N PIC 9(7).
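the compiler may not be able to generate very efficient code for incrementing it and computing the corresponding offsets. On the other hand, something like: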
01 N PIC 9(9) BINARY.
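may produce a more efficient loop. This depends heavily on the compiler you are using and the options you give it. Sometimes these small differences have a significant effect on program performance.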
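To pull the three ideas together, here is a minimal sketch, not the original poster's program. The names WS-DETAIL-1, WS-DETAIL-2 and WS-DIFF-POS and the sample difference at byte 1234 are made up for illustration, and it assumes a compiler that supports the LENGTH OF special register. It saves the position inside the loop because a PERFORM VARYING index has already moved one past the mismatch by the time the loop ends.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIRSTDIFF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical stand-ins for the two 5000-byte fields.
       01  WS-DETAIL-1        PIC X(5000) VALUE SPACES.
       01  WS-DETAIL-2        PIC X(5000) VALUE SPACES.
      * Binary index, as suggested above for cheaper increments.
       01  N                  PIC 9(9) BINARY.
       01  WS-DIFF-POS        PIC 9(9) BINARY VALUE 0.
       01  WS-DIFF-FOUND      PIC X VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      * Sample data (assumption): the fields differ only at byte 1234.
           MOVE 'X' TO WS-DETAIL-2 (1234:1)
      * Whole-field test first; scan byte by byte only when it fails.
           IF WS-DETAIL-1 NOT = WS-DETAIL-2
               PERFORM VARYING N FROM 1 BY 1
                       UNTIL N > LENGTH OF WS-DETAIL-1
                          OR WS-DIFF-FOUND = 'Y'
                   IF WS-DETAIL-1 (N:1) NOT = WS-DETAIL-2 (N:1)
                       MOVE 'Y' TO WS-DIFF-FOUND
                       MOVE N TO WS-DIFF-POS
                   END-IF
               END-PERFORM
           END-IF
           IF WS-DIFF-FOUND = 'Y'
               DISPLAY 'FIRST DIFFERENCE AT POSITION ' WS-DIFF-POS
           ELSE
               DISPLAY 'FIELDS ARE EQUAL'
           END-IF
           STOP RUN.
```

Whether the whole-field test or the binary index actually helps depends on your compiler and options, so benchmark it against your own data before relying on it.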
Answer 1 (score: 0)
PERFORM
  VARYING N
    FROM 1
      BY 1
   UNTIL ( N GREATER THAN 5000 )
      OR ( byte-field-1 ( N : 1 )
           NOT EQUAL TO byte-field-2 ( N : 1 ) )
    CONTINUE
END-PERFORM
EVALUATE TRUE
  WHEN N GREATER THAN 5000
    match
  WHEN N LESS THAN 5000
    no match
  WHEN OTHER
    IF ( byte-field-1 ( N : 1 )
         EQUAL TO byte-field-2 ( N : 1 ) )
      match
    ELSE
      no match
    END-IF
END-EVALUATE
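This stops the search as soon as a mismatch is found.
It will only really help performance if there are many mismatches among your 10,000 strings.
10,000 * 5,000 is only 50,000,000. Why is that such a big problem?
If you describe your data fully, there may be other solutions.
You should drop the silly qualification, give N a proper name, and, instead of the literal 5000, test against a field whose value is the length of the field containing the string.
Do you really have a "string", or a piece of data? Strings do not exist in COBOL the way they do in other languages.
Know your data, describe your data, and explain why performance is a problem. Which compiler and hardware are you using?
I'm not sure how much the tags string and compare will help you.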
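Answer 2 (score: 0)
Assuming your COBOL compiler generates tight code, what you have is the way to do a string comparison: byte by byte. As the other answers point out, it really depends on whether you want all of the differences, or only need to know that the strings differ and where the first difference is.
Personally, I would probably have COBOL compare the complete strings first, and only go byte by byte if they are not equal. Chances are the compiler's full-string comparison code is tighter than anything you would get from the manual approach.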
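Answer 3 (score: 0)
The following program uses two different approaches, each in two variants:
In the worst case (a difference in the last byte)
Notes:
Compatibility:
Code: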
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMPSTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  N                  PIC S9(9) COMP-5.
       01  N1                 PIC 9(9).
       01  STRLEN             PIC S9(9) COMP-5.
       01  CMPLEN             PIC S9(9) COMP-5.
       01  CHUNK-BASE-OFFSET  PIC S9(9) COMP-5.
       01  CHUNK-BASE-LENGTH  PIC S9(9) COMP-5.
       01  CHUNK-OFFSET       PIC S9(9) COMP-5.
       01  CHUNK-LENGTH       PIC S9(9) COMP-5.
       01  STR1               PIC X(5000).
       01  STR2               PIC X(5000).
       01  WS-DIFF-FOUND      PIC X.
       01  DIFF-TIME          PIC 9(7)V99 COMP-5.
       01  EMPTY-PERFORM-TIME PIC 9(7)V99 COMP-5.
       78  LOOPS              VALUE 10000.
       01  START-TIME.
           03  START-H        PIC 99.
           03  START-M        PIC 99.
           03  START-S        PIC 99.
           03  START-T        PIC 99.
       01  END-TIME.
           03  END-H          PIC 99.
           03  END-M          PIC 99.
           03  END-S          PIC 99.
           03  END-T          PIC 99.
       01  X                  PIC X.
       PROCEDURE DIVISION.
       MAIN-LOGIC.
           MOVE 5000 TO STRLEN
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM EMPTY-PERFORM
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           MOVE DIFF-TIME TO EMPTY-PERFORM-TIME
           DISPLAY "EMPTY-PERFORM: " EMPTY-PERFORM-TIME
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(5000:1)
           PERFORM TEST-ALL
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(1:1)
           PERFORM TEST-ALL
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(2500:1)
           PERFORM TEST-ALL
           MOVE ALL SPACES TO STR1 STR2
           MOVE "X" TO STR2(250:1)
           PERFORM TEST-ALL
           ACCEPT X
           EXIT PROGRAM
           STOP RUN
           .
       TEST-ALL.
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-1
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-1: " DIFF-TIME " DIFFERENCE AT: " N1
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-2
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-2: " DIFF-TIME " DIFFERENCE AT: " N
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-3
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-3: " DIFF-TIME " DIFFERENCE AT: " N
           ACCEPT START-TIME FROM TIME
           PERFORM LOOPS TIMES
               PERFORM COMPARE-4
           END-PERFORM
           ACCEPT END-TIME FROM TIME
           PERFORM TIME-DIFF
           DISPLAY "COMPARE-4: " DIFF-TIME " DIFFERENCE AT: " N
           .
       EMPTY-PERFORM.
           .
       COMPARE-1.
           PERFORM VARYING N1 FROM 1 BY 1 UNTIL N1 > 5000
               IF STR1(N1:1) IS NOT EQUAL TO STR2(N1:1)
                   MOVE 'Y' TO WS-DIFF-FOUND
                   EXIT PERFORM
               END-IF
           END-PERFORM
           .
       COMPARE-2.
           PERFORM VARYING N FROM 1 BY 1 UNTIL N > 5000
               IF STR1(N:1) IS NOT EQUAL TO STR2(N:1)
                   MOVE 'Y' TO WS-DIFF-FOUND
                   EXIT PERFORM
               END-IF
           END-PERFORM
           .
       COMPARE-3.
           IF STR1 = STR2
               MOVE 0 TO N
           ELSE
               MOVE 1 TO CMPLEN
               PERFORM UNTIL CMPLEN >= STRLEN
                   COMPUTE CMPLEN = CMPLEN * 2
               END-PERFORM
               MOVE 1 TO CHUNK-BASE-OFFSET
               COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
               PERFORM UNTIL 1 = 2
                   MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
                   MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
                   PERFORM 2 TIMES
                       IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                           COMPUTE CHUNK-LENGTH =
                               STRLEN - CHUNK-OFFSET + 1
                       END-IF
                       IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                               IS NOT EQUAL TO
                               STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                           MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                           COMPUTE CHUNK-BASE-LENGTH =
                               CHUNK-BASE-LENGTH / 2
                           EXIT PERFORM
                       ELSE
                           ADD CHUNK-LENGTH TO CHUNK-OFFSET
                       END-IF
                   END-PERFORM
                   IF CHUNK-BASE-LENGTH = 0
                       EXIT PERFORM
                   END-IF
               END-PERFORM
               MOVE CHUNK-OFFSET TO N
           END-IF
           .
       COMPARE-4.
           IF STR1 = STR2
               MOVE 0 TO N
           ELSE
               MOVE 1 TO CMPLEN
               PERFORM UNTIL CMPLEN >= STRLEN
                   COMPUTE CMPLEN = CMPLEN * 2
               END-PERFORM
               MOVE 1 TO CHUNK-BASE-OFFSET
               COMPUTE CHUNK-BASE-LENGTH = CMPLEN / 2
               PERFORM UNTIL 1 = 2
                   MOVE CHUNK-BASE-OFFSET TO CHUNK-OFFSET
                   MOVE CHUNK-BASE-LENGTH TO CHUNK-LENGTH
                   PERFORM 2 TIMES
                       IF CHUNK-OFFSET + CHUNK-LENGTH - 1 > STRLEN
                           COMPUTE CHUNK-LENGTH =
                               STRLEN - CHUNK-OFFSET + 1
                       END-IF
                       IF STR1(CHUNK-OFFSET:CHUNK-LENGTH)
                               IS NOT EQUAL TO
                               STR2(CHUNK-OFFSET:CHUNK-LENGTH)
                           MOVE CHUNK-OFFSET TO CHUNK-BASE-OFFSET
                           PERFORM UNTIL CHUNK-BASE-LENGTH <
                                         CHUNK-LENGTH
                               COMPUTE CHUNK-BASE-LENGTH =
                                   CHUNK-BASE-LENGTH / 2
                           END-PERFORM
                           EXIT PERFORM
                       ELSE
                           ADD CHUNK-LENGTH TO CHUNK-OFFSET
                       END-IF
                   END-PERFORM
                   IF CHUNK-BASE-LENGTH = 0
                       EXIT PERFORM
                   END-IF
               END-PERFORM
               MOVE CHUNK-OFFSET TO N
           END-IF
           .
       TIME-DIFF.
           COMPUTE DIFF-TIME = (END-H - START-H) * 3600 +
                               (END-M - START-M) * 60 +
                               (END-S - START-S) +
                               (END-T - START-T) / 100
           .