COBOL program for file encoding conversion

Time: 2016-09-26 14:57:26

Tags: cobol

I need to convert text files from UTF-8 to CP1251. I cannot use any third-party software. Is there a routine written in COBOL for this? It is Micro Focus COBOL on Windows.

2 answers:

Answer 0 (score: 5)

Answer: there are a lot of COBOL routines written for this...

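I don't know of any free one (open source or free to use), but you can easily write it yourself. Just walk through the source and move each character to the target; if a character does not exist in CP1251, use '?' or whatever you prefer. The only real work is that you need a lookup for the 128 characters from x'80' upward...

Either you check whether MF has a specific extension for this, or you write it yourself. There is no "please code for me" on SO, so you should show what you have already tried.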

To give you an idea, a conversion based on this JavaScript sample should look like this (untested code):

       77  utf-8-field     PIC X(5000).
       77  new-char        PIC X.
       77  cp1251-field    PIC X(5000).
       77  utf-8-pos       PIC 9(04) COMP-5.
       77  cp1251-pos      PIC 9(04) COMP-5.
       77  utf-8-end       PIC 9(04) COMP-5.

       MOVE FUNCTION LENGTH ( FUNCTION TRIM (utf-8-field TRAILING) )
         TO utf-8-end
       *> clear the target, STRING only overwrites what it writes
       MOVE SPACES TO cp1251-field
       MOVE 1 TO cp1251-pos
       *> "> utf-8-end" so that the last byte is processed as well
       PERFORM VARYING utf-8-pos FROM 1 BY 1
               UNTIL   utf-8-pos > utf-8-end
          EVALUATE TRUE
             *> normal ASCII character
             WHEN utf-8-field (utf-8-pos:1) < x'80'
                MOVE utf-8-field (utf-8-pos:1) TO new-char
             *> UTF-8 in CP1251 range
             *> (as written this branch is never taken, because every
             *>  byte below x'04' is already caught by the test above;
             *>  real UTF-8 Cyrillic has lead byte x'D0' or x'D1')
             WHEN utf-8-field (utf-8-pos:1) < x'04'
                *> skip the first byte
                ADD 1 TO utf-8-pos
                EVALUATE TRUE
                   WHEN utf-8-pos > utf-8-end
                      MOVE '?'   TO new-char
                   WHEN utf-8-field (utf-8-pos:1)  = x'51'
                      MOVE x'B8' TO new-char
                   WHEN utf-8-field (utf-8-pos:1)  > x'4F'
                      MOVE '?'   TO new-char
                   *> alternative: use alphabet conversion here
                   WHEN utf-8-field (utf-8-pos:1)  = x'01'
                      MOVE x'A8' TO new-char
                   WHEN OTHER
                      MOVE utf-8-field (utf-8-pos:1) TO new-char
                      INSPECT new-char CONVERTING x'0203 ...
                                       TO         x'B2B3 ...
                END-EVALUATE
             *> UTF-8 with no CP1251 char
             *> Todo: check for other multibyte headers and add the correct
             *>       number of characters to utf-8-pos
             *> WHEN ...
             WHEN OTHER
                MOVE '?' TO new-char
          END-EVALUATE
          *> append the converted character to the target
          STRING new-char
                 DELIMITED BY SIZE
                 INTO cp1251-field
                 WITH POINTER cp1251-pos
          END-STRING
       END-PERFORM

Instead of CONVERTING x'0203 ... TO x'B2B3 ... you may want to define ALPHABETs:

       SPECIAL-NAMES.
          ALPHABET UTF8-PART-2 IS x'01', x'02' THRU x'4F', x'51'
          ALPHABET CP1251      IS x'A8', x'B2' THRU x'FF', x'B8'.

and in the inner EVALUATE use

           MOVE utf-8-field (utf-8-pos:1) TO new-char
           INSPECT new-char CONVERTING UTF8-PART-2 TO CP1251
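
For comparison, here is a minimal self-contained sketch that decodes real UTF-8 byte sequences rather than following the byte values used above. In UTF-8 the Russian letters U+0410 to U+044F are encoded as two bytes with lead byte x'D0' or x'D1', and CP1251 places them at x'C0' to x'FF', so the sketch decodes the code point and subtracts 848 (x'0350'). The program name, the paragraph names, the extra work fields and the sample input are made up for illustration. It only covers ASCII, the basic Russian alphabet and Ё/ё; anything else becomes a single '?' per sequence. It also assumes the native single-byte collating sequence, where FUNCTION ORD returns the byte value plus 1.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8TO1251.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77  utf-8-field     PIC X(5000).
       77  cp1251-field    PIC X(5000) VALUE SPACES.
       77  utf-8-pos       PIC 9(04) COMP-5.
       77  utf-8-end       PIC 9(04) COMP-5.
       77  cp1251-pos      PIC 9(04) COMP-5 VALUE 0.
       77  seq-len         PIC 9(04) COMP-5.
       77  lead-byte       PIC 9(04) COMP-5.
       77  next-byte       PIC 9(04) COMP-5.
       77  out-byte        PIC 9(04) COMP-5.
       77  code-point      PIC 9(09) COMP-5.

       PROCEDURE DIVISION.
       main-para.
           *> hypothetical sample input: six Cyrillic letters in UTF-8
           MOVE x'D09FD180D0B8D0B2D0B5D182' TO utf-8-field
           MOVE FUNCTION LENGTH (FUNCTION TRIM (utf-8-field TRAILING))
             TO utf-8-end
           MOVE 1 TO utf-8-pos
           PERFORM UNTIL utf-8-pos > utf-8-end
              PERFORM decode-one
              PERFORM map-one
              *> store one CP1251 byte per decoded sequence
              ADD 1 TO cp1251-pos
              MOVE FUNCTION CHAR (out-byte + 1)
                TO cp1251-field (cp1251-pos:1)
              ADD seq-len TO utf-8-pos
           END-PERFORM
           *> the same six letters in CP1251
           IF cp1251-field = x'CFF0E8E2E5F2'
              DISPLAY "converted as expected"
           ELSE
              DISPLAY "unexpected result"
           END-IF
           STOP RUN.

       decode-one.
           *> sets seq-len and code-point for the sequence at utf-8-pos;
           *> 65533 (U+FFFD) marks "not decodable by this sketch"
           COMPUTE lead-byte =
              FUNCTION ORD (utf-8-field (utf-8-pos:1)) - 1
           MOVE 1     TO seq-len
           MOVE 65533 TO code-point
           EVALUATE TRUE
              WHEN lead-byte < 128
                 *> plain ASCII
                 MOVE lead-byte TO code-point
              WHEN lead-byte >= 194 AND lead-byte <= 223
                 *> two-byte sequence, needs one continuation byte
                 IF utf-8-pos < utf-8-end
                    COMPUTE next-byte =
                      FUNCTION ORD (utf-8-field (utf-8-pos + 1:1)) - 1
                    IF next-byte >= 128 AND next-byte <= 191
                       MOVE 2 TO seq-len
                       COMPUTE code-point =
                          (lead-byte - 192) * 64 + next-byte - 128
                    END-IF
                 END-IF
              WHEN lead-byte >= 224 AND lead-byte <= 239
                 *> three-byte sequence: skipped as a whole
                 MOVE 3 TO seq-len
              WHEN lead-byte >= 240 AND lead-byte <= 247
                 *> four-byte sequence: skipped as a whole
                 MOVE 4 TO seq-len
              WHEN OTHER
                 *> stray continuation byte or invalid lead byte
                 CONTINUE
           END-EVALUATE.

       map-one.
           *> maps code-point to a CP1251 byte value in out-byte
           EVALUATE TRUE
              WHEN code-point < 128
                 MOVE code-point TO out-byte
              WHEN code-point >= 1040 AND code-point <= 1103
                 *> U+0410..U+044F -> x'C0'..x'FF'
                 COMPUTE out-byte = code-point - 848
              WHEN code-point = 1025
                 *> U+0401 -> x'A8'
                 MOVE 168 TO out-byte
              WHEN code-point = 1105
                 *> U+0451 -> x'B8'
                 MOVE 184 TO out-byte
              WHEN OTHER
                 *> no mapping in this sketch: '?'
                 MOVE 63 TO out-byte
           END-EVALUATE.
```

Reading and writing the actual files is left out here. The remaining CP1251 characters in x'80' to x'BF' would each need their own WHEN or a lookup table.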

Answer 1 (score: 0)

Have you looked at CBL_STRING_CONVERT?