Perl regex: swapping capture groups

Time: 2019-04-19 10:18:43

Tags: regex perl

I came up with a regular expression for parsing the output of a GPG command.

Regex:

^pub\s+(\S+)\s+(\S+)\s+.*\s+.{0,32}(.*)\s+(.*)<(\S+)>

Text to match:

pub   dsa1024 2018-02-28 [SCA]
      0019003A003E5A22E2337044D955066111F63B00
uid           [ unknown] John Doe <jogn@doe.name>
sub   elg1024 2018-02-28 [E]


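Question:

How can I swap Group 2 and Group 3 using only the regex, so that Group 2 holds 11F63B00 and Group 3 holds 2018-02-28? In addition, I would like to get rid of the text in square brackets (in Group 4), including the brackets themselves.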


2 answers:

Answer 0: (score: 5)

Captured strings are returned in the order in which the capture groups appear in the pattern.

The order can be changed by using a lookahead.

/
   ^ pub \s+ (\S+) \s+ 
   (?=  \S+  \s+ .* \s+ .{0,32}(.*) \s+  .*  < \S+ > )
       (\S+) \s+ .* \s+ .{0,32} .*  \s+ (.*) <(\S+)>
/x
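To see why this works, here is a minimal sketch on a hypothetical two-word string (the input is not from the question, it is chosen only for illustration). The lookahead opens the first capture group around text that appears later, without consuming anything, so the group numbering no longer follows the order of the text.

```perl
use strict;
use warnings;

# Hypothetical two-field input, chosen only to illustrate the technique.
my $text = "first second";

# The lookahead opens capture group 1 around the SECOND word without
# consuming anything; group 2 then captures the FIRST word normally.
my @captures = $text =~ /
    ^
    (?= \S+ \s+ (\S+) )   # group 1: peek ahead at the second word
    (\S+)                 # group 2: the first word
/x;

print join(",", @captures), "\n";   # prints "second,first"
```

The price is that the text covered by the lookahead is scanned twice: once inside the lookahead and once by the part of the pattern that actually consumes it.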

If we express it in terms of lines, we get the following:

/
   ^ pub \h++ (\S++) \h++       # Line 1 (part 1)
   (?= .*+ \n                   # Line 1 (part 2)
       \h*+ \S*(\S{8})          # Line 2
   )
   (\S++) .*+ \n                # Line 1 (part 2)
   .*+ \n                       # Line 2
   (.*\S) \s++ <([^<>\s]++)>    # Line 3
/x

(Out of habit, I also used possessive quantifiers so that a failed match fails faster.)

(If it is acceptable, \S{32} would be faster than \S*.)

(I also made it so that the fourth capture has no trailing whitespace.)
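As a rough check, the line-based pattern can be run against the sample text from the question. This is a sketch: the variable names are illustrative, and it assumes Perl 5.10 or later, which is needed for \h and the possessive quantifiers.

```perl
use strict;
use warnings;

# Sample GPG output copied from the question.
my $gpg_output = <<'END';
pub   dsa1024 2018-02-28 [SCA]
      0019003A003E5A22E2337044D955066111F63B00
uid           [ unknown] John Doe <jogn@doe.name>
sub   elg1024 2018-02-28 [E]
END

my @captures = $gpg_output =~ /
   ^ pub \h++ (\S++) \h++       # Line 1 (part 1): algorithm
   (?= .*+ \n                   # Line 1 (part 2), inspected but not consumed
       \h*+ \S*(\S{8})          # Line 2: last 8 characters of the fingerprint
   )
   (\S++) .*+ \n                # Line 1 (part 2): date
   .*+ \n                       # Line 2
   (.*\S) \s++ <([^<>\s]++)>    # Line 3: name and address
/x;

# Print each capture with its array index (group N is index N-1).
print "$_: $captures[$_]\n" for 0 .. $#captures;
```

With this input the second capture is 11F63B00 and the third is 2018-02-28. The fourth capture still begins with the uid label and the bracketed text, so removing the brackets remains a separate step.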


That said, a better solution is to fix the order after the fact.

@captures = @captures[0,2,1,3,4];

or, equivalently:

@captures[1,2] = @captures[2,1];
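Put together, a minimal sketch of this approach keeps the question's pattern unchanged and reorders the captures afterwards. The bracket removal uses a separate substitution, which is an assumption about what the asker wants: it deletes the first bracketed group and the whitespace before it, and nothing else.

```perl
use strict;
use warnings;

# Sample GPG output copied from the question.
my $gpg_output = <<'END';
pub   dsa1024 2018-02-28 [SCA]
      0019003A003E5A22E2337044D955066111F63B00
uid           [ unknown] John Doe <jogn@doe.name>
sub   elg1024 2018-02-28 [E]
END

# The pattern from the question, unchanged.
my @captures = $gpg_output
    =~ /^pub\s+(\S+)\s+(\S+)\s+.*\s+.{0,32}(.*)\s+(.*)<(\S+)>/;

# Swap the second and third captures (array indexes 1 and 2).
@captures[1, 2] = @captures[2, 1];

# Assumption: only the first bracketed group, plus the whitespace
# before it, should be removed from the fourth capture.
$captures[3] =~ s/\s*\[[^\]]*\]//;
$captures[3] =~ s/\s+\z//;   # trim trailing whitespace

print join("|", @captures), "\n";
# prints "dsa1024|11F63B00|2018-02-28|uid John Doe|jogn@doe.name"
```

Keeping the reordering out of the pattern avoids scanning the text twice and leaves the regex easier to read.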

Answer 1: (score: 0)

If your data is in a file

d