I came up with a regular expression for parsing the output of a GPG command.
Regular expression:
static const uint32_t unicode[48] = {
0x0000, 0x0040, 0x0080, 0x00C0, 0x0100, 0x0140, 0x0180, 0x01C0, 0x0200, 0x0240, 0x0280, 0x02C0, 0x0300, 0x0340, 0x0380, 0x03C0,
0x0400, 0x0440, 0x0480, 0x04C0, 0x0500, 0x0540, 0x0580, 0x05C0, 0x0600, 0x0640, 0x0680, 0x06C0, 0x0700, 0x0740, 0x0780, 0x07C0,
0x0800, 0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000, 0x7000, 0x8000, 0x9000, 0xA000, 0xB000, 0xC000, 0xD000, 0xE000, 0xF000,
};
...
FILE* fh = fopen("utf.txt", "r");
char* result;
char* tmpMemoryBuffer;
size_t currentSize = 255, currentIndex = 0;
result = (char*) malloc(sizeof(char) * currentSize);
memset(result, 0, sizeof(char) * currentSize);
if (fh != NULL)
{
/* getc() returns int; keeping it as int lets EOF be told apart from the byte 0xFF */
int c = getc(fh), c2;
uint32_t tmp = 0;
while (c != EOF)
{
/* Keep room for the longest write below: a 6-character escape plus the terminating NUL */
if (currentIndex + 7 > currentSize)
{
/* realloc preserves the existing contents, so no temporary copy is needed */
tmpMemoryBuffer = (char*) realloc(result, sizeof(char) * (currentSize + 255));
if (tmpMemoryBuffer == NULL)
break;
result = tmpMemoryBuffer;
currentSize += 255;
}
if (c >= 0x20 && c <= 0x7E)
{
//Is normal char
printf("Normal:\t%c\n", c);
result[currentIndex++] = (char) c;
}
else if (c >= 0xC0 && c <= 0xEF && (c2 = getc(fh)) != EOF)
{
//Is unicode (only one continuation byte is read, so three-byte sequences are not fully decoded)
c &= 0x3F;
c2 &= 0x3F;
tmp = unicode[c];
tmp += (uint32_t) c2;
sprintf(result + currentIndex, "\\u%04X", (unsigned int) tmp);
currentIndex += 6;
printf("Unicode:\t%04X\n", (unsigned int) tmp);
}
else
{
printf("Wrong format for 0x%X\n", (unsigned int) c);
break;
}
c = getc(fh);
}
}
result[currentIndex] = '\0';
fclose(fh);
...
free(result);
Regular expression:
^pub\s+(\S+)\s+(\S+)\s+.*\s+.{0,32}(.*)\s+(.*)<(\S+)>
Text to match:
pub dsa1024 2018-02-28 [SCA]
0019003A003E5A22E2337044D955066111F63B00
uid [ unknown] John Doe <jogn@doe.name>
sub elg1024 2018-02-28 [E]
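Question:
How can I swap Group 2 and Group 3 using only the regular expression, so that Group 2 holds 11F63B00 and Group 3 holds 2018-02-28? I would also like to get rid of the text in square brackets (in Group 4), including the brackets themselves.
Answer 0 (score: 5)
Captured strings are returned in the order in which the captures appear in the pattern.
You can change that order by using a lookahead.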
/
^ pub \s+ (\S+) \s+
(?= \S+ \s+ .* \s+ .{0,32}(.*) \s+ .* < \S+ > )
(\S+) \s+ .* \s+ .{0,32} .* \s+ (.*) <(\S+)>
/x
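To see the reordering in action, here is a minimal sketch that ports the pattern above to Python's re module (Python rather than Perl is my choice, so that it can be run as-is). The column spacing of the sample text is an assumption, since the whitespace of the original GPG output is not preserved here. The capture inside the lookahead is numbered before the capture that follows it, which is what moves the key ID into group 2.

```python
import re

# Sample input from the question; the exact column spacing is an assumption.
text = (
    "pub   dsa1024 2018-02-28 [SCA]\n"
    "      0019003A003E5A22E2337044D955066111F63B00\n"
    "uid           [ unknown] John Doe <jogn@doe.name>\n"
    "sub   elg1024 2018-02-28 [E]\n"
)

pattern = re.compile(
    r"""
    ^ pub \s+ (\S+) \s+                                # group 1: algorithm
    (?= \S+ \s+ .* \s+ .{0,32} (.*) \s+ .* < \S+ > )   # group 2: key ID, captured by looking ahead
    (\S+) \s+ .* \s+ .{0,32} .* \s+ (.*) <(\S+)>       # groups 3, 4 and 5
    """,
    re.VERBOSE | re.MULTILINE,
)

m = pattern.search(text)
print(m.groups())
```

The lookahead consumes no input, so the rest of the pattern matches the same text again from the date onward.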
If we express the pattern in terms of lines, we get the following:
/
^ pub \h++ (\S++) \h++ # Line 1 (part 1)
(?= .*+ \n # Line 1 (part 2)
\h*+ \S*(\S{8}) # Line 2
)
(\S++) .*+ \n # Line 1 (part 2)
.*+ \n # Line 2
(.*\S) \s++ <([^<>\s]++)> # Line 3
/x
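(Out of habit, I also used possessive quantifiers so that a failed match fails faster.)
(If that is acceptable, \S{32} would be faster than \S*.)
(I also made it so that the fourth capture has no trailing whitespace.)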
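The line-based pattern can be tried out in Python with two adaptations, both of which are my assumptions about a reasonable port: \h is not available in Python's re module, so [ \t] stands in for it, and the possessive quantifiers are replaced by ordinary ones (they only affect how quickly a failed match is rejected). The sample spacing is again assumed.

```python
import re

# Sample input from the question; the exact column spacing is an assumption.
text = (
    "pub   dsa1024 2018-02-28 [SCA]\n"
    "      0019003A003E5A22E2337044D955066111F63B00\n"
    "uid           [ unknown] John Doe <jogn@doe.name>\n"
    "sub   elg1024 2018-02-28 [E]\n"
)

line_pattern = re.compile(
    r"""
    ^ pub [ \t]+ (\S+) [ \t]+     # line 1 (part 1): algorithm
    (?= .* \n                     # look ahead past the rest of line 1
        [ \t]* \S* (\S{8})        # line 2: last 8 characters of the fingerprint
    )
    (\S+) .* \n                   # line 1 (part 2): date
    .* \n                         # line 2
    (.*\S) \s+ <([^<>\s]+)>       # line 3: name (no trailing whitespace) and e-mail
    """,
    re.VERBOSE | re.MULTILINE,
)

lm = line_pattern.search(text)
print(lm.groups())
```

Note that the fourth group still contains the uid prefix and the bracketed text; the pattern only trims the trailing whitespace.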
That said, a better solution is to fix the order after the fact.
@captures = @captures[0,2,1,3,4];
or
@captures[1,2] = @captures[2,1];
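For comparison, the same after-the-fact swap can be sketched in Python using the question's original pattern unchanged. This is an illustrative port rather than the answer's own code, and the spacing of the sample text is an assumption.

```python
import re

# Sample input from the question; the exact column spacing is an assumption.
text = (
    "pub   dsa1024 2018-02-28 [SCA]\n"
    "      0019003A003E5A22E2337044D955066111F63B00\n"
    "uid           [ unknown] John Doe <jogn@doe.name>\n"
    "sub   elg1024 2018-02-28 [E]\n"
)

# The question's original pattern: the date lands in group 2, the key ID in group 3.
original = re.compile(
    r"^pub\s+(\S+)\s+(\S+)\s+.*\s+.{0,32}(.*)\s+(.*)<(\S+)>",
    re.MULTILINE,
)

captures = list(original.search(text).groups())

# Swap the second and third captures (indices 1 and 2) after matching.
captures[1], captures[2] = captures[2], captures[1]
print(captures)
```

Keeping the pattern simple and reordering the results afterwards avoids matching the same text twice.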
Answer 1 (score: 0)
If your data is in the file d