Question

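I am trying to capture every instance of text made up of a % sign followed by letters, digits, and underscores, as long as it is not a %% or an escaped %. I wrote the following regular expression: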

((?<!(?:\\|%))%[a-zA-Z0-9_]+)

I want to store everything captured this way in an associative array, so I wrote the following function to do that:

string[string] make_symbol_table(string input) {
  string[string] symbol_table;
  auto m = matchAll(input, regex(r"((?<!(?:\\|%))%[a-zA-Z0-9_]+)", "g")).captures();
  for (auto i = 1; i < m.length; i++) {
    symbol_table[m[i]] = null;
  }
  return symbol_table;
}

and tested it against the following input:

This is an ordinary %template, with a few well-situated %template_arguments. It uses a range of characters, mostly to ensure that %template1 works correctly.\n\nYou can even start %1template with a number! We can also have some silly cases: %_ %1, %a, and so on. %%DIRECTIVES should never be captured, nor should escaped \\% or \\%\\%. %CAPS or %CaPs are fine too.

I wrote it as an escaped string. I assumed this would give me 9 matches (confirmed by this), but for some reason I only get 1! Am I using matchAll correctly?

Answer 1

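.captures is the same as .front, i.e. the first match. You are interested in all matches, so drop .captures.

Then m is a RegexMatch, which has no .length. Just use foreach: foreach(match; m).

match is a Captures, i.e. a range of the full match and all submatches. You are interested in the full match*, so use match.front or match[0] to get the string: symbol_table[match.front] = null;

* Or the first submatch; they are the same, because the whole pattern is wrapped in parentheses.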

Maybe this helps clarify things:

matchAll(...) returns a range of matches:
- The first match is a range of the full match and its submatches:
  - Full match: "%template"
  - First submatch: "%template"
- The second match, likewise:
  - Full match: "%template_arguments"
  - First submatch: "%template_arguments"
- The third match, likewise:
  - Full match: "%template1"
  - First submatch: "%template1"
- ...
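
Putting those points together, the function from the question could look roughly like the sketch below. It keeps the question's pattern and function name; match.hit is used here as another way to get the full match (match.front or match[0] should work the same), and the "g" flag is left out on the assumption that matchAll does not need it.

```d
import std.regex : matchAll, regex;

string[string] make_symbol_table(string input) {
  string[string] symbol_table;
  // Same pattern as in the question; no "g" flag, since matchAll
  // already walks over every non-overlapping match.
  auto re = regex(r"((?<!(?:\\|%))%[a-zA-Z0-9_]+)");
  // Each element of the range is a Captures for one match.
  foreach (match; matchAll(input, re)) {
    // match.hit is the full match, e.g. "%template".
    symbol_table[match.hit] = null;
  }
  return symbol_table;
}
```

Since the keys are the matched strings, repeated occurrences of the same symbol collapse into a single entry.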

Answer 2

    (?<!\\)(?<!%)%[a-zA-Z0-9_]+

This works for me in Python.

Capture and store all matches of a regex using D
