Question

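I need a regular expression, or a similar rule, that can parse any given input string of space-separated words and produce a (usually longer) output string in which certain parts are expanded according to specific conditions. I could write code that does this from scratch, but I wonder whether I can avoid that, since it does not look like a trivial task.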

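In the examples below I use 'a b c etc...' to represent words. These could just as easily be 'cake 14 hours etc...', but 'a b c etc...' makes it easier to describe how the rules should work. I also use the special characters {, }, [, | and ]. In doing so I am not referring to any meaning these characters may have in regular expressions.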

I also include line breaks in the examples that should not be present in the real strings, to make things more readable.

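The rule is that everything inside a {} enclosure in the input string does not appear as-is in the output string. Instead, the contents of the {} appear in the same place, but repeated a number of times determined by the [] enclosures inside it.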

1

Note that 'b' and 'c' are separated by '|'.

{a [b | c]}

should become:

a b
a c

2

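Note that 'a' and 'b' are grouped together and separated from 'c'. The {} enclosure contains two [] groups, the first with two elements and the second with three, giving 2 x 3 = 6 combinations in total.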

{[a b | c][d | e | f]}

should become:

a b d
a b e
a b f
c d
c e
c f
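The counting in this example is just a Cartesian product of the two [] groups. As a small illustration only (a sketch in Ruby, where `combine` is a made-up helper name and the two groups are assumed to be already split into their alternatives):

```ruby
# Pair every alternative of the first [] group with every alternative
# of the second one. Array#product keeps the first group as the
# slowest-varying part, which matches the order listed above.
def combine(first, second)
  first.product(second).map { |pair| pair.join(" ") }
end

puts combine(["a b", "c"], ["d", "e", "f"])
```

This prints the six lines shown above, one per combination.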

3

Now a more complex example.

{a [b c | d] e f [g | h | i]} j

should become:

a b c e f g
a b c e f h
a b c e f i
a d e f g
a d e f h
a d e f i
j

Without the line breaks, that is:

a b c e f g a b c e f h a b c e f i a d e f g a d e f h a d e f i j

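Here are two more concrete examples from Dr. Seuss, with line breaks added for readability. The second example has been heavily edited from the original text: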

Input:

{I do not like [them in a box | them with a fox | them in a house
| them with a mouse | them here or there | them anywhere | green
eggs and ham | them, Sam-I-am].}

Output:

I do not like them in a box.
I do not like them with a fox.
I do not like them in a house.
I do not like them with a mouse.
I do not like them here or there.
I do not like them anywhere.
I do not like green eggs and ham.
I do not like them, Sam-I-am.

Input:

{[Would | could] you} ? {Would you [like | eat] them
[in a house | with a mouse]?}

Output:

Would you, could you?

Would you like them in a house?
Would you like them with a mouse?

Would you eat them in a house?
Would you eat them with a mouse?

Ideally, {} enclosures should be able to be stacked (nested). None of these examples shows stacked {} enclosures.

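I can already refer to individual words by their number (first, second, and so on) or by some other label. That is easier than, for example, looking up individual letters, because I store the text as words rather than as one whole input string.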

Answer 1

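You may want to look at Boost.Spirit.Qi. As far as I can tell, you could parse the expression and represent it as a DAG, for example this one (just the part inside the curly braces):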

Then you just need to generate every possible path through that DAG.
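A minimal sketch of that path enumeration, in plain Ruby rather than Boost.Spirit.Qi. It assumes the braces of example 3 have already been parsed into an adjacency list. The names `example_dag` and `all_paths` are made up for illustration, and using the words themselves as node labels only works here because no word repeats:

```ruby
# Hypothetical DAG for: a [b c | d] e f [g | h | i]
# Each key is a node; each value lists the nodes that may follow it.
def example_dag
  {
    "a" => ["b", "d"],
    "b" => ["c"],
    "c" => ["e"],
    "d" => ["e"],
    "e" => ["f"],
    "f" => ["g", "h", "i"],
    "g" => [], "h" => [], "i" => []
  }
end

# Depth-first walk: every path from `node` to a node with no successors.
def all_paths(edges, node)
  successors = edges.fetch(node)
  return [[node]] if successors.empty?

  successors.flat_map do |nxt|
    all_paths(edges, nxt).map { |rest| [node] + rest }
  end
end

all_paths(example_dag, "a").each { |path| puts path.join(" ") }
```

Each path printed corresponds to one repetition of the {} contents, in the same order as the expected output of example 3.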

Answer 2

If you use the Oniguruma library, you can use named captures like this:

^\{(?<a>\w+)\s+\[(?<b>\w+)\s+
(?<c>\w+)\s+
\|\s+
(?<d>\w+)\]\s+
(?<e>\w+)\s+
(?<f>\w+)\s+
\[(?<g>\w+)\s+
\|\s+
(?<h>\w+)\s+
\|\s+
(?<i>\w+)\]\}\s+
(?<j>\w+)\s*$

Might that be useful?

I don't use C++, but I do use Ruby with the Oniguruma regular expression library. Here is how I would use the regex above in Ruby (from the Interactive Ruby Shell, "irb"):

s = "{let [them all | eat] as much [cake | as | they]} want"
r = %r!
^\{(?<a>\w+)\s+\[(?<b>\w+)\s+
(?<c>\w+)\s+
\|\s+
(?<d>\w+)\]\s+
(?<e>\w+)\s+
(?<f>\w+)\s+
\[(?<g>\w+)\s+
\|\s+
(?<h>\w+)\s+
\|\s+
(?<i>\w+)\]\}\s+
(?<j>\w+)\s*$
!x

m = r.match s
=> #<MatchData "{let [them all | eat] as much [cake | as | they]} want" a:"let" b:"them" c:"all" d:"eat" e:"as" f:"much" g:"cake" h:"as" i:"they" j:"want">
m[:j]
=> "want"
m[:b]
=> "them"

Hope that helps. I also tweaked (fixed?) the regex above.

m[:a] + " " + m[:b]
=> "let them"

m[0]
=> "{let [them all | eat] as much [cake | as | they]} want"

So now you can manipulate the results however you like. Alternatively, numbered captures still work:

m[1] + " " + m[2]
=> "let them"
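As one possible way to "manipulate the results", here is a sketch that rebuilds the expanded string from the named captures. It reuses the same pattern (condensed onto fewer lines), so it only works for input with exactly this shape. `expand_fixed_shape` is a made-up name, and the output format (everything joined by single spaces) is an assumption based on the question's third example:

```ruby
# Expand one fixed shape: {a [b c | d] e f [g | h | i]} j
# Returns nil when the input does not have that shape.
def expand_fixed_shape(s)
  r = %r!
    ^\{(?<a>\w+)\s+\[(?<b>\w+)\s+(?<c>\w+)\s+\|\s+(?<d>\w+)\]\s+
    (?<e>\w+)\s+(?<f>\w+)\s+
    \[(?<g>\w+)\s+\|\s+(?<h>\w+)\s+\|\s+(?<i>\w+)\]\}\s+
    (?<j>\w+)\s*$
  !x
  m = r.match(s)
  return nil unless m

  heads = ["#{m[:b]} #{m[:c]}", m[:d]]   # alternatives of the first []
  tails = [m[:g], m[:h], m[:i]]          # alternatives of the second []
  lines = heads.product(tails).map do |head, tail|
    [m[:a], head, m[:e], m[:f], tail].join(" ")
  end
  (lines + [m[:j]]).join(" ")            # the word after } appears once
end

puts expand_fixed_shape("{let [them all | eat] as much [cake | as | they]} want")
```

Because every word position is hard-coded in the pattern, a different number of words or [] groups needs a different regex, which is the main limitation of this approach.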

Answer 3

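Regular expressions probably won't help much here. Other approaches may help somewhat, but you will still have to do most of the work yourself.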
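To give an idea of how much work "yourself" is, here is a rough sketch of a hand-written expander in Ruby. It is not a definitive implementation. The function names are made up, and it rests on several assumptions the question does not settle: words are whitespace-separated, punctuation is treated as an ordinary word (so "." comes out with a space before it), a nested {} contributes its own full expansion as a single unit, and a [] outside any {} multiplies the whole string.

```ruby
# Split the input into the five special characters and plain words.
def tokenize(input)
  input.scan(/[{}\[\]|]|[^\s{}\[\]|]+/)
end

# Consume tokens until a stop token (or the end) is reached.
# Returns every expansion of that stretch, each as an array of words.
def parse_sequence(tokens, stops)
  variants = [[]]                      # start with one empty expansion
  until tokens.empty? || stops.include?(tokens.first)
    tok = tokens.shift
    options =
      case tok
      when "{"
        inner = parse_sequence(tokens, ["}"])
        tokens.shift                   # drop the closing "}"
        [inner.flatten]                # all repetitions, laid end to end
      when "["
        alts = []
        loop do
          alts.concat(parse_sequence(tokens, ["|", "]"]))
          break if tokens.shift != "|" # "]" (or end of input) ends the group
        end
        alts
      else
        [[tok]]                        # a plain word has one expansion
      end
    # Cartesian product: earlier choices vary slowest.
    variants = variants.product(options).map { |head, tail| head + tail }
  end
  variants
end

def expand(input)
  parse_sequence(tokenize(input), []).flatten.join(" ")
end

puts expand("{a [b c | d] e f [g | h | i]} j")
```

For the question's third example this prints the single-line output given there. Unbalanced brackets are tolerated rather than reported, so real input would need error handling on top of this.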

Can I do this with a regular expression?
