Question

I'm currently splitting a string by a pattern, like this:

outcome_array=the_text.split(pattern_to_split_by)

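The problem is that the pattern I split on is always omitted from the result.

How can I get the result to include the split pattern itself?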

Answer 1

Thanks to Mark Wilkins for the idea, but here's a shorter bit of code:

irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]

Or:

s.split(/(on)/).each_slice(2).map(&:join)

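See below for an explanation.

Here's how this works. First, we split on "on", but wrap it in parentheses to make it a capture group. When the regular expression passed to split contains a capture group, Ruby includes the text matched by that group in the output: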

s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]
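To make the effect of the parentheses concrete, here is a small comparison on the same string. The variable names are just for illustration. Note that only a capturing group changes the output; a non-capturing group `(?:...)` behaves like the plain pattern.

```ruby
s = "split on the word on okay?"

# Plain pattern: the delimiter is dropped.
without_group = s.split(/on/)
# => ["split ", " the word ", " okay?"]

# Capturing group: each match is kept as its own array element.
with_group = s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]

# Non-capturing group: same as the plain pattern.
non_capturing = s.split(/(?:on)/)
# => ["split ", " the word ", " okay?"]
```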

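Now we want to join each instance of "on" with the string that precedes it. each_slice(2) helps by passing two elements at a time to its block. Let's just call each_slice(2) to see the result. Since each_slice returns an enumerator when called without a block, we apply to_a to the enumerator so we can see what it will enumerate: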

s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]
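A quick sketch of what each_slice(2) does with an odd number of elements, using a hand-written array (the name `pieces` is only for illustration). The last slice simply has one element, and joining a one-element array gives that element back, which is why the trailing " okay?" survives unchanged.

```ruby
pieces = ["split ", "on", " the word ", "on", " okay?"]

# Without a block, each_slice returns an Enumerator.
enum = pieces.each_slice(2)
enum.class
# => Enumerator

pairs = enum.to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]

# Joining a single-element slice leaves the string as it was.
last_joined = pairs.last.join
# => " okay?"
```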

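We're getting close. Now all we have to do is join each pair back together. That brings us to the full solution above. I'll spread it over separate lines so it's easier to follow: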

b = []
s.split(/(on)/).each_slice(2) do |s|
  b << s.join
end
b
# => ["split on", " the word on", " okay?"]

But there's a neat way to eliminate the temporary b and shorten the code considerably:

s.split(/(on)/).each_slice(2).map do |a|
  a.join
end

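map passes each element of its input array to the block; the block's result becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even further, to the equivalent: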

s.split(/(on)/).each_slice(2).map(&:join)
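If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: the name `split_keeping_delimiter` is made up here, and it assumes the pattern you pass in has no capture groups of its own (extra groups would add extra elements and break the pairing).

```ruby
# Hypothetical helper, not part of Ruby's standard library.
# Assumes `pattern` is a Regexp without capture groups.
def split_keeping_delimiter(text, pattern)
  # Wrap the pattern in a single capture group so split keeps the matches.
  wrapped = Regexp.new("(#{pattern.source})", pattern.options)
  # Pair each piece with the delimiter that follows it, then glue them together.
  text.split(wrapped).each_slice(2).map(&:join)
end

split_keeping_delimiter("split on the word on okay?", /on/)
# => ["split on", " the word on", " okay?"]

split_keeping_delimiter("a,b,c", /,/)
# => ["a,", "b,", "c"]
```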

Answer 2

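You can use a regular expression assertion to locate the split point without consuming any of the input. The following uses a positive look-behind assertion to split just after 'on':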

s = "split on the word on okay?"
s.split(/(?<=on)/)
=> ["split on", " the word on", " okay?"]

Or a positive look-ahead to split just before 'on':

s = "split on the word on okay?"
s.split(/(?=on)/)
=> ["split ", "on the word ", "on okay?"]
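As a side-by-side sketch with a simpler delimiter (a comma, chosen here only for illustration): look-behind keeps the delimiter at the end of each piece, look-ahead keeps it at the start of the next one. On the original string, the look-behind form gives the same array as the capture-group plus each_slice approach.

```ruby
csv = "a,b,c"

# Look-behind: split right after each comma.
after = csv.split(/(?<=,)/)
# => ["a,", "b,", "c"]

# Look-ahead: split right before each comma.
before = csv.split(/(?=,)/)
# => ["a", ",b", ",c"]

# Same result as the capture-group approach on the original example.
s = "split on the word on okay?"
same = s.split(/(?<=on)/) == s.split(/(on)/).each_slice(2).map(&:join)
# => true
```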

With something like this, you probably want to make sure 'on' is not part of a larger word (like 'assertion'), and also remove the whitespace at the split:

"don't split on assertion".split(/(?<=\bon\b)\s*/)
=> ["don't split on", "assertion"]
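To see why the \b word boundaries matter, here is a rough comparison with and without them (variable names are illustrative). Without the boundaries, the "on" inside "don't" also counts as a split point, so the first piece is cut short.

```ruby
text = "don't split on assertion"

# With word boundaries: only the standalone word "on" is a split point.
strict = text.split(/(?<=\bon\b)\s*/)
# => ["don't split on", "assertion"]

# Without word boundaries: the "on" in "don't" is also a split point,
# so the first piece is just "don".
naive = text.split(/(?<=on)\s*/)
naive.first
# => "don"
```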

Answer 3

If you use a pattern with a group, split returns the matched pattern in the result as well:

irb(main):007:0> "split it here and here okay".split(/ (here) /)
=> ["split it", "here", "and", "here", "okay"]

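Edit: The additional information indicates that the goal is to attach the item being split on to one of the halves it separates. I think there is a simple way to do that, but I don't know it and don't have time to play with it today. So, in the absence of a clever solution, the following is one brute-force way to do it. Use the split method as described above to include the split items in the array. Then iterate through the array and combine every second entry (which by definition is the split value) with the previous entry.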

s = "split on the word on and include on with previous"
a = s.split(/(on)/)

# iterate through and combine adjacent items together and store
# results in a second array
b = []
a.each_index do |i|
  b << a[i] if i.even?       # even index: a piece of the original text
  b[-1] += a[i] if i.odd?    # odd index: the delimiter, appended to the previous piece
end

p b

The result is:

["split on", " the word on", " and include on", " with previous"]
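As a sanity check, the index-based loop can be compared with the each_slice(2) one-liner on a few inputs, including one that starts with the delimiter and one with no match at all. Both method names below are made up for this sketch.

```ruby
# Hypothetical wrapper around the brute-force loop above.
def join_with_loop(text, pattern)
  a = text.split(pattern)
  b = []
  a.each_index do |i|
    b << a[i] if i.even?
    b[-1] += a[i] if i.odd?
  end
  b
end

# Hypothetical wrapper around the each_slice one-liner.
def join_with_slices(text, pattern)
  text.split(pattern).each_slice(2).map(&:join)
end

samples = [
  "split on the word on and include on with previous",
  "on and on",   # starts and ends with the delimiter
  "abc"          # no match
]

samples.each do |text|
  puts join_with_loop(text, /(on)/) == join_with_slices(text, /(on)/)
end

join_with_loop("on and on", /(on)/)
# => ["on", " and on"]
```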

Split a string into a list, but keep the split pattern