Question

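I want to find out whether the end of one string overlaps with the beginning of a separate string. For example, if I have the following two strings: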

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'

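How can I detect that the "but I" part at the end of string_1 is the same as the beginning of string_2?

I could write a method that loops over the two strings, but I am hoping for an answer that uses a Ruby string method or Ruby idiom that I have missed.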

Answer 1

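Set MARKER to some string that never appears in string_1 or string_2. There are ways to pick one dynamically, but I assume you can come up with a fixed string that works in your case. I would assume that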

MARKER = "@@@"

is safe in your case. Change it to suit your use case. Then,

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => true

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but you do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => false
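
The same pattern can also return the overlapping text rather than only true or false. The following is a minimal sketch, not part of the original answer: the helper name overlap_via_marker is hypothetical, and it still assumes that MARKER never occurs in either string. Regexp.escape is added so that a marker containing regex metacharacters would be treated literally, and the m flag lets the capture span newlines.

```ruby
# Assumption: MARKER never occurs in either input string.
MARKER = "@@@"

# Hypothetical helper: returns the overlapping text, or nil if there is none.
def overlap_via_marker(a, b)
  # The captured group must end right at the marker and repeat right after it,
  # so it is both a suffix of a and a prefix of b.
  m = (a + MARKER + b).match(/(.+)#{Regexp.escape(MARKER)}\1/m)
  m && m[1]
end

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
p overlap_via_marker(string_1, string_2)                         # => "but I"
p overlap_via_marker(string_1, 'but you do nothing every day.')  # => nil
```

Because the regex engine tries the leftmost starting position first, the capture is the longest overlap when several suffixes qualify.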

Answer 2

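Here is a solution that compares the end of string_1 with the beginning of string_2, starting with the largest common length and requiring at least one matching character. If a match is found, it returns the index (counted from the end of string_1 or from the beginning of string_2), which can be used to extract the matching part.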

class String
  def oindex(other)
    [length, other.length].min.downto(1).detect do |i|
      end_with?(other[0, i])
    end
  end
end

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'

if (idx = string_1.oindex(string_2))
  puts "Last #{idx} characters match: #{string_1[-idx..-1]}"
end

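Here is an alternative approach that finds every index at which the first character of the other string occurs in the string, and uses those indices as starting points for checking for a match: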

class String
  def each_index(other)
    return enum_for(__callee__, other) unless block_given?

    i = -1
    yield i while i = index(other, i.succ)
  end

  def oindex(other)
    each_index(other.chr).detect do |i|
      other.start_with?(self[i..-1]) and break length - i
    end
  end
end

This should be more efficient than checking every index, especially on longer strings with shorter matches, but I have not benchmarked it.
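
For anyone who wants to check that claim, here is a rough timing sketch. It is an assumption-laden illustration rather than a proper benchmark: the two standalone functions (overlap_by_length, overlap_by_first_char) and the time_it helper are hypothetical names that restate the two approaches without patching String, and the test strings are artificial. Timings will vary by machine and Ruby version, so none are quoted here.

```ruby
# Hypothetical standalone restatement of the first approach:
# try every candidate length, longest first.
def overlap_by_length(a, b)
  [a.length, b.length].min.downto(1).detect { |i| a.end_with?(b[0, i]) }
end

# Hypothetical standalone restatement of the second approach:
# only test positions in a where b's first character occurs.
def overlap_by_first_char(a, b)
  return nil if b.empty?
  i = -1
  while (i = a.index(b[0], i + 1))
    return a.length - i if b.start_with?(a[i..-1])
  end
  nil
end

# Wall-clock timing helper using the monotonic clock.
def time_it
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# Long strings with a short overlap, the case the answer expects to benefit.
long_a = ('x' * 20_000) + 'but I'
long_b = 'but I' + ('y' * 20_000)

t_len = time_it { overlap_by_length(long_a, long_b) }
t_chr = time_it { overlap_by_first_char(long_a, long_b) }
puts format('by length: %.4fs, by first char: %.4fs', t_len, t_chr)
```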

Answer 3

You can use a simple loop and test at the end:

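# Split on word boundaries; the separators are kept as their own elements.
a = string_1.split(/\b/)
idx = 0

# Drop tokens from the front until what remains is a prefix of string_2.
while idx <= a.length
  break if string_2.start_with?(a[idx..-1].join)
  idx += 1
end

p a[idx..-1].join if idx < a.length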

Since this starts from 0, it finds the longest overlapping substring.

You can use the same logic in a .detect block on the same array:

> a[(0..a.length).detect { |idx| string_2.start_with?(a[idx..-1].join) }..-1].join
=> "but I"

Or, as pointed out in the comments, you can work on the string directly instead of an array:

string_1[(0..string_1.length).detect { |idx| string_2.start_with?(string_1[idx..-1]) }..-1]
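
One detail worth seeing in action: because the range runs up to string_1.length, detect always finds something, and the result is an empty string when there is no overlap. A small sketch, where the method name longest_overlap is a hypothetical wrapper and not part of the original answer:

```ruby
# Hypothetical wrapper around the string-based one-liner.
def longest_overlap(a, b)
  # idx == a.length yields the empty suffix, which every string starts with,
  # so detect never returns nil here.
  start = (0..a.length).detect { |idx| b.start_with?(a[idx..-1]) }
  a[start..-1]
end

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
p longest_overlap(string_1, string_2)  # => "but I"
p longest_overlap('abc', 'xyz')        # => ""
```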

Answer 4

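There are two ways to do this. The first converts the two strings to arrays and compares sequences in those arrays. The second operates directly on the two strings, comparing substrings.

#1 Convert the strings to arrays and compare sequences in those arrays

This is a simple approach that requires converting the strings to arrays of words. It assumes that every pair of adjacent words is separated by a single space.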

def begins_with_ends?(end_str, begin_str)
  end_arr = end_str.split
  begin_arr = begin_str.split
  !!begin_arr.each_index.find { |i| begin_arr[0,i+1] == end_arr[-1-i..-1] }
end

!!obj converts obj to false when it is "falsy" (nil or false) and to true when it is "truthy" (anything that is not "falsy"). For example, !!3 #=> true and !!nil #=> false.

end_str   = 'People say nothing is impossible, but I when I'
begin_str = 'but I when I do nothing every day.'
begins_with_ends?(end_str, begin_str)
  #=> true

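Here the match is on the second occurrence of the word "I" in begin_str. More commonly, though, the last word of end_str will match (at most) a single word in begin_str.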
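
If the overlapping words themselves are wanted rather than a boolean, the same word-array comparison can be adapted. This is only a sketch under the same single-space assumption; the name overlapping_words is hypothetical, and unlike the method above it tries the longest candidate first.

```ruby
# Hypothetical variant: returns the overlapping words ([] if none).
def overlapping_words(end_str, begin_str)
  end_arr = end_str.split
  begin_arr = begin_str.split
  max = [end_arr.size, begin_arr.size].min
  # Longest run of words that ends end_str and starts begin_str.
  n = max.downto(1).find { |k| end_arr.last(k) == begin_arr.first(k) }
  n ? begin_arr.first(n) : []
end

end_str   = 'People say nothing is impossible, but I when I'
begin_str = 'but I when I do nothing every day.'
p overlapping_words(end_str, begin_str)
  #=> ["but", "I", "when", "I"]
```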

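#2 Compare substrings

I have implemented the following algorithm.

1. Set start_search (start_idx in the code) to 0.
2. Try to match target (the last word of end_str) in begin_str, starting at offset start_search. If no match is found, return false; otherwise let idx be the index in begin_str at which the last character of the matched target appears.
3. If the string made up of the first idx + 1 characters of begin_str equals the string made up of the last idx + 1 characters of end_str, return true; otherwise set start_search = idx + 2 and repeat step 2.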

def begins_with_ends?(end_str, begin_str)
  target = end_str[/[[:alnum:]]+\z/]
  start_idx = 0
  loop do
    idx = begin_str.index(/\b#{target}\b/, start_idx)
    return false if idx.nil?
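    idx += target.size  # idx is now the length of the prefix ending with target
    return true if end_str[-idx..-1] == begin_str[0, idx]
    start_idx = idx + 1  # resume just past the separator; idx + 2 would skip a one-letter word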
  end
end

begins_with_ends?(end_str, begin_str)
  #=> true

This method recognizes when the number of spaces between the same two words differs in the two strings (in which case there is no match).
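
To make that difference between the two approaches concrete, here is a small sketch with made-up inputs (the variable names words_match and substring_match are illustrative only). Splitting into words discards the extra space, while a direct substring comparison does not.

```ruby
end_str   = 'nothing is impossible, but  I'  # two spaces between "but" and "I"
begin_str = 'but I do nothing every day.'

# Word-array comparison (approach #1): whitespace runs collapse on split.
words_match = end_str.split.last(2) == begin_str.split.first(2)

# Substring comparison (approach #2): the extra space breaks the match.
substring_match = end_str.end_with?(begin_str[0, 5])  # begin_str[0, 5] is "but I"

p words_match      #=> true
p substring_match  #=> false
```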

Answer 5

Perhaps something like this would do what you need?

string_1.split(' ') -  string_2.split(' ')
=> ["People", "say", "is", "impossible,"]

Or this, which is more convoluted but gives you the exact overlap:

string_2.
  chars.
  each_with_index.
  map { |_, i| string_1.match(Regexp.escape(string_2[0..i])) }.
  select { |s| s }.
  max_by { |m| m.to_s.length }.
  to_s
=> "but I"
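
Note that the chain above looks for prefixes of string_2 anywhere in string_1, not only at its end. If the match has to sit at the end of the first string, a variant along the following lines might be closer to the question. It is a sketch, and the name end_overlap is hypothetical.

```ruby
# Hypothetical helper: longest prefix of b that a ends with ("" if none).
def end_overlap(a, b)
  (1..b.length)
    .map { |n| b[0, n] }                      # every non-empty prefix of b
    .select { |prefix| a.end_with?(prefix) }  # keep those that a ends with
    .max_by(&:length)                         # longest one, or nil
    .to_s
end

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
p end_overlap(string_1, string_2)           # => "but I"
p end_overlap('but I said so', 'but I do')  # => ""
```

In the second call the text "but I" occurs in the first string, but not at its end, so no overlap is reported.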

Determine whether the end of a string overlaps with the beginning of a separate string

5 answers: