Question

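I'm learning Ruby. As part of my homework, I need to find the first occurrence of two consecutive repeated characters in a string and return the repeated character. This is what I came up with: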

merge(df1, df2, by = c("ID", "Timestamp"), copy = TRUE, all.y = T) %>%
 mutate(ACTIVITY2 = case_when(is.na(ACTIVITY) ~ lag(ACTIVITY),
                              TRUE ~ ACTIVITY))

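Question: is this the best way to do it? Maybe it's because I'm still learning, but I feel this isn't what they were asking for, even though it's the approach I know works based on the research I've done. Is there a reason not to use an array for something like this?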

Answer 1

Why not just use a simple regular expression?

str = 'abccdd'
str[/(.)\1/][0]
=> 'c'

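The regex here captures a single character in a group and, via the backreference \1, finds the first place where that character is immediately repeated. Then we just take index 0 of the matched pair to get the character itself.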

In Ruby, there are several ways to apply a regular expression to a string, so you can wrap this one in a method:

def find_first_dup_in_string(str)
  str[/(.)\1/][0] 
end
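
One caveat with the method above: when the string has no doubled character, str[/(.)\1/] returns nil and the trailing [0] raises NoMethodError. Here is a minimal sketch of a nil-safe variant, assuming that returning nil is acceptable in that case. The name find_first_dup_or_nil is made up for illustration; it asks String#[] for capture group 1 directly.

```ruby
# Hypothetical nil-safe variant: returns the doubled character, or nil if none.
def find_first_dup_or_nil(str)
  # The second argument selects capture group 1, i.e. the single character
  # matched by (.) that is immediately repeated by \1.
  str[/(.)\1/, 1]
end

find_first_dup_or_nil('abccdd') # => "c"
find_first_dup_or_nil('abc')    # => nil
```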

Here is a variation on tadman's answer, and I'll include a benchmark for comparison. UPDATED to use each_char, per the comments.

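def find_first_dup_a(str)
  d = ''
  # Walk adjacent character pairs and stop at the first matching pair.
  # Note: d is assigned on every pass, so if no pair matches it ends up
  # holding the second-to-last character rather than nil.
  str.each_char.each_cons(2) { |c| d = c[0]; break if c[0] == c[1] }
  d
end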
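
As written, find_first_dup_a still returns a character when nothing repeats (for 'abc' it returns "b"). If that matters, something along these lines should behave better. It is only a sketch, the name find_first_dup_b is hypothetical, and it is not the version that was benchmarked.

```ruby
# Hypothetical variant of find_first_dup_a that returns nil when nothing repeats.
def find_first_dup_b(str)
  # each_cons(2) yields every adjacent pair; return as soon as one matches.
  str.each_char.each_cons(2) { |a, b| return a if a == b }
  nil
end

find_first_dup_b('abccdd') # => "c"
find_first_dup_b('abc')    # => nil
```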

require 'benchmark'

# Build a random 1000-character lowercase string to search.
alpha = [*'a'..'z']
str = ''
1000.times { str << alpha.sample }

cycles = 100000

Benchmark.bm do |x|
  x.report(:ruby) {  cycles.times { find_first_dup_a(str) } }
  x.report(:regex) { cycles.times { find_first_dup_in_string(str) } }
end

ruby  0.330000   0.010000   0.340000 (  0.338940)
regex  0.140000   0.000000   0.140000 (  0.151719)
=> [
    [0] #<Benchmark::Tms:0x00007fb6a0bd4c88 @label="ruby", @real=0.33893999992869794, @cstime=0.0, @cutime=0.0, @stime=0.010000000000000009, @utime=0.33000000000000007, @total=0.3400000000000001>,
    [1] #<Benchmark::Tms:0x00007fb6a2601390 @label="regex", @real=0.1517189999576658, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.14000000000000057, @total=0.14000000000000057>
]

An amusing coincidence, of no real significance :)

14.0/33.0 * 100
=> 42.42424242424242

Answer 2

In Ruby, strings can be converted to arrays of characters, and then you can have all sorts of fun with them:

def duup?(str)
  !!str.chars.each_cons(2).find { |a,b| a == b }
end

This simply uses the each_cons (each consecutive) iterator and finds the first instance where two adjacent letters are the same.
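
Since the question asks for the character itself rather than true/false, the same each_cons idea can be adapted roughly as follows. This is a sketch, and first_dup_char is a name invented here.

```ruby
# Hypothetical helper: returns the first doubled character, or nil if none.
def first_dup_char(str)
  # find returns the first matching pair (a two-element array) or nil.
  pair = str.each_char.each_cons(2).find { |a, b| a == b }
  pair&.first
end

# What each_cons(2) actually yields for a short string:
'abccdd'.each_char.each_cons(2).to_a
# => [["a", "b"], ["b", "c"], ["c", "c"], ["c", "d"], ["d", "d"]]

first_dup_char('abccdd') # => "c"
first_dup_char('abc')    # => nil
```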

If that's not exciting enough:

def duup?(str)
  !!str.chars.each_cons(2).lazy.map(&:uniq).map(&:length).include?(1)
end

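Here each pair is reduced to its unique elements, and you look for one that collapsed into an array of length 1. lazy is thrown in for good measure, so the chain can stop at the first match.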
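
To see what lazy buys you, here is a small sketch that counts how many pairs get examined with and without it. The helper pairs_examined is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: counts how many pairs the map step touches.
def pairs_examined(str, lazy:)
  examined = 0
  pairs = str.each_char.each_cons(2)
  pairs = pairs.lazy if lazy
  # Without lazy, map processes every pair before include? runs.
  # With lazy, include? pulls pairs one at a time and stops at the first 1.
  pairs.map { |pair| examined += 1; pair.uniq.length }.include?(1)
  examined
end

pairs_examined('aabcdefgh', lazy: true)  # => 1
pairs_examined('aabcdefgh', lazy: false) # => 8
```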

You can also do something obscure, such as:

def duup?(str)
  !!(1...str.length).find { |i| str[i].ord ^ str[i-1].ord == 0 }
end

If you like binary math, XOR returns zero when the two values are identical, because they cancel each other out.
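
A quick illustration of the XOR trick on character codes. Note that in Ruby ^ binds more tightly than ==, so the comparison in the method above needs no extra parentheses.

```ruby
'c'.ord            # => 99
'c'.ord ^ 'c'.ord  # => 0  (identical codes cancel out)
'a'.ord ^ 'b'.ord  # => 3  (97 ^ 98, i.e. 0b1100001 ^ 0b1100010)

# ^ is evaluated before ==, so this reads as (99 ^ 99) == 0.
'c'.ord ^ 'c'.ord == 0 # => true
```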

Or, for simplicity:

def duup?(str)
  !!str.chars.each_cons(2).find { |v| v == v.reverse }
end

If the reversed pair is the same as the forward pair, it must be two of the same thing.

Note that since 2 is entirely arbitrary, some of these approaches can easily be extended to N characters.
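
For example, a run of N identical characters could be found along these lines. Both helpers are sketches with invented names, one built on each_cons and one on the regex approach from the other answer.

```ruby
# Hypothetical helper: first character that appears n times in a row, or nil.
def first_run_char(str, n = 2)
  # A window of n characters is a run when it has exactly one unique element.
  run = str.each_char.each_cons(n).find { |chars| chars.uniq.length == 1 }
  run&.first
end

# Same idea with a regex: one captured character followed by n - 1 repeats.
def first_run_char_regex(str, n = 2)
  str[/(.)\1{#{n - 1}}/, 1]
end

first_run_char('abbcccd')          # => "b"
first_run_char('abbcccd', 3)       # => "c"
first_run_char_regex('abbcccd', 3) # => "c"
```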

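As an exercise, you may want to benchmark these routines on strings of varying length. Some of these methods may not be feasible on very large strings.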
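
A rough starting point for that exercise might look like this. The method names, string lengths, and repeat counts are arbitrary choices for illustration, and the input deliberately contains no adjacent duplicates so each method has to scan the whole string. No timings are shown because they depend on the machine and Ruby version.

```ruby
require 'benchmark'

# Two of the approaches discussed, under illustrative names.
def dup_each_cons?(str)
  !!str.each_char.each_cons(2).find { |a, b| a == b }
end

def dup_regex?(str)
  str.match?(/(.)\1/)
end

# Worst case: 'abab...' never repeats a character back to back.
[1_000, 10_000, 100_000].each do |len|
  str = 'ab' * (len / 2)
  puts "length #{len}"
  Benchmark.bm(10) do |x|
    x.report('each_cons') { 10.times { dup_each_cons?(str) } }
    x.report('regex')     { 10.times { dup_regex?(str) } }
  end
end
```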

Return the repeated character in a string
