Question

Consider the following code:

    lines = Array.new() 
    File.foreach('file.csv').with_index do |line, line_num|                 
      lines.push(line.split(" ")) if line_num > 0                                 
    end                                                                                  

    indices = lines.map { |el| el.last }                                          
    duplicates = indices.select{ |e| indices.count(e) > 2 }.uniq

For anyone wondering, here is a sample CSV file:

# Generated by tool XYZ
a b c 1
d e f 2
g h i 1
j k l 4
m n o 5
p q r 2
s t u 2
v w x 1
y z 0 5

Is it possible to chain these two method blocks (the last two lines of code) together?

Answer 1

If you don't want an intermediate variable and want to do it in one line, you can write:

duplicates = lines.group_by(&:last).select{|k, v| v.count > 2}.keys

For some people this may hurt readability! It just depends on your taste.
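To see what each step of that chain produces, here is a minimal sketch applied to the sample rows from the question. The rows are hard-coded instead of being read from file.csv, purely to keep the example self-contained. The last two lines show a possible alternative using Enumerable#tally, which assumes Ruby 2.7 or later and is not part of the original answer.

```ruby
# Rows from the sample CSV, already split on whitespace (header line skipped).
lines = [
  %w[a b c 1], %w[d e f 2], %w[g h i 1],
  %w[j k l 4], %w[m n o 5], %w[p q r 2],
  %w[s t u 2], %w[v w x 1], %w[y z 0 5]
]

# group_by(&:last) maps each last field to the rows that end in it.
groups = lines.group_by(&:last)

# Keep only the keys whose group holds more than two rows.
duplicates = groups.select { |_key, rows| rows.count > 2 }.keys
p duplicates  # ["1", "2"]

# Assumed alternative (Ruby 2.7+): tally counts the occurrences directly.
tallied = lines.map(&:last).tally.select { |_key, count| count > 2 }.keys
p tallied     # ["1", "2"]
```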

Answer 2

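A single-pass solution looks like this (it collects an element on every even-numbered occurrence; note that Array#delete scans temp on each call, so the pass is O(N) only while temp stays small):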

lines.each_with_object([[], []]) do |el, (result, temp)|
  (temp.delete(el) ? result : temp) << el
end.first

Here we use an intermediate array, temp.
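As a rough illustration of how the two arrays interact, the sketch below runs the same block over the last fields of the sample rows. The flat indices array and the name repeated are assumptions made for the example. Note that "5" is reported even though it occurs only twice, so this approach finds values seen at least twice rather than more than twice.

```ruby
# Last fields of the sample rows, used here as a flat array for illustration.
indices = %w[1 2 1 4 5 2 2 1 5]

# temp holds elements seen an odd number of times so far. When an element
# is found in temp it is removed from there and appended to result.
repeated = indices.each_with_object([[], []]) do |el, (result, temp)|
  (temp.delete(el) ? result : temp) << el
end.first

p repeated  # ["1", "2", "5"]
```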

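Also, you can always use Kernel#then (Ruby 2.6+). Object#tap would not work here, because it returns its receiver rather than the value of the block:

duplicates =
  lines.map(&:last).then do |indices|
    indices.select { |e| indices.count(e) > 2 }.uniq
  end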
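As a quick check of how these helpers behave, the following sketch compares Object#tap with Kernel#then (an alias of yield_self, available from Ruby 2.6). The variable names are made up for the example. tap hands back its receiver and discards the value of the block, whereas then returns the value of the block.

```ruby
indices = %w[1 2 1 4 5 2 2 1 5]

# tap yields the receiver to the block, then returns the receiver itself,
# so the filtered array computed inside the block is thrown away.
via_tap = indices.tap { |ary| ary.select { |e| ary.count(e) > 2 }.uniq }

# then returns whatever the block evaluates to.
via_then = indices.then { |ary| ary.select { |e| ary.count(e) > 2 }.uniq }

p via_tap   # ["1", "2", "1", "4", "5", "2", "2", "1", "5"]
p via_then  # ["1", "2"]
```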

Answer 3

Example

Let's apply your code to an example.

str =<<-END
Now is the
time for all
people who are
known to all
of us as the
best coders are
expected to
lead all
those who are
less experienced
to greatness
END

FName = 'temp'
File.write(FName, str)
  #=> 146

Your code

lines = Array.new() 
File.foreach(FName).with_index do |line, line_num|                 
  lines.push(line.split(" ")) if line_num > 0                                 
end                                                                                  
lines
  #=> [["time", "for", "all"], ["people", "who", "are"], ["known", "to", "all"],
  #    ["of", "us", "as", "the"], ["best", "coders", "are"], ["expected", "to"],
  #    ["lead", "all"], ["those", "who", "are"], ["less", "experienced"],
  #    ["to", "greatness"]] 
indices = lines.map { |el| el.last }                                          
  #=> ["all", "are", "all", "the", "are", "to", "all", "are", "experienced", "greatness"] 
duplicates = indices.select { |e| indices.count(e) > 2 }
  #=> ["all", "are", "all", "are", "all", "are"] 
duplicates.uniq
  #=> ["all", "are"]

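As you can see, the objective is to return an array containing every word that appears more than twice as the last word of a line (the first line excepted).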

More Ruby-like and more efficient code

We can do this more concisely and more efficiently with a single pass through the file:

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h.select { |_,count| count > 2 }.keys
  #=> ["all", "are"]

Steps performed

The steps are as follows.

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h #=> {"all"=>3, "are"=>3, "the"=>1, "to"=>1, "experienced"=>1, "greatness"=>1}
g = h.select { |_,count| count > 2 }
  #=> {"all"=>3, "are"=>3} 
g.keys
  #=> ["all", "are"]

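Using Enumerator#with_object

Rather than defining the hash before executing File.foreach(..), it is customary to use the method Enumerator#with_object. It returns the hash it was given, which allows us to chain the statements that follow directly onto the block: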

first_line = true
File.foreach(FName).with_object(Hash.new(0)) do |line, h|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end.select { |_,count| count > 2 }.keys
  #=> ["all", "are"]
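The chaining works because with_object returns the object it was given once the iteration ends. A minimal sketch on an in-memory array (the words array and the variable names are assumptions, standing in for the lines of the file):

```ruby
words = %w[all are all the are to all are]

# with_object passes the same hash to every iteration and finally returns
# that hash, so select and keys can be chained straight onto the block.
frequent = words.each.with_object(Hash.new(0)) do |word, counts|
  counts[word] += 1
end.select { |_, count| count > 2 }.keys

p frequent  # ["all", "are"]
```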

Using a counting hash

I defined the hash as follows.

h = Hash.new(0)

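This uses the form of Hash::new that defines a default value equal to new's argument. If h = Hash.new(0) and h has no key k, then h[k] returns the default value, zero. Ruby's parser expands the expression h[k] += 1 to: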

h[k] = h[k] + 1

If h has no key k, the expression becomes

h[k] = 0 + 1

Note that h[k] = h[k] + 1 is shorthand for:

h.[]=(k, h.[](k) + 1)

It is the method Hash#[] that falls back to the default of zero, not the method Hash#[]=.
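The behaviour described above can be checked with a short sketch (the keys :x and :y and the variable names are arbitrary choices for the example). In particular, reading a missing key returns the default without storing the key. Only the assignment adds it.

```ruby
h = Hash.new(0)

# Reading a missing key returns the default but does not store the key.
missing = h[:x]
keys_after_read = h.keys

# h[:x] += 1 is h[:x] = h[:x] + 1, so Hash#[]= stores 0 + 1.
h[:x] += 1

# The explicit method-call form does the same thing for another key.
h.[]=(:y, h.[](:y) + 1)

p missing                   # 0
p keys_after_read           # []
p h == { x: 1, y: 1 }       # true
```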

Extracting the last word of each line with a regular expression

One of the lines is

str = "known to all\n"

We can extract the last word using the regular expression r = /\S+(?=\n)/:

str[r] #=> "all"

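The regular expression reads, "match one or more (+) characters that are not whitespace characters (\S), followed by a newline". (?=\n) is a positive lookahead: "\n" must be matched, but it is not part of the match that is returned.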
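One consequence of the lookahead is worth checking: a line that does not end in a newline (for instance, the final line of a file saved without one) produces no match. The sketch below shows this, together with a possible alternative based on String#split that is an assumption of this example and not part of the answer.

```ruby
r = /\S+(?=\n)/

# The lookahead requires a newline right after the word.
with_newline    = "known to all\n"[r]
without_newline = "known to all"[r]

# Assumed alternative: split on whitespace and take the last field,
# which works whether or not the line ends in a newline.
via_split = "known to all".split.last

p with_newline     # "all"
p without_newline  # nil
p via_split        # "all"
```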

Chaining method blocks (Ruby)
