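Collapse consecutive "same" elements in an array

Time: 2014-02-10 20:09:17

Tags: ruby arrays

Goal

Given an array of elements and a criterion for deciding whether two elements are "the same", return a new array in which each run of consecutive "same" elements has been reduced to its endpoints. For example: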

a = [ {k:'a',v:1}, {k:'b',v:1}, {k:'c',v:1},
      {k:'d',v:2}, {k:'e',v:2},
      {k:'f',v:3}, {k:'g',v:3}, {k:'h',v:3}, {k:'i',v:3}, {k:'j',v:3},
      {k:'k',v:2},
      {k:'l',v:4}, {k:'m',v:4}, {k:'n',v:4}, {k:'o',v:4} ]
b = a.collapse_consecutive{ |h| h[:v] }
#=> [ {k:'a',v:1}, {k:'c',v:1},
#=>   {k:'d',v:2}, {k:'e',v:2},
#=>   {k:'f',v:3}, {k:'j',v:3},
#=>   {k:'k',v:2},
#=>   {k:'l',v:4}, {k:'o',v:4} ]

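Motivation

When plotting n points on a line graph, a run of consecutive identical values has no effect on the graph apart from its endpoints. In the graph below, the black samples have no effect on the final chart. I am storing finely sampled graphs and would ideally like to remove all irrelevant samples.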

Line graph with inflection points represented by orange dots and several points along linear sections represented by black dots (on both horizontal and angled linear sections)

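For this question I have simplified the problem to removing only the black dots on horizontal sections, because identifying points along angled linear sections is (a) harder and (b) rarer (in my case).

Current progress

The best solution I have come up with so far relies on array indices: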

class Array
  def collapse_consecutive
    select.with_index{ |o,i|
      i==0 || yield(self[i-1])!=yield(o) ||
      !self[i+1] || yield(self[i+1])!=yield(o)
    }
  end
end

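This works, but relying on array indices in Ruby is usually a "code smell": a sign that a more elegant implementation exists.

4 answers:

Answer 0: (score: 4)

This is a general solution for simplifying any 2D line, with a custom threshold for how small the bend at a point must be before that point is removed. It assumes the X and Y values are displayed at the same scale.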

# Assumes that points is an array of two-valued arrays representing
# [x,y] pairs; removes points whose angle is less than the threshold
def simplify_line(points,inflection_threshold_degrees=1)
  points.reject.with_index do |p1,i|
    if i>0 && (p0=points[i-1]) && (p2=points[i+1])
      # http://stackoverflow.com/a/7505937/405017
      p0p1 = (p1[0]-p0[0])**2 + (p1[1]-p0[1])**2
      p2p1 = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
      p0p2 = (p2[0]-p0[0])**2 + (p2[1]-p0[1])**2
      angle = Math.acos( (p2p1+p0p1-p0p2) / Math.sqrt(4*p2p1*p0p1) )*180/Math::PI
      (180 - angle).abs < inflection_threshold_degrees
    end
  end
end

Used with the points from the graph shown in the question above:

pts = [ [  0,40],[ 20,80],[ 40,90],[ 60,80],[ 80,80],[100,60],[120,60],
        [140,60],[160,60],[180,60],[200,80],[220,80],[240,80],[260,60],
        [280,40],[300,33],[320,27],[340,20],[360,20] ]

We get good results:

Line graphs with points removed between 10° and 60°, showing increasingly simplified versions that retain the end points, ultimately resulting in a single line.
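
One possible hardening of the function above, offered only as a sketch: floating-point rounding can push the cosine ratio slightly outside [-1, 1], where Math.acos raises Math::DomainError, and two coincident neighbouring points make the denominator zero. The name simplify_line_clamped, the clamp and the duplicate-point guard are assumptions added here for illustration, not part of the answer; Float#clamp assumes Ruby 2.4 or later.

```ruby
# Hypothetical variant of simplify_line; the clamp and the guards are
# illustrative additions, not the answer's original code.
def simplify_line_clamped(points, inflection_threshold_degrees = 1)
  points.reject.with_index do |p1, i|
    p0 = i > 0 ? points[i - 1] : nil
    p2 = points[i + 1]
    next false unless p0 && p2          # always keep the two endpoints

    # Squared side lengths of the triangle p0, p1, p2.
    p0p1 = (p1[0] - p0[0])**2 + (p1[1] - p0[1])**2
    p2p1 = (p1[0] - p2[0])**2 + (p1[1] - p2[1])**2
    p0p2 = (p2[0] - p0[0])**2 + (p2[1] - p0[1])**2

    denom = Math.sqrt(4 * p2p1 * p0p1)
    next false if denom.zero?           # p1 coincides with a neighbour: keep it

    # Law of cosines gives the angle at p1; clamp guards against rounding.
    cosine = ((p2p1 + p0p1 - p0p2) / denom).clamp(-1.0, 1.0)
    angle = Math.acos(cosine) * 180 / Math::PI
    (180 - angle).abs < inflection_threshold_degrees
  end
end

pts = [ [  0,40],[ 20,80],[ 40,90],[ 60,80],[ 80,80],[100,60],[120,60],
        [140,60],[160,60],[180,60],[200,80],[220,80],[240,80],[260,60],
        [280,40],[300,33],[320,27],[340,20],[360,20] ]

simplified = simplify_line_clamped(pts, 1)
# With a 1 degree threshold only the exactly collinear interior points go:
# [120,60], [140,60], [160,60], [220,80] and [260,60].
p simplified.size   # prints 14
```

Note that each point is tested against its neighbours in the original array, not against the points that survive, so a long gentle curve can lose every interior point at a large threshold.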

Answer 1: (score: 2)

Use Enumerable#chunk and Enumerable#flat_map:

a.chunk { |h| h[:v] }.flat_map { |c| [c[1][0],c[1][-1]].uniq }
  

#=> [{:k=>"a", :v=>1}, {:k=>"c", :v=>1}, {:k=>"d", :v=>2}, {:k=>"e", :v=>2},
#    {:k=>"f", :v=>3}, {:k=>"j", :v=>3}, {:k=>"k", :v=>2}, {:k=>"l", :v=>4},
#    {:k=>"o", :v=>4}]
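
To see why this works, it can help to look at what Enumerable#chunk yields before flat_map trims each run. The following is a small sketch on a shortened version of the question's data; the names samples, runs and collapsed are only illustrative.

```ruby
samples = [ {k:'a',v:1}, {k:'b',v:1}, {k:'c',v:1},
            {k:'d',v:2},
            {k:'e',v:3}, {k:'f',v:3} ]

# chunk groups consecutive elements whose block value is equal and
# yields [block_value, elements_in_run] pairs.
runs = samples.chunk { |h| h[:v] }.to_a
# runs.map(&:first) is [1, 2, 3]; the runs have 3, 1 and 2 elements.

# Keep the first and last element of every run; uniq folds the pair
# back into a single element when the run has length 1.
collapsed = runs.flat_map { |_value, run| [run.first, run.last].uniq }
p collapsed.map { |h| h[:k] }   # prints ["a", "c", "d", "e", "f"]
```

Because uniq compares by value, a two-element run whose elements are equal (==) would also fold into one element. That does not happen with the hashes here, since their :k values differ.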

Answer 2: (score: 2)

Another variation.

class Array
  def collapse_consecutive &pr
    chunk(&pr).map(&:last).flat_map{|a| a - a[1..-2]}
  end
end
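
One thing to keep in mind with a - a[1..-2]: Array#- removes every element that is equal to one in the other array, so an endpoint that equals an interior element of its run is dropped as well. That cannot happen with the question's hashes (each has a distinct :k), but it does with plain values. A minimal sketch of the difference, where collapse_runs is a hypothetical helper and not part of the answer:

```ruby
values = [1, 1, 1, 2, 3, 3]

# The expression from the answer, applied to plain integers: the run
# [1, 1, 1] becomes [1, 1, 1] - [1], which is empty.
by_difference = values.chunk { |x| x }.map(&:last).flat_map { |a| a - a[1..-2] }
p by_difference   # prints [2, 3, 3]

# Hypothetical alternative that picks endpoints by position instead of
# by equality, so equal elements inside a run do not matter.
def collapse_runs(items, &key)
  items.chunk(&key).flat_map do |_key, run|
    run.size > 1 ? [run.first, run.last] : run
  end
end

by_position = collapse_runs(values) { |x| x }
p by_position     # prints [1, 1, 2, 3, 3]
```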

Answer 3: (score: 1)

I couldn't pass up the chance to use Enumerator#peek:

Code

def doit(a)
  e = a.to_enum
  b = [e.next]
  (a.size-2).times do
    f = e.next
    b << f unless (f[:v] == b.last[:v] && f[:v] == e.peek[:v])
  end
  b << e.next
end  

Try it

a = [ {k:'a',v:1}, {k:'b',v:1}, {k:'c',v:1},
      {k:'d',v:2}, {k:'e',v:2},
      {k:'f',v:3}, {k:'g',v:3}, {k:'h',v:3}, {k:'i',v:3}, {k:'j',v:3},
      {k:'k',v:2},
      {k:'l',v:4}, {k:'m',v:4}, {k:'n',v:4}, {k:'o',v:4}
    ]

doit(a)
  #=> [{:k=>"a", :v=>1}, {:k=>"c", :v=>1},
  #    {:k=>"d", :v=>2}, {:k=>"e", :v=>2},
  #    {:k=>"f", :v=>3}, {:k=>"j", :v=>3},
  #    {:k=>"k", :v=>2},
  #    {:k=>"l", :v=>4}, {:k=>"o", :v=>4}]
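
As written, doit assumes at least two elements (e.next raises StopIteration once the enumerator is exhausted) and hard-codes h[:v] as the comparison key. A sketch of a guarded, block-taking variant follows; collapse_with_peek is a hypothetical name and the guard is an assumption about how short inputs should behave.

```ruby
# Hypothetical variant of doit: takes the key as a block and returns a
# copy unchanged when there is no interior element to drop.
def collapse_with_peek(items)
  return items.dup if items.size < 3

  e = items.each            # external enumerator over the elements
  result = [e.next]         # the first element is always kept
  (items.size - 2).times do
    current = e.next
    same_as_prev = yield(current) == yield(result.last)
    same_as_next = yield(current) == yield(e.peek)   # peek does not advance
    result << current unless same_as_prev && same_as_next
  end
  result << e.next          # the last element is always kept
end

a = [ {k:'a',v:1}, {k:'b',v:1}, {k:'c',v:1},
      {k:'d',v:2}, {k:'e',v:2},
      {k:'f',v:3}, {k:'g',v:3}, {k:'h',v:3}, {k:'i',v:3}, {k:'j',v:3},
      {k:'k',v:2},
      {k:'l',v:4}, {k:'m',v:4}, {k:'n',v:4}, {k:'o',v:4} ]

kept = collapse_with_peek(a) { |h| h[:v] }
p kept.map { |h| h[:k] }   # prints ["a", "c", "d", "e", "f", "j", "k", "l", "o"]
```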