Ruby: How to sort an array of strings by parsing their content

Time: 2015-04-24 16:49:09

Tags: ruby arrays string parsing sorting

Here is my problem: I have an array of strings that contains data like this:

array = ["{109}{08} OK",
         "{98} Thx",
         "{108}{0.8}{908} aa",
         "{8}{51} lorem ipsum"]

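I would like to sort this array by scanning the data inside each string, here the integers in curly braces. So the final array should look like this: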

array.custom_sort! => ["{8}{51} lorem ipsum",
                       "{98} Thx",
                       "{108}{0.8}{908} aa",
                       "{109}{08} OK"]

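Is there a nice solution for this in Ruby? Or should I build a new array, inserting each parsed element into it?

Edit:

I did not mention the sort priorities. First, the sort is based on the numbers in braces: up to 3 groups, but at least one must be present.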

["{5}something",
 "{61}{64}could",
 "{}be",                  #raise an error or ignore it
 "{54}{31.24}{0.2}write",
 "{11}{21}{87}{65}here",  #raise an error or ignore it
 "[]or",                  #raise an error or ignore it
 "{31}not"]

If the first numbers are equal, then the second ones should be compared. Some examples:

"{15}" < "{151}" < "{151}{32}" < "{152}"
"{1}" < "{012}" < "{12}{-1}{0}" < "{12.0}{0.2}"
"{5}" < "{5}{0}" < "{5}{0}{1}"

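But if all the numbers are equal, then the strings are compared. The only character that causes a problem is the space, which must sort after every other "visible" character. Examples: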

"{1}a" < "{1}aa" < "{1} a" < "{1}  a"
"{1}" < "{1}a " < "{1}a  " < "{1}a  a"
"{1}a" < "{1}ba" < "{1}b "

I can do something like this in a custom class:

class CustomArray
  attr_accessor :one
  attr_accessor :two
  attr_accessor :three
  attr_accessor :text 

  def <=>(other)
    if self.one.to_f < other.one.to_f
      return -1
    elsif self.one.to_f > other.one.to_f
      return 1
    elsif self.two.nil?
      if other.two.nil?
        min = [self.text.size, other.text.size].min
        i = 0
        until i == min
          if self.text[i].chr == ' ' #.chr is for compatibility with Ruby 1.8.x
            if other.text[i].chr != ' '
              return 1
            end
          else
            if other.text[i].chr == ' '
              return -1

          #...

    self.text <=> other.text
  end
end

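It works fine, but I am quite frustrated coding in Ruby the way I would code in a C++ project. That is why I would like to know how to use a "custom sort in a foreach-style method" with a more complex sort (one that requires parsing, scanning, regexes), rather than a naive sort based on an attribute of the content.

4 answers:

Answer 0: (score: 1)

This should do it: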

array.sort_by do |s| 
  # regex match the digits within the first pair of curly braces
  s.match(/^\{(\d+)\}/)[1].to_i # convert to an int in order to sort
end

# => ["{8}{51} lorem ipsum", "{98} Thx", "{108}{0.8}{908} aa", "{109}{08} OK"]
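One caveat: match returns nil for a string that does not start with {digits} (for example "{}be" or "[]or" from the edited question), so calling [1] on the result raises NoMethodError. Below is a minimal sketch of a more defensive variant. It assumes such strings should simply be dropped, and the helper name first_brace_number is made up for illustration.

```ruby
# Hypothetical helper: returns the integer in the leading {...}, or nil.
def first_brace_number(s)
  digits = s[/\A\{(\d+)\}/, 1] # String#[] with a regexp and a capture index
  digits && digits.to_i
end

array = ["{109}{08} OK", "{98} Thx", "{}be", "[]or",
         "{108}{0.8}{908} aa", "{8}{51} lorem ipsum"]

# Assumption: strings without a leading {integer} are ignored, not reported.
sorted = array.select { |s| first_brace_number(s) }
              .sort_by { |s| first_brace_number(s) }
```

If the malformed strings should be reported instead, the select step could be replaced by a check that raises.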

Answer 1: (score: 1)

array.sort_by { |v| (v =~ /(\d+)/) && $1.to_i }

Alternatively:

array.sort_by { |v| /(\d+)/.match(v)[1].to_i }

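Answer 2: (score: 1)

[Edit: my initial solution, which follows this edit, does not work for the revised statement of the question. I will leave it, however, as it may be of interest.]

The following is a way to perform the sort under the revised rules, as I understand them. If I have misunderstood the rules, I expect the fix would be minor.

Regular expression

Let's begin with the regular expression I will use: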

R = /
    \{       # match char
    (        # begin capture group
    \d+      # match one or more digits
    (?:      # begin non-capture group
    \.       # match decimal
    \d+      # match one or more digits
    )        # end non-capture group
    |        # or
    \d*      # match zero or more digits
    )        # match end capture group
    \}       # match char
    /x

Examples:

a = ["{5}something", "{61}{64}could", "{}be", "{54}{31.24}{0.2}write",
     "{11}{21}{87}{65}here", "[]or", "{31}not", "{31} cat"]
a.each_with_object({}) { |s,h| h[s] = s.scan(R).flatten }
  # => {"{5}something"        =>["5"],
  #    "{61}{64}could"        =>["61", "64"],
  #    "{}be"                 =>[""],
  #    "{54}{31.24}{0.2}write"=>["54", "31.24", "0.2"],
  #    "{11}{21}{87}{65}here" =>["11", "21", "87", "65"],
  #    "[]or"                 =>[],
  #    "{31}not"              =>["31"],
  #    "{31} cat"             =>["31"]} 

The custom_sort method

We can write the method custom_sort as follows (change sort_by to sort_by! for custom_sort!):

class Array
  def custom_sort
    sort_by do |s|
      a = s.scan(R).flatten
      raise SyntaxError,
        "'#{s}' contains empty braces" if a.any?(&:empty?)
      raise SyntaxError,
        "'#{s}' contains zero or > 3 pair of braces" if a.size.zero?||a.size > 3
      a.map(&:to_f) << s[a.join.size+2*a.size..-1].tr(' ', 255.chr)
    end
  end
end

Examples

Let's try it:

a.custom_sort
  #=> SyntaxError: '{}be' contains empty braces

Remove "{}be" from a:

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "{11}{21}{87}{65}here", "[]or", "{31}not", "{31} cat"]
a.custom_sort
  #=> SyntaxError: '{11}{21}{87}{65}here' contains zero or > 3 pair of braces

Remove "{11}{21}{87}{65}here":

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "[]or", "{31}not", "{31} cat"]
a.custom_sort
  #=> SyntaxError: '[]or' contains zero or > 3 pair of braces

Remove "[]or":

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "{31}not", "{31} cat"]
a.custom_sort
  #=> ["{5}something",
  #    "{31}not",
  #    "{31} cat",
  #    "{54}{31.24}{0.2}write", "{61}{64}could"]

Explanation

Suppose one of the strings to be sorted is:

s = "{54}{31.24}{0.2}write a letter"

Then in the sort_by block we would compute:

a = s.scan(R).flatten
  #=> ["54", "31.24", "0.2"]
raise SyntaxError, "..." if a.any?(&:empty?)
  #=> raise SyntaxError, "..." if false
raise SyntaxError, "..." if a.size.zero?||a.size > 3
  #=> raise SyntaxError, "..." if false || false
b = a.map(&:to_f)
  #=> [54.0, 31.24, 0.2]
t = a.join
  #=> "5431.240.2"
n = t.size + 2*a.size
  #=> 16
u = s[n..-1]
  #=> "write a letter"
v = u.tr(' ', 255.chr)
  #=> "write\xFFa\xFFletter"
b << v
  #=> [54.0, 31.24, 0.2, "write\xFFa\xFFletter"]

Note that using String#tr (or you could use String#gsub) puts spaces at the end of the sort order of ASCII characters:

255.times.all? { |i| i.chr < 255.chr } #=> true

Initial solution

I have assumed that, in sorting, pairs of strings are to be compared in a manner similar to Array#<=>. The first comparison considers the strings of digits within the first pair of braces in each string (after conversion to a float). Ties are broken by comparing the strings of digits in the second pairs of braces (converted to floats). If there is still a tie, the digits enclosed in the third pairs of braces are compared, and so on. If one string has n pairs of braces and another has m > n pairs, and the values within the braces are the same for the first n pairs, I assume the first string is to precede the second in the sort.

Code

R = /
    \{    # match char
    (\d+) # capture digits
    \}    # match char
    +     # quantifier on the closing brace: one or more
    /x

class Array
  def custom_sort!
    sort_by! { |s| s.scan(R).map { |e| e.first.to_f } }
  end
end

Example

array = ["{109}{08} OK", "{109}{07} OK", "{98} Thx", "{108}{0.8}{908} aa",
         "{108}{0.8}{907} aa", "{8}{51} lorem ipsum"]
a = array.custom_sort!
  #=> ["{8}{51} lorem ipsum",
  #    "{98} Thx",
  #    "{108}{0.8}{907} aa",
  #    "{108}{0.8}{908} aa",
  #    "{109}{07} OK",
  #    "{109}{08} OK"]
array == a #=> true

Explanation

Let's now compute the value in the block of Array#sort_by! for s, the first element of array before sorting:

s = "{109}{08} OK"

a = s.scan(R)
  #=> [["109"], ["08"]]
b = a.map { |e| e.first.to_f }
  #=> [109.0, 8.0]

Now let's do the same for the other strings and put the results in an array:

c = array.map { |s| [s, s.scan(R).map { |e| e.first.to_f }] }
  #=> [["{8}{51} lorem ipsum", [8.0, 51.0]],
  #    ["{98} Thx", [98.0]],
  #    ["{108}{0.8}{907} aa", [108.0, 907.0]],
  #    ["{108}{0.8}{908} aa", [108.0, 908.0]],
  #    ["{109}{07} OK", [109.0, 7.0]],
  #    ["{109}{08} OK", [109.0, 8.0]]]

sort_by! in custom_sort! is therefore equivalent to sorting the pairs in c by their second elements (the arrays of floats) and then keeping the first element of each pair.
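The equivalence can be sketched as follows. This is an illustration, not the answer's original code: it uses a simplified pattern without the trailing +, and the variable names key_pattern, by_pairs and direct are chosen here for the example.

```ruby
# Simplified stand-in for R (an assumption): one integer per pair of braces.
key_pattern = /\{(\d+)\}/

array = ["{109}{08} OK", "{98} Thx", "{108}{0.8}{908} aa", "{8}{51} lorem ipsum"]

# Pair each string with its sort key, an array of floats.
c = array.map { |s| [s, s.scan(key_pattern).map { |e| e.first.to_f }] }

# Sort the pairs by key, then keep only the strings.
by_pairs = c.sort_by(&:last).map(&:first)

# The same ordering obtained directly with sort_by.
direct = array.sort_by { |s| s.scan(key_pattern).map { |e| e.first.to_f } }
```

Both by_pairs and direct hold the strings in the same order, because sort_by compares the float arrays with Array#<=>.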

Answer 3: (score: -2)

You can pass Array#sort a block that defines how it should order the elements.
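As a rough sketch of that idea applied to the rules in the question, the block below compares parsed keys. The helper parse_entry, its regular expression, and the choice of ArgumentError are assumptions made for this example, as is the use of the highest Unicode code point to make spaces sort last.

```ruby
# Hypothetical helper: split "{1}{2.5}text" into [[1.0, 2.5], "text"].
def parse_entry(s)
  m = s.match(/\A((?:\{-?\d+(?:\.\d+)?\})+)(.*)\z/m)
  raise ArgumentError, "no leading {number} groups in #{s.inspect}" unless m
  numbers = m[1].scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
  raise ArgumentError, "more than 3 groups in #{s.inspect}" if numbers.size > 3
  # Replace each space with the highest code point so that a space ranks
  # after every visible character when the texts are compared.
  [numbers, m[2].tr(" ", "\u{10FFFF}")]
end

strings = ["{5}{0}{1}", "{1} a", "{151}{32}", "{1}aa",
           "{5}", "{1}a", "{152}", "{5}{0}"]

# Array#<=> compares the number lists first (a shorter list that is a prefix
# of a longer one comes first), then the texts.
sorted = strings.sort { |x, y| parse_entry(x) <=> parse_entry(y) }
```

Since the block parses both strings on every comparison, strings.sort_by { |s| parse_entry(s) } would likely be the more economical form, as it computes each key only once.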