Ruby alphanumeric sort not working as expected

Time: 2016-08-18 17:07:11

Tags: arrays ruby sorting alphanumeric

Given the following array:

y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]
=> ["A1", "A2", "B5", "B12", "A6", "A8", "B10", "B3", "B4", "B8"]

The expected sorted array is:

=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]

Using the following (vanilla) sort, I get:

irb(main):2557:0> y.sort{|a,b| puts "%s <=> %s = %s\n" % [a, b, a <=> b]; a <=> b}
A1 <=> A8 = -1
A8 <=> B8 = -1
A2 <=> A8 = -1
B5 <=> A8 = 1
B4 <=> A8 = 1
B3 <=> A8 = 1
B10 <=> A8 = 1
B12 <=> A8 = 1
A6 <=> A8 = -1
A1 <=> A2 = -1
A2 <=> A6 = -1
B12 <=> B3 = -1
B3 <=> B8 = -1
B5 <=> B3 = 1
B4 <=> B3 = 1
B10 <=> B3 = -1  # this appears to be wrong, looks like 1 is being compared, not 10.
B12 <=> B10 = 1
B5 <=> B4 = 1
B4 <=> B8 = -1
B5 <=> B8 = -1
=> ["A1", "A2", "A6", "A8", "B10", "B12", "B3", "B4", "B5", "B8"]

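...which is obviously not what I want. I know I could split off the alpha part first and then sort on the numbers, but it seems like I shouldn't have to do that.

One possible caveat: we are stuck on Ruby 1.8.7 for now :( but even Ruby 2.0.0 does the same thing. What am I missing here?

Suggestions?

4 answers: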

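Answer 0: (score: 2)

You are sorting strings. Strings are sorted as strings, not as numbers. If you want to sort by number, you should sort numbers, not strings. The string 'B10' is lexicographically smaller than the string 'B3'. That is not specific to Ruby, or even to programming: it is how lexicographic ordering of text works nearly everywhere, in programming, databases, lexicons, dictionaries, phone books, and so on.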
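As a small illustration of the point above (not part of the original answer), the same pair of values orders differently depending on whether it is compared as strings or as integers:

```ruby
# Strings are compared character by character, so the "1" in "B10"
# is weighed against the "3" in "B3" and the trailing "0" never matters.
p("B10" <=> "B3")     # -1: "B10" sorts before "B3"
p(10 <=> 3)           # 1: as integers, 10 sorts after 3

p(%w[B3 B10].sort)    # ["B10", "B3"]
p([3, 10].sort)       # [3, 10]
```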

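You should split your strings into their numeric and non-numeric components and convert the numeric components to numbers. Array sorting is lexicographic, so this ends up sorting exactly right: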

y.sort_by {|s| # use `sort_by` for a keyed sort, not `sort`
  s.
    split(/(\d+)/). # split numeric parts from non-numeric
    map {|s| # the below parses numeric parts as decimals, ignores the rest
      begin Integer(s, 10); rescue ArgumentError; s end }}
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
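A possible variant of the same idea that avoids exceptions for control flow: because the pattern is wrapped in a capture group, split keeps the digit runs, and they always land at odd indices (a string that starts with a digit gets an empty leading chunk). The helper name natural_key is made up for this sketch.

```ruby
# Hypothetical helper: split into alternating text/digit chunks and
# convert the digit chunks (always at odd indices) to integers.
def natural_key(str)
  str.split(/(\d+)/).each_with_index.map do |part, i|
    i.odd? ? part.to_i : part
  end
end

y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]

p(natural_key("B12"))   # ["B", 12]
p(natural_key("10A"))   # ["", 10, "A"]
p(y.sort_by { |s| natural_key(s) })
# ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
```

Since text chunks and integer chunks occupy the same positions in every key, the arrays stay comparable element by element.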

Answer 1: (score: 0)

If you know the maximum number of digits in the numbers, you can also prefix them with 0s during the comparison.

y.sort_by { |string| string.gsub(/\d+/) { |digits| format('%02d', digits.to_i) } }
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]

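Here '%02d' specifies the following: % introduces the format of a value, 0 says to pad the number with leading 0s, 2 specifies the total width of the number, and d specifies decimal (base 10) output. Additional information is in the Ruby documentation for Kernel#format.

This means that 'A1' is converted to 'A01', 'B8' becomes 'B08', and 'B12' stays 'B12' because it already has 2 digits. The padded strings are used only for the comparison.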
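The fixed width of 2 only holds while no number has more than two digits. A sketch of one way to derive the width from the data instead, assuming the whole array is available up front (zero_padded_sort is a hypothetical name, not from the answer):

```ruby
# Hypothetical helper: zero-pad every digit run to the widest run in the data.
def zero_padded_sort(strings)
  width = strings.flat_map { |s| s.scan(/\d+/) }.map(&:size).max || 0
  strings.sort_by { |s| s.gsub(/\d+/) { |digits| digits.rjust(width, "0") } }
end

data = %w[B12 B100 B8]

# With a hard-coded width of 2, the keys are "B12", "B100" and "B08",
# so "B100" lands before "B12".
fixed = data.sort_by { |s| s.gsub(/\d+/) { |d| format('%02d', d.to_i) } }
p(fixed)                    # ["B8", "B100", "B12"]

# With the width taken from the data, the keys are "B012", "B100" and "B008".
p(zero_padded_sort(data))   # ["B8", "B12", "B100"]
```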

Answer 2: (score: 0)

Here are two ways to do that.

arr = ["A1", "A2", "B5", "B12", "A6", "AB12", "A8", "B10", "B3", "B4",
       "B8", "AB2"]

Sort by 2-element arrays

arr.sort_by { |s| [s[/\D+/], s[/\d+/].to_i] }
  #=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
  #    "B10", "B12"] 

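This is similar to @Jorg's solution, except that I have computed the two elements of the comparison array separately, rather than splitting the string into two parts and converting the latter to an integer.

Enumerable#sort_by compares each pair of elements of arr with the spaceship method <=>. Since the elements being compared are arrays, the method Array#<=> is used. See especially the third paragraph of that doc.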
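A few direct calls may make the behavior of Array#<=> concrete. These lines are an illustration added here, not part of the original answer:

```ruby
# Elements are compared pairwise, left to right; the first unequal pair decides.
p(["A", 6]  <=> ["A", 8])      # -1: first elements tie, then 6 <=> 8
p(["B", 10] <=> ["B", 3])      # 1: 10 <=> 3 is a numeric comparison
p(["A", 8]  <=> ["AB", 2])     # -1: "A" <=> "AB" already decides

# If every shared position ties, the shorter array sorts first.
p(["A", 1]  <=> ["A", 1, 0])   # -1

# Elements that cannot be compared with each other yield nil.
p(["A", 1]  <=> ["A", "1"])    # nil
```

The nil case is why both halves of each key need a consistent type: sort_by raises an ArgumentError when a comparison returns nil.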

sort_by compares the following 2-element arrays:

arr.each { |s| puts "%s-> [%s, %d]" %
  ["\"#{s}\"".ljust(7), "\"#{s[/\D+/]}\"".ljust(4), s[/\d+/].to_i] }

"A1"   -> ["A" , 1]
"A2"   -> ["A" , 2]
"B5"   -> ["B" , 5]
"B12"  -> ["B" , 12]
"A6"   -> ["A" , 6]
"AB12" -> ["AB", 12]
"A8"   -> ["A" , 8]
"B10"  -> ["B" , 10]
"B3"   -> ["B" , 3]
"B4"   -> ["B" , 4]
"B8"   -> ["B" , 8]
"AB2"  -> ["AB", 2]

Insert spaces between the alphabetic and numeric parts of the string

max_len = arr.max_by(&:size).size
  #=> 4
arr.sort_by { |s| "%s%s%d" % [s[/\D+/], " "*(max_len-s.size), s[/\d+/].to_i] }
  #=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
  #    "B10", "B12"]

Here sort_by compares the following strings:

arr.each { |s| puts "%s-> \"%s\"" %
  ["\"#{s}\"".ljust(7), s[/\D+/] + " "*(max_len-s.size) + s[/\d+/]] }

"A1"   -> "A  1"
"A2"   -> "A  2"
"B5"   -> "B  5"
"B12"  -> "B 12"
"A6"   -> "A  6"
"AB12" -> "AB12"
"A8"   -> "A  8"
"B10"  -> "B 10"
"B3"   -> "B  3"
"B4"   -> "B  4"
"B8"   -> "B  8"
"AB2"  -> "AB 2"

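Answer 3: (score: -1)

You need a natural sort rather than the standard sort, which is based on character values. Something like these gems would be a starting point: https://github.com/dogweather/naturally and https://github.com/johnnyshields/naturalsort

Humans treat a string like "A2" as "A" followed by the number 2, and sort using string ordering for the string part and numeric ordering for the numeric part. The standard sort() uses character-value ordering and treats the string as a sequence of characters, regardless of what the characters are. So to sort(), "A10" and "A2" look like ['A', '1', '0'] and ['A', '2']. Because '1' sorts before '2' and the characters that follow cannot change that order, "A10" sorts before "A2". To a human, the same strings look like ["A", 10] and ["A", 2]; 10 sorts after 2, so we get the opposite result. The strings can be manipulated so that the character-value-based sort() produces the expected result: make the numeric part fixed width and pad it with zeros on the left (zeros rather than spaces, to avoid embedded spaces), so that "A2" becomes "A02", which sorts before "A10" under the standard sort().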
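The "human" ordering described above can also be written as a comparator block for sort, which is the form the question used. This is a minimal sketch that assumes each string is a single run of non-digits followed by a single run of digits; human_compare is a hypothetical name.

```ruby
# Hypothetical comparator: the text part is compared as a string and,
# only when the text parts tie, the numeric part is compared as an integer.
def human_compare(a, b)
  a_text, a_num = a[/\D+/].to_s, a[/\d+/].to_i
  b_text, b_num = b[/\D+/].to_s, b[/\d+/].to_i
  by_text = a_text <=> b_text
  by_text.zero? ? a_num <=> b_num : by_text
end

p(human_compare("A10", "A2"))   # 1: 10 sorts after 2
p("A10" <=> "A2")               # -1: the standard character-based result
p(%w[A10 A2 B1].sort { |a, b| human_compare(a, b) })   # ["A2", "A10", "B1"]
```

A sort_by key is usually cheaper than a comparator block because each key is computed once per element rather than once per comparison.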