How to count the consecutive occurrences of a character

Time: 2016-02-06 20:50:17

Tags: ruby

My code works for an ordinary character count:

count = Hash.new(0)
str.each_char do |char|
    count[char] += 1 unless char == " "
end
count

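For example, "aaabbaaaaacccbbdddd" gives 'a' = 8, 'b' = 4, 'c' = 3, 'd' = 4.

What I want instead is the number of times each character occurs consecutively. The result I want is: 'a' = 3, 'b' = 2, 'a' = 5, 'c' = 3, 'b' = 2, 'd' = 4. How can I do this?

6 answers:

Answer 0 (score: 7)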

"aaabbaaaaacccbbdddd".each_char.chunk(&:itself).map{|k, v| [k, v.length]}
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
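To print those pairs in the 'a' = 3 form the question uses, a minimal sketch along these lines should work. The method names `run_lengths` and `format_runs` are made up for illustration and are not part of the answer; `Object#itself` assumes Ruby 2.2 or later.

```ruby
# Hypothetical helper (not from the answer): returns [char, run_length] pairs.
def run_lengths(string)
  string.each_char.chunk(&:itself).map { |char, run| [char, run.length] }
end

# Hypothetical helper: formats the pairs the way the question writes them.
def format_runs(pairs)
  pairs.map { |char, n| "'#{char}' = #{n}" }.join(", ")
end

puts format_runs(run_lengths("aaabbaaaaacccbbdddd"))
# prints: 'a' = 3, 'b' = 2, 'a' = 5, 'c' = 3, 'b' = 2, 'd' = 4
```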

I benchmarked sawa's and spickermann's solutions:

require 'benchmark/ips'

def sawa(string)
  string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
end

def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end

Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"

  x.report("sawa") { sawa string }
  x.report("spickermann") { spickermann string }

  x.compare!
end

# Calculating -------------------------------------
#                 sawa     6.293k i/100ms
#          spickermann     4.447k i/100ms
# -------------------------------------------------
#                 sawa     75.353k (±10.4%) i/s -    371.287k
#          spickermann     48.661k (±12.0%) i/s -    240.138k
# 
# Comparison:
#                 sawa:    75353.5 i/s
#          spickermann:    48660.7 i/s - 1.55x slower

Answer 1 (score: 4)

How about:

# The slice_when approach (the one benchmarked above as "spickermann"); needs Ruby 2.2+
string = "aaabbaaaaacccbbdddd"
string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]

Answer 2 (score: 2)

Use an array to store the pairs, rather than a hash.

str = "aaabbaaaaacccbbdddd"

counts = []
str.each_char do |char|
  # Get the last seen character and count pair
  last_pair = counts[-1] || []

  if last_pair[0] == char
    # This character is the same as the last one, increment its count
    last_pair[1] += 1
  else
    # New character, push a new pair onto the list
    counts.push([char, 1])
  end

end

counts.each { |c|
  puts "#{c[0]} = #{c[1]}"
}

This can be written more concisely with chunk.

str = "aaabbaaaaacccbbdddd"
counts = []
str.chars.chunk(&:itself).each { |char, chars|
  counts << [char, chars.length]
}
puts counts.inspect

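chunk splits a list into chunks. It decides where to split by calling a block on each element. As long as the block returns the same value as it did for the previous element, the element is added to the current chunk. As soon as the value changes, a new chunk starts. This is similar to what we did in the loop earlier by remembering the last character seen.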

  if last_seen == char
    # it's the same chunk
  else
    # it's a new chunk
    last_seen = char
  end

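itself returns the object it is called on, here the character. So chunk(&:itself) splits the string into runs of identical characters.

Each element of the resulting list is a pair: the block's return value (in our case the character shared by the chunk) and the chunk itself (for example the array ["a", "a", "a"]).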
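To make the shape of chunk's output concrete, here is a small sketch that materializes the pairs with to_a. The second half uses a block other than itself (a vowel check chosen purely for illustration) to show that the first element of each pair is the block's value, not necessarily the element.

```ruby
str = "aaabbaaaaacccbbdddd"

# chunk yields [block_value, elements] pairs; to_a materializes them.
chunks = str.chars.chunk(&:itself).to_a
p chunks.first(2)
# => [["a", ["a", "a", "a"]], ["b", ["b", "b"]]]

# The block value need not be the element itself: here it is a boolean,
# so neighbouring consonants of different kinds land in the same chunk.
vowel_runs = str.chars.chunk { |c| "aeiou".include?(c) }
                .map { |is_vowel, cs| [is_vowel, cs.join] }
p vowel_runs.first(3)
# => [[true, "aaa"], [false, "bb"], [true, "aaaaa"]]
```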

Answer 3 (score: 1)

I prefer regular expressions for this kind of problem:

str = "aaabbaaaaacccbbdddd"
counts = str.scan(/(?<seq>(?<char>\w)\k<char>+)/).inject([]) do |occurs, match|
  occurs << [match[1], match[0].size]

  occurs
end
puts counts.inspect #=>[["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
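One caveat with this pattern: `\k<char>+` requires at least one repeat, so a character that appears only once in a row is skipped, and `\w` ignores spaces and punctuation entirely. The sketch below compares the pattern as written with a variant that uses `*`. The variant, the constant names and the `runs` helper are assumptions for illustration, not part of the answer.

```ruby
# Pattern as written in the answer: runs of length 1 are not matched.
STRICT = /(?<seq>(?<char>\w)\k<char>+)/
# Assumed variant: * allows zero repeats, so single characters count as runs of 1.
LOOSE  = /(?<seq>(?<char>\w)\k<char>*)/

# Hypothetical helper: scan returns [seq, char] per match for these named groups.
def runs(string, pattern)
  string.scan(pattern).map { |seq, char| [char, seq.size] }
end

p runs("aabccc", STRICT) # => [["a", 2], ["c", 3]]
p runs("aabccc", LOOSE)  # => [["a", 2], ["b", 1], ["c", 3]]
```

For the example string in the question every run has length 2 or more, so both patterns give the same result there.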

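Edit

I ran the same benchmark as @sawa, with the regex approach added. On this input the regex version comes out fastest, roughly 1.9x faster than the chunk version. Also note that #itself is not available in Ruby < 2.2.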

require 'benchmark/ips'

def sawa(string)
  string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
end

def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end

def stathopa(string)
  string.scan(/(?<seq>(?<char>\w)\k<char>+)/).inject([]) do |occurs, match|
    occurs << [match[1], match[0].size]

    occurs
  end
end

Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"

  x.report("sawa") { sawa string }
  x.report("spickerman") { spickermann string }
  x.report("stathopa") { stathopa string }

  x.compare!
end

# Calculating -------------------------------------
#                 sawa     6.730k i/100ms
#           spickerman     4.061k i/100ms
#             stathopa    11.969k i/100ms
# -------------------------------------------------
#                 sawa     70.072k (± 8.9%) i/s -    349.960k
#           spickerman     43.652k (± 9.5%) i/s -    219.294k
#             stathopa    132.992k (± 8.8%) i/s -    670.264k
# 
# Comparison:
#             stathopa:   132992.1 i/s
#                 sawa:    70072.4 i/s - 1.90x slower
#           spickerman:    43651.6 i/s - 3.05x slower
# 
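For Ruby versions older than 2.2, where #itself is missing, the chunk approach can presumably be kept by passing an explicit identity block. This is a sketch, and the method name `run_lengths_compat` is made up for illustration.

```ruby
# Same idea as chunk(&:itself), without relying on Object#itself.
def run_lengths_compat(string)
  string.each_char.chunk { |char| char }.map { |char, run| [char, run.length] }
end

p run_lengths_compat("aaabbaaaaacccbbdddd")
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
```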

Answer 4 (score: 0)

To compute the longest run of each character:

count = Hash.new(0)
last_char = nil
occurred = 0
str.each_char do |char|
    if char != last_char
      occurred = 1
    else
      occurred += 1
    end
    last_char = char
    count[char] = occurred if (count[char]||0) < occurred
end
count
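The same "longest run per character" result can also be sketched on top of chunk, which avoids tracking last_char by hand. The method name `longest_runs` is hypothetical and the approach is a swapped-in alternative, not this answer's method.

```ruby
# Hypothetical helper: maps each character to the length of its longest run.
def longest_runs(string)
  string.each_char.chunk { |c| c }.each_with_object(Hash.new(0)) do |(char, run), longest|
    # Keep the larger of the stored length and the current run's length.
    longest[char] = [longest[char], run.length].max
  end
end

longest_runs("aaabbaaaaacccbbdddd").each { |char, n| puts "#{char}: #{n}" }
# prints:
# a: 5
# b: 2
# c: 3
# d: 4
```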

Or, to get a result like [['a', 3], ['b', 2], ['a', 5], ['c', 3], ['b', 2], ['d', 4]]:

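count = []
last_char = nil
occurred = 0
str.each_char do |char|
  if char != last_char
    # Flush the finished run; skip this before the very first character,
    # otherwise the result would start with [nil, 0]
    count.push([last_char, occurred]) unless last_char.nil?
    occurred = 1
  else
    occurred += 1
  end
  last_char = char
end
# Flush the final run (nothing to flush if str was empty)
count.push([last_char, occurred]) unless last_char.nil?
count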

Answer 5 (score: 0)

Here is one way to do it:

s = "aaabbaaaaacccbbdddd"
s.chars.uniq.map do |c|
  [c, s.split(/[^#{c}]+/).reject(&:empty?).map(&:size)]
end.to_h
#=> {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}