My code works for a regular character count:
count = Hash.new(0)
str.each_char do |char|
  count[char] += 1 unless char == " "
end
count
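For example, "aaabbaaaaacccbbdddd"
gives "a" = 8, "b" = 4, "c" = 3, "d" = 4.
What I want instead is how many times each character occurs consecutively. The result I want is: "a" = 3, "b" = 2, "a" = 5, "c" = 3, "b" = 2, "d" = 4. How can I do this?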
Answer 0: (score: 7)
"aaabbaaaaacccbbdddd".each_char.chunk(&:itself).map{|k, v| [k, v.length]}
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
I benchmarked sawa's and spickermann's solutions:
require 'benchmark/ips'
def sawa(string)
  string.each_char.chunk(&:itself).map { |k, v| [k, v.length] }
end
def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end
Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"
  x.report("sawa") { sawa string }
  x.report("spickermann") { spickermann string }
  x.compare!
end
# Calculating -------------------------------------
# sawa 6.293k i/100ms
# spickermann 4.447k i/100ms
# -------------------------------------------------
# sawa 75.353k (±10.4%) i/s - 371.287k
# spickermann 48.661k (±12.0%) i/s - 240.138k
#
# Comparison:
# sawa: 75353.5 i/s
# spickermann: 48660.7 i/s - 1.55x slower
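A speed comparison like this is only meaningful if both methods return the same result. The following is a minimal sketch of such a check, assuming Ruby 2.2 or later (both `itself` and `slice_when` need it); the sample strings are made up for illustration.

```ruby
def sawa(string)
  string.each_char.chunk(&:itself).map { |k, v| [k, v.length] }
end

def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end

# Hypothetical sample inputs, including a single character and the empty string.
samples = ["aaabbaaaaacccbbdddd", "a", "", "ab ba"]
samples.each do |s|
  raise "mismatch for #{s.inspect}" unless sawa(s) == spickermann(s)
end
puts "both implementations agree"
```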
Answer 1: (score: 4)
How about:
import socket
import os
import thread
s = socket.socket()
host = socket.gethostname()
port = 9000
s.connect((host, port))
path = "blah"
directory = os.listdir(path)
for files in directory:
print files
filename = files
size = len(filename)
size = bin(size)[2:].zfill(16) # encode filename size as 16 bit binary
s.send(size)
s.send(filename)
filename = os.path.join(path,filename)
filesize = os.path.getsize(filename)
filesize = bin(filesize)[2:].zfill(32) # encode filesize as 32 bit binary
s.send(filesize)
file_to_send = open(filename, 'rb')
l = file_to_send.read()
s.sendall(l)
file_to_send.close()
print 'File Sent'
s.close()
Answer 2: (score: 2)
Use an array to store the pairs, rather than a hash.
str = "aaabbaaaaacccbbdddd"
counts = []
str.each_char do |char|
  # Get the last seen character and count pair
  last_pair = counts[-1] || []
  if last_pair[0] == char
    # This character is the same as the last one, increment its count
    last_pair[1] += 1
  else
    # New character, push a new pair onto the list
    counts.push([char, 1])
  end
end
counts.each { |c|
  puts "#{c[0]} = #{c[1]}"
}
This can be written more concisely using chunk.
str = "aaabbaaaaacccbbdddd"
counts = []
str.chars.chunk(&:itself).each { |char, chars|
  counts << [char, chars.length]
}
puts counts.inspect
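chunk splits a list into chunks. It decides where to split by calling the block on every element. As long as the block returns the same value as it did for the previous element, the element is added to the current chunk. Once the value changes, a new chunk starts. This is similar to what we did earlier in the loop by storing the last seen character.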
if last_seen == char
  # it's the same chunk
else
  # it's a new chunk
  last_seen = char
end
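itself returns the object it is called on, here the character. So chunk(&:itself) splits the string into chunks of identical consecutive characters.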
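Each element of the new list is the return value of the block passed to chunk (in our case, the character for that chunk) plus the actual chunk (for example, the three characters of "aaa").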
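To make the shape of that result concrete, here is a small sketch that prints the raw pairs chunk yields for a short string and then reduces them to counts. The helper name `runs` is hypothetical and not part of the answer; Ruby 2.2 or later is assumed because of `itself`.

```ruby
# Hypothetical helper: returns [character, run length] pairs.
def runs(str)
  str.each_char.chunk(&:itself).map { |char, chars| [char, chars.length] }
end

# Each element yielded by chunk is [block value, array of the elements in that chunk].
raw = "aaabb".each_char.chunk(&:itself).to_a
p raw            # prints [["a", ["a", "a", "a"]], ["b", ["b", "b"]]]
p runs("aaabb")  # prints [["a", 3], ["b", 2]]
```

Note that the second element of each pair is an array of single-character strings, which is why `length` gives the run length directly.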
Answer 3: (score: 1)
I prefer regular expressions for this kind of problem:
str = "aaabbaaaaacccbbdddd"
counts = str.scan(/(?<seq>(?<char>\w)\k<char>+)/).inject([]) do |occurs, match|
  occurs << [match[1], match[0].size]
  occurs
end
puts counts.inspect #=>[["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
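One thing to watch with this pattern: `\k<char>+` requires at least one repeat, so a character that appears only once in a row is skipped, and `\w` ignores non-word characters. The example string happens to contain only runs of two or more letters. The sketch below is a hypothetical variant (the name `regex_runs` is an assumption, not from the answer) that uses `*` so single-character runs are kept.

```ruby
# Hypothetical variant: (.) captures any single character except newline,
# and \2* allows zero or more repeats, so runs of length 1 are kept too.
def regex_runs(string)
  string.scan(/((.)\2*)/).map { |seq, char| [char, seq.size] }
end

regex_runs("abbc")
# => [["a", 1], ["b", 2], ["c", 1]]

# For comparison, the pattern from the answer only reports the "bb" run here.
"abbc".scan(/(?<seq>(?<char>\w)\k<char>+)/)
# => [["bb", "b"]]
```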
Edit
I ran the same benchmark used for @sawa's answer and added the regex approach. It seems a bit better. Also, #itself is not available in Ruby < 2.2.x.
require 'benchmark/ips'
def sawa(string)
  string.each_char.chunk(&:itself).map { |k, v| [k, v.length] }
end
def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end
def stathopa(string)
  string.scan(/(?<seq>(?<char>\w)\k<char>+)/).inject([]) do |occurs, match|
    occurs << [match[1], match[0].size]
    occurs
  end
end
Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"
  x.report("sawa") { sawa string }
  x.report("spickerman") { spickermann string }
  x.report("stathopa") { stathopa string }
  x.compare!
end
end
# Calculating -------------------------------------
# sawa 6.730k i/100ms
# spickerman 4.061k i/100ms
# stathopa 11.969k i/100ms
# -------------------------------------------------
# sawa 70.072k (± 8.9%) i/s - 349.960k
# spickerman 43.652k (± 9.5%) i/s - 219.294k
# stathopa 132.992k (± 8.8%) i/s - 670.264k
#
# Comparison:
# stathopa: 132992.1 i/s
# sawa: 70072.4 i/s - 1.90x slower
# spickerman: 43651.6 i/s - 3.05x slower
#
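On the point about #itself: on a Ruby older than 2.2, the chunk-based version can be written with an explicit block instead. This is a minimal sketch; the method name `runs_without_itself` is hypothetical.

```ruby
# Same idea as chunk(&:itself), but with an explicit block,
# so it does not depend on Object#itself (added in Ruby 2.2).
def runs_without_itself(string)
  string.each_char.chunk { |char| char }.map { |char, chars| [char, chars.length] }
end

runs_without_itself("aaabbaaaaacccbbdddd")
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
```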
Answer 4: (score: 0)
To count the longest consecutive run of each character:
count = Hash.new(0)
last_char = nil
occurred = 0
str.each_char do |char|
  if char != last_char
    occurred = 1
  else
    occurred += 1
  end
  last_char = char
  count[char] = occurred if (count[char]||0) < occurred
end
count
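The same longest-run-per-character result can also be sketched with chunk, for readers who prefer that style. This is an alternative formulation rather than the answer's own method; `longest_runs` is a hypothetical name.

```ruby
# Hypothetical helper: longest consecutive run for each character.
def longest_runs(str)
  str.each_char.chunk { |char| char }.each_with_object(Hash.new(0)) do |(char, run), longest|
    # Keep the larger of the stored length and the current run's length.
    longest[char] = [longest[char], run.length].max
  end
end

longest_runs("aaabbaaaaacccbbdddd")
# => {"a"=>5, "b"=>2, "c"=>3, "d"=>4}
```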
Or, to get a result like [['a', 3], ['b', 2], ['a', 5], ['c', 3], ['b', 2], ['d', 4]]:
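count = []
last_char = nil
occurred = 0
str.each_char do |char|
  if char != last_char
    # Flush the finished run; skip the initial nil placeholder
    count.push([last_char, occurred]) unless last_char.nil?
    occurred = 1
  else
    occurred += 1
  end
  last_char = char
end
# Flush the final run (nothing to flush for an empty string)
count.push([last_char, occurred]) unless last_char.nil?
count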
Answer 5: (score: 0)
Here is one way to do it:
s = "aaabbaaaaacccbbdddd"
s.chars.uniq.map do |c|
  p [c, s.split(/[^#{c}]+/).reject(&:empty?).map(&:size)]
end.to_h
#=> {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
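Because the approach above interpolates each character into a regex character class, characters that are special inside a class (a backslash, for instance) would need escaping first. As a hedged alternative that avoids regexes entirely, the same hash can be built from chunk; `run_lengths_by_char` is a hypothetical name.

```ruby
# Hypothetical helper: maps each character to the lengths of its consecutive runs,
# in order of appearance.
def run_lengths_by_char(str)
  str.each_char.chunk { |char| char }.each_with_object({}) do |(char, run), lengths|
    (lengths[char] ||= []) << run.length
  end
end

run_lengths_by_char("aaabbaaaaacccbbdddd")
# => {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
```

This makes a single pass over the string instead of one split per distinct character.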