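Given a comma-separated string and another array indicating types, how do I construct an array of different types?
Parsing CSV input taken from stdin, I have an array of column header Symbols: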
cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
and a line of raw input:
raw = "$JX.T.CA,Open,T,933.36T 11:10:00.000"
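I want to construct an array, cells, from the raw input, where each element of cells has the type identified by the corresponding element of cols. What is the idiomatic, Ruby-ish way to do this?
I have tried the following, which works but doesn't feel right.
1) First, define a class for each type that needs to be encapsulated: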
class Sku
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
end
class Price
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = time
  end
end
2) Then, define a conversion function for each unique column type that needs converting:
def to_sku(raw)
  raw.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3]) }
end
def to_price(raw)
end
3) Create an array of strings from the input:
cells = raw.split(",")
4) Finally, modify each element of cells in place by constructing the type indicated by the corresponding column header:
cells.each_index do |i|
  cells[i] = case cols[i]
  when :IndexSymbol
    to_sku(cells[i])
  when :PriceStatus
    cells[i].split(";").collect { |st| st.to_sym }
  when :UpdateExchange
    cells[i]
  when :Last
    cells[i].match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
  else
    puts "Unhandled column type (#{cols[i]}) from input string: \n#{cols}\n#{raw}"
    exit -1
  end
end
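The parts that feel wrong are steps 3 and 4. How can this be done in a more Ruby-like way? I am picturing some super-concise method that exists only in my imagination: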
cells = raw.split_using_convertor(",")
Answer 0 (score: 2)
You can simplify the fourth step using #zip, #map, and destructuring assignment:
cells = cells.zip(cols).map do |cell, col|
  case col
  when :IndexSymbol
    to_sku(cell)
  when :PriceStatus
    cell.split(";").collect { |st| st.to_sym }
  when :UpdateExchange
    cell
  when :Last
    cell.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
  else
    puts "Unhandled column type (#{col}) from input string: \n#{cols}\n#{raw}"
    exit -1
  end
end
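I would not recommend combining that step with the splitting, because parsing a line of CSV is complicated enough to be its own step. See my comment for how to parse the CSV.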
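A minimal sketch of keeping the two steps separate, assuming Ruby's csv library is available: CSV.parse_line does the splitting, then #zip and #map do the conversion. The CONVERTERS hash and its lambdas are hypothetical stand-ins for to_sku and the case branches above; they return plain arrays so the block runs on its own.

```ruby
require "csv"

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
raw  = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

# Hypothetical stand-in converters; real ones would build Sku, Price, etc.
CONVERTERS = {
  IndexSymbol:    ->(s) { s.delete_prefix("$").split(".") },
  PriceStatus:    ->(s) { s.split(";").map(&:to_sym) },
  UpdateExchange: ->(s) { s },
  Last:           ->(s) { s.split(" ", 2) }
}.freeze

# Step 1: let the CSV library split the line (it copes with quoted commas).
fields = CSV.parse_line(raw)

# Step 2: pair each field with its column symbol and convert it.
# Hash#fetch raises KeyError for a column that has no converter.
cells = fields.zip(cols).map do |field, col|
  CONVERTERS.fetch(col).call(field)
end

p cells
```

Unlike raw.split(","), CSV.parse_line keeps a quoted field such as "b,c" in one piece, which is the main reason to treat parsing as its own step.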
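Answer 1 (score: 2)
You could have the different types inherit from a base class and put the lookup knowledge in that base class. Then each class can know how to initialize itself from a raw string: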
class Header
  @@lookup = {}
  def self.symbol(*syms)
    syms.each { |sym| @@lookup[sym] = self }
  end
  def self.lookup(sym)
    @@lookup[sym]
  end
end
class Sku < Header
  symbol :IndexSymbol
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
  def to_s
    "@#{mRoot}-#{mExch}-#{mCountry}"
  end
  def self.from_raw(str)
    str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3]) }
  end
end
class Price < Header
  symbol :Last, :Bid
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = Time.new(time)
  end
  def to_s
    "$#{mPrice}-#{mExchange}-#{mTime}"
  end
  def self.from_raw(raw)
    raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3]) }
  end
end
# SymbolList and ExchangeIdentifier must also inherit from Header,
# otherwise the call to symbol raises NoMethodError.
class SymbolList < Header
  symbol :PriceStatus
  attr_accessor :mSymbols
  def initialize(symbols)
    @mSymbols = symbols
  end
  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end
  def to_s
    mSymbols.to_s
  end
end
class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_accessor :mExch
  def initialize(exch)
    @mExch = exch
  end
  def self.from_raw(raw)
    new(raw)
  end
  def to_s
    mExch
  end
end
Then you can replace step #4 like this (CSV parsing not included):
cells.each_index.map do |i|
  Header.lookup(cols[i]).from_raw(cells[i])
end
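To see the registry idea end to end, here is a stripped-down sketch. It is not the code above: it assumes a constant hash (REGISTRY) in place of @@lookup, uses Hash#fetch so an unregistered column raises KeyError instead of returning nil, and the two small cell classes (Exchange, StatusList) are hypothetical and exist only to exercise the lookup.

```ruby
# Base class that keeps a registry of column symbol => subclass.
class Header
  REGISTRY = {}

  # Called in a subclass body to claim one or more column symbols.
  def self.symbol(*syms)
    syms.each { |sym| REGISTRY[sym] = self }
  end

  # Raises KeyError for unregistered columns instead of returning nil.
  def self.lookup(sym)
    REGISTRY.fetch(sym)
  end
end

# Hypothetical minimal cell type wrapping a single exchange code.
class Exchange < Header
  symbol :UpdateExchange
  attr_reader :code

  def initialize(code)
    @code = code
  end

  def self.from_raw(raw)
    new(raw)
  end
end

# Hypothetical minimal cell type holding a list of status symbols.
class StatusList < Header
  symbol :PriceStatus
  attr_reader :symbols

  def initialize(symbols)
    @symbols = symbols
  end

  def self.from_raw(raw)
    new(raw.split(";").map(&:to_sym))
  end
end

cols  = [:PriceStatus, :UpdateExchange]
cells = ["Open;Halted", "T"]

# Each column symbol selects the class, and the class parses its own cell.
converted = cells.zip(cols).map { |cell, col| Header.lookup(col).from_raw(cell) }
```

Because each subclass registers itself when its class body is evaluated, adding a new column type means adding one class and no changes to the conversion loop.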
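Answer 2 (score: 1)
Ruby's CSV library has direct support for this sort of thing (as well as better handling of the actual parsing), although the documentation is a bit awkward.
You need to provide a proc that does the conversions for you, and pass it as an option to CSV.parse: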
converter = proc do |field, info|
  case info.header.strip # in case you have spaces after your commas
  when "IndexSymbol"
    field.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3]) }
  when "PriceStatus"
    field.split(";").collect { |st| st.to_sym }
  when "UpdateExchange"
    field
  when "Last"
    field.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
  end
end
Then you can parse almost directly into the format you want:
c = CSV.parse(s, :headers => true, :converters => converter).by_row!.map do |row|
  row.map { |_, field| field } # we only want the field now, not the header
end
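A self-contained sketch of the same approach, so it can be run without the Sku and Price classes. The sample input s and the simplified conversions (arrays and symbols instead of objects) are assumptions made for illustration; the else branch is also an addition, so that columns without a special rule are passed through as strings.

```ruby
require "csv"

# Sample input with a header row; the data line is the one from the question.
s = <<~'INPUT'
  IndexSymbol,PriceStatus,UpdateExchange,Last
  $JX.T.CA,Open,T,933.36T 11:10:00.000
INPUT

# A two-argument converter receives the field and a FieldInfo struct
# whose header member names the column the field belongs to.
converter = proc do |field, info|
  case info.header
  when "IndexSymbol" then field.delete_prefix("$").split(".")
  when "PriceStatus" then field.split(";").map(&:to_sym)
  else field # leave other columns as plain strings
  end
end

table = CSV.parse(s, headers: true, converters: converter)

# CSV::Row#fields drops the headers and keeps only the converted values.
rows = table.map(&:fields)

p rows
```

CSV::Row#fields is used here in place of mapping over [header, field] pairs; both give one array of converted values per row.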
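Answer 3 (score: 1)
@AbeVoelker's answer steered me in the right direction, but I had to make a fairly major change because of something I failed to mention in the OP.
Some of the cells are of the same type but still have different semantics. Those semantic differences don't come into play here (and aren't elaborated on), but they do matter in the larger context of the tool I'm writing.
For example, there will be several cells of type Price; some of them are :Last, :Bid, and :Ask. They are all the same type (Price), but they are still different, so there cannot be a single entry in Header's @@lookup for all Price columns.
So what I actually did was write a self-decoding class for each type of cell (credit to Abe for this key part):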
class Sku
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
  def to_s
    "@#{mRoot}-#{mExch}-#{mCountry}"
  end
  def self.from_raw(str)
    str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3]) }
  end
end
class Price
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = Time.new(time)
  end
  def to_s
    "$#{mPrice}-#{mExchange}-#{mTime}"
  end
  def self.from_raw(raw)
    raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3]) }
  end
end
class SymbolList
  attr_accessor :mSymbols
  def initialize(symbols)
    @mSymbols = symbols
  end
  def self.from_raw(str)
    new(str.split(";").collect { |s| s.to_sym })
  end
  def to_s
    mSymbols.to_s
  end
end
class ExchangeIdentifier
  attr_accessor :mExch
  def initialize(exch)
    @mExch = exch
  end
  def self.from_raw(raw)
    new(raw)
  end
  def to_s
    mExch
  end
end
...create a table of types, mapping each column identifier to a type:
ColumnTypes =
{
  :IndexSymbol => Sku,
  :PriceStatus => SymbolList,
  :UpdateExchange => ExchangeIdentifier,
  :Last => Price,
  :Bid => Price
}
...and finally build my Array of cells by calling the appropriate type's from_raw:
cells = raw.split(",").each_with_index.collect { |cell, i|
  puts "Cell: #{cell}, ColType: #{ColumnTypes[cols[i]]}"
  ColumnTypes[cols[i]].from_raw(cell)
}
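A compact, runnable sketch of this table-driven dispatch, showing two columns (:Last and :Bid) sharing one type. PriceCell and ExchangeCell are hypothetical Struct-based stand-ins for the Price and ExchangeIdentifier classes above, the second price in raw is made-up sample data, and the anchored regex and Float conversion are assumptions rather than the original parsing rules.

```ruby
# Hypothetical simplified price type: amount, exchange flag, time string.
PriceCell = Struct.new(:amount, :exchange, :time) do
  def self.from_raw(raw)
    m = raw.match(/\A(\d+(?:\.\d+)?)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})\z/)
    raise ArgumentError, "not a price: #{raw.inspect}" unless m
    new(Float(m[1]), m[2], m[3])
  end
end

# Hypothetical simplified exchange type wrapping the raw string.
ExchangeCell = Struct.new(:exch) do
  def self.from_raw(raw)
    new(raw)
  end
end

# Several column identifiers may map to the same type.
COLUMN_TYPES = {
  UpdateExchange: ExchangeCell,
  Last:           PriceCell,
  Bid:            PriceCell
}.freeze

cols = [:UpdateExchange, :Last, :Bid]
raw  = "T,933.36T 11:10:00.000,933.30T 11:09:59.500"

# Pair each cell with its column symbol; fetch raises KeyError for an
# unknown column rather than failing later with NoMethodError on nil.
cells = raw.split(",").zip(cols).map do |cell, col|
  COLUMN_TYPES.fetch(col).from_raw(cell)
end

cells.each { |c| p c }
```

Using #zip removes the index bookkeeping of each_with_index, and the column symbol stays available in the block if the semantic difference between :Last and :Bid needs to be recorded alongside the value.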
The result is code that is, to my eye, clean and expressive, and that looks much more Ruby-ish than what I had originally done.
Complete example here.