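Constructing an array of different types based on the elements of another array

Time: 2013-08-19 14:34:03

Tags: ruby

Given a comma-separated string and another array that indicates types, how do I construct an array of different types?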


By parsing CSV input taken from stdin, I have an array of column header Symbols:

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]

and a line of raw input:

raw = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

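I want to construct an array, cells, from the raw input, where each element of cells is of the type identified by the corresponding element of cols. What is the idiomatic, Ruby-ish way to do this?


I have tried this, which works but feels wrong.

1) First, define a class for each type that needs to be encapsulated: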

class Sku
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
end

class Price
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = time
  end
end

2) Then, define a conversion function for each unique column type that needs converting:

def to_sku(raw)
  raw.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3])}
end

def to_price(raw)

end

3) Create an array of strings from the input:

cells = raw.split(",")

4) Finally, modify each element of cells in place by constructing the type specified by the corresponding column header:

cells.each_index do |i|
    cells[i] = case cols[i]
        when :IndexSymbol
            to_sku(cells[i])
        when :PriceStatus
            cells[i].split(";").collect {|st| st.to_sym}
        when :UpdateExchange
            cells[i]
        when :Last
            cells[i].match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
        else
            puts "Unhandled column type (#{cols[i]}) from input string: \n#{cols}\n#{raw}"
            exit -1
    end
end

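The parts that feel wrong are steps 3 and 4. How can this be done in a more Ruby-like way? I was picturing some kind of super-concise method that exists only in my imagination: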

cells = raw.split_using_convertor(",")

4 Answers:

Answer 0: (score: 2)

You can simplify the fourth step with #zip, #map, and destructuring assignment:

cells = cells.zip(cols).map do |cell, col|
    case col
    when :IndexSymbol
        to_sku(cell)
    when :PriceStatus
        cell.split(";").collect {|st| st.to_sym}
    when :UpdateExchange
        cell
    when :Last
        cell.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
    else
        puts "Unhandled column type (#{col}) from input string: \n#{cols}\n#{raw}"
        exit -1
    end
end

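I would not recommend combining that step with the split, because parsing a line of CSV is complicated enough to be its own step. For how to parse the CSV, see my comment.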
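
As a minimal sketch of keeping the two steps separate, the standard library's CSV.parse_line can do the splitting, and a Hash of lambdas can stand in for the case statement. The CONVERTERS table is a hypothetical name that does not appear in the answer, and plain arrays and strings stand in for the Sku and Price objects to keep the example short:

```ruby
require 'csv'

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
raw  = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

# Hypothetical converter table: one lambda per column header.
CONVERTERS = {
  IndexSymbol:    ->(s) { s.delete_prefix("$").split(".") },
  PriceStatus:    ->(s) { s.split(";").map(&:to_sym) },
  UpdateExchange: ->(s) { s },
  Last:           ->(s) { s.split(" ", 2) }
}.freeze

# Step 1: let the CSV library split the line (it also copes with quoted fields).
fields = CSV.parse_line(raw)

# Step 2: pair each field with its column header and convert it.
cells = fields.zip(cols).map do |field, col|
  CONVERTERS.fetch(col) { raise ArgumentError, "Unhandled column type: #{col}" }.call(field)
end
```

Using fetch with a block keeps the "unhandled column" error from the original case statement without calling exit from inside the loop.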

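Answer 1: (score: 2)

You could have the different types inherit from a base class and put the lookup knowledge in that base class. Then each class can know how to initialize itself from a raw string: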

class Header
  @@lookup = {}

  def self.symbol(*syms)
    syms.each{|sym| @@lookup[sym] = self}
  end

  def self.lookup(sym)
    @@lookup[sym]
  end
end

class Sku < Header
  symbol :IndexSymbol
  attr_accessor :mRoot, :mExch, :mCountry

  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end

  def to_s
    "@#{mRoot}-#{mExch}-#{mCountry}"
  end

  def self.from_raw(str)
    str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
  end
end

class Price < Header
  symbol :Last, :Bid
  attr_accessor :mPrice, :mExchange, :mTime

  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = Time.new(time)
  end

  def to_s
    "$#{mPrice}-#{mExchange}-#{mTime}"
  end

  def self.from_raw(raw)
    raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
  end
end

class SymbolList < Header
  symbol :PriceStatus
  attr_accessor :mSymbols

  def initialize(symbols)
    @mSymbols = symbols
  end

  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end

  def to_s
    mSymbols.to_s
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_accessor :mExch

  def initialize(exch)
    @mExch = exch
  end

  def self.from_raw(raw)
    new(raw)
  end

  def to_s
    mExch
  end
end

Then you can replace step #4 like this (not including the CSV parsing):

cells.each_index.map do |i|
  Header.lookup(cols[i]).from_raw(cells[i])
end
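
One thing to keep in mind with this lookup is that a Hash returns nil for a column that was never registered, so the from_raw call would then fail with a NoMethodError. The following is a minimal, self-contained sketch of a stricter variant. The lookup! method is a hypothetical addition, and the classes are trimmed-down copies of the ones above:

```ruby
# Trimmed-down copy of the Header base class with a stricter lookup.
class Header
  @@lookup = {}

  # Register the calling subclass for each given column symbol.
  def self.symbol(*syms)
    syms.each { |sym| @@lookup[sym] = self }
  end

  # Hypothetical addition: raise a descriptive error instead of returning nil.
  def self.lookup!(sym)
    @@lookup.fetch(sym) { raise KeyError, "no type registered for column #{sym.inspect}" }
  end
end

class SymbolList < Header
  symbol :PriceStatus
  attr_reader :mSymbols

  def initialize(symbols)
    @mSymbols = symbols
  end

  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_reader :mExch

  def initialize(exch)
    @mExch = exch
  end

  def self.from_raw(raw)
    new(raw)
  end
end

cols  = [:PriceStatus, :UpdateExchange]
cells = ["Open;Halted", "T"]

# Pair each raw cell with its column symbol and let the registered class decode it.
typed = cells.zip(cols).map { |cell, col| Header.lookup!(col).from_raw(cell) }
```

Pairing the arrays with zip also avoids indexing into cols and cells separately.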

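Answer 2: (score: 1)

Ruby's CSV library supports this sort of thing directly (and handles the actual parsing better too), although the documentation is a bit awkward.

You need to supply a proc that does the conversion for you and pass it as an option to CSV.parse: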

require 'csv'

converter = proc do |field, info|
  case info.header.strip # in case you have spaces after your commas
  when "IndexSymbol"
      field.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3])}
  when "PriceStatus"
      field.split(";").collect {|st| st.to_sym}
  when "UpdateExchange"
      field
  when "Last"
      field.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
  end
end

Then you can parse almost directly into the format you want:

# s holds the complete CSV text, including the header row
c = CSV.parse(s, :headers => true, :converters => converter).by_row!.map do |row|
  row.map { |_, field| field }  # we only want the field now, not the header
end
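
For reference, here is a small self-contained sketch of the same converter mechanism that can be run as-is. The two-line input is invented for illustration, and the conversions are simplified to arrays and symbols so that the Sku and Price classes are not needed:

```ruby
require 'csv'

# Hypothetical input: a header row followed by one data row.
input = <<~INPUT
  IndexSymbol,PriceStatus,UpdateExchange,Last
  $JX.T.CA,Open;Halted,T,933.36T 11:10:00.000
INPUT

# The converter receives each field plus a field-info object that carries
# the header of the column the field belongs to.
converter = proc do |field, info|
  case info.header
  when "PriceStatus" then field.split(";").map(&:to_sym)
  when "Last"        then field.split(" ", 2)
  else field # leave every other column as a plain string
  end
end

# With headers enabled, CSV.parse returns a table; each row exposes its
# converted values through #fields.
rows = CSV.parse(input, headers: true, converters: [converter]).map(&:fields)
```

The explicit else branch matters here: a case expression with no matching branch evaluates to nil, which would replace the field's value.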

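Answer 3: (score: 1)

@AbeVoelker's answer pointed me in the right direction, but I had to make one fairly significant change because of something I did not mention in the OP.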

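Some cells are of the same type but still have different semantics. Those semantic differences do not come into play here (and I have not elaborated on them), but they do matter in the larger scope of the tool I am writing.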

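For example, there will be several cells of type Price; some of them are :Last, :Bid, and :Ask. They are all the same type (Price), but they are still different, so there cannot be a single Price entry in Header's @@lookup for all of those columns.

So what I actually did was write a self-decoding class for each type of cell (credit goes to Abe for this key part):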

class Sku
    attr_accessor :mRoot, :mExch, :mCountry
    def initialize(root, exch, country)
        @mRoot = root
        @mExch = exch
        @mCountry = country
    end

    def to_s
        "@#{mRoot}-#{mExch}-#{mCountry}"
    end

    def self.from_raw(str)
        str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
    end
end

class Price
    attr_accessor :mPrice, :mExchange, :mTime
    def initialize(price, exchange, time)
        @mPrice = price
        @mExchange = exchange
        @mTime = Time.new(time)
    end
    def to_s
        "$#{mPrice}-#{mExchange}-#{mTime}"
    end
    def self.from_raw(raw)
        raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
    end
end

class SymbolList
    attr_accessor :mSymbols
    def initialize(symbols)
        @mSymbols = symbols
    end
    def self.from_raw(str)
        new(str.split(";").collect {|s| s.to_sym})
    end
    def to_s
        mSymbols.to_s
    end
end

class ExchangeIdentifier
    attr_accessor :mExch
    def initialize(exch)
        @mExch = exch
    end
    def self.from_raw(raw)
        new(raw)
    end
    def to_s
        mExch
    end
end

...created a type list that maps each column identifier to a type:

ColumnTypes =
{
    :IndexSymbol => Sku,
    :PriceStatus => SymbolList,
    :UpdateExchange => ExchangeIdentifier,
    :Last => Price,
    :Bid => Price
}

...and finally built my Array of cells by calling from_raw on the corresponding type:

cells = raw.split(",").each_with_index.collect { |cell,i|
    puts "Cell: #{cell}, ColType: #{ColumnTypes[cols[i]]}"
    ColumnTypes[cols[i]].from_raw(cell)
}
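
If the column semantics need to survive the conversion, one option is to key the result by column symbol rather than by position. This is only a sketch under assumptions: Quote is a hypothetical, simplified stand-in for the Price class above, COLUMN_TYPES covers just two columns, and the second price in the input is invented for illustration:

```ruby
# Hypothetical, simplified stand-in for the Price class.
Quote = Struct.new(:price, :exchange, :time) do
  # Returns a Quote, or nil when the raw text does not match.
  def self.from_raw(raw)
    m = raw.match(/(\d*\.?\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/)
    m && new(m[1], m[2], m[3])
  end
end

# Several column identifiers may map to the same type.
COLUMN_TYPES = { Last: Quote, Bid: Quote }.freeze

cols = [:Last, :Bid]
raw  = "933.36T 11:10:00.000,933.30T 11:09:59.500"

# Keying the result by column keeps :Last and :Bid distinguishable
# even though both values are Quote instances.
row = cols.zip(raw.split(",")).to_h do |col, cell|
  [col, COLUMN_TYPES.fetch(col).from_raw(cell)]
end
```

Hash#fetch raises KeyError for a column that has no entry in the table, which surfaces a missing mapping earlier than a nil lookup would.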

The result is code that is, to my eye, clean and expressive, and that looks more like Ruby than what I originally wrote.

Full example here