Rails: normalizing CSV file data

Time: 2016-06-02 00:57:01

Tags: ruby-on-rails ruby csv database-normalization

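I'm trying to import a TSV (tab-separated values) file into my database, but the file isn't formatted correctly. The price and count columns are separated only by spaces (except in the header row), so both values end up under the price key, which shifts all the remaining data into the wrong key-value pairs.

The TSV file: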

purchaser name  item description    price   count   merchant address    merchant name
Alice Bob   $10 off $20 of food 10.0 2   987 Fake St     Bob's Pizza
Example Name    $30 of awesome for $10  10.0 5   456 Unreal Rd   Tom's Awesome Shop
Name Three  $20 Sneakers for $5 5.0    1     123 Fake St     Sneaker Store Emporium
John Williams   $20 Sneakers for $5 5.0    4     123 Fake St     Sneaker Store Emporium 
In /models/purchase.rb:

class Purchase < ActiveRecord::Base
  # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }

  def self.import(file)
    CSV.foreach(file.path,
                :headers => true,
                :header_converters => lambda { |h| h.downcase.gsub(' ', '_') },
                :col_sep => "\t") do |row|
      # debugger
      purchase_hash = row.to_hash
      Purchase.create!(purchase_hash)
    end
  end
end

If I import the file with the debugger line in the model uncommented and then type row, it returns:

#<CSV::Row "purchaser_name":"Alice Bob" "item_description":"$10 off $20 of food" "price":"10.0 2" "count":" 987 Fake St" "merchant_address":" Bob's Pizza" "merchant_name":nil>

row.inspect returns:

"#<CSV::Row \"purchaser_name\":\"Alice Bob\" \"item_description\":\"$10 off $20 of food\" \"price\":\"10.0 2\" \"count\":\" 987 Fake St\" \"merchant_address\":\" Bob's Pizza\" \"merchant_name\":nil>"

As you can see, price (10.0) and count (2) are squashed into the same value because there is no tab delimiter between them in the file.
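
The effect can be reproduced without the CSV library. The snippet below is only a minimal sketch: the sample line is hand-written to mirror the first data row, with a space (not a tab) between price and count.

```ruby
# Hand-written sample line (an assumption mirroring the first data row):
# every column is tab-separated except price and count, which share one field.
line = "Alice Bob\t$10 off $20 of food\t10.0 2\t987 Fake St\tBob's Pizza"

fields = line.split("\t")

puts fields.length     # number of fields found; the header row declares six
puts fields[2].inspect # the field that holds both price and count
```

Because the row yields one field fewer than the header row declares, every value after price lands one column to the left.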

db/schema.rb

ActiveRecord::Schema.define(version: 20160601205154) do

  create_table "purchases", force: :cascade do |t|
    t.string   "purchaser_name"
    t.string   "item_description"
    t.string   "price"
    t.string   "count"
    t.string   "merchant_address"
    t.string   "merchant_name"
    t.datetime "created_at",       null: false
    t.datetime "updated_at",       null: false
  end

end

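I originally had price as a Decimal and count as an Integer, but I switched them back to String while trying to find a solution. I can change them back if that helps (and I would prefer to, if possible).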

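2 answers:

Answer 0 (score: 1)

The solution to this is twofold. First, define a converter that splits the field into two parts during parsing (and converts them to numbers in the process):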

CONVERTER_SPLIT_PRICE_COUNT = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [ price.to_f, count.to_i ]
end

This turns the price field into an array, e.g. "10.0 2" becomes [10.0, 2].
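
Some rows in the file separate price and count with several spaces rather than one. As a small illustration (the input string is copied from the third data row), String#split with no argument splits on runs of whitespace, so the converter's split handles both cases:

```ruby
# With no argument, split treats any run of whitespace as one separator.
price, count = "5.0    1".split

pair = [price.to_f, count.to_i]
puts pair.inspect
```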

Second, define a method that, after parsing, fixes the misaligned values and returns a correct hash:

def row_to_hash_fixing_price_count(row)
  row.headers.zip(row.fields.flatten).to_h
end

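The above flattens the price/count array into its parent array (the rest of the row), then zips it up with the headers array. Since there are now more fields than headers, the extra nil at the end is dropped.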
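
To make the flatten-and-zip step concrete, here is a minimal sketch on plain arrays, with no CSV::Row involved. The `headers` and `fields` literals are assumptions that mimic what one row looks like after the converter has run.

```ruby
headers = %w[purchaser_name item_description price count merchant_address merchant_name]

# Assumed row contents after conversion: the price slot holds a [price, count]
# array and the row is padded with a trailing nil.
fields = ["Alice Bob", "$10 off $20 of food", [10.0, 2], "987 Fake St", "Bob's Pizza", nil]

flat  = fields.flatten          # seven elements: the nested pair is spliced in
fixed = headers.zip(flat).to_h  # zip stops at the six headers, so the nil is dropped

puts fixed.inspect
```

Array#zip takes its length from the receiver, which is why zipping from `headers` discards the surplus seventh element.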

You would use them like this:

csv_opts = {
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") },
  converters: CONVERTER_SPLIT_PRICE_COUNT
}

data_out = CSV.new(data, **csv_opts).map do |row|
  row_to_hash_fixing_price_count(row)
end
# => [ { "purchaser_name" => "Alice Bob",
#        "item_description" => "$10 off $20 of food",
#        "price" => 10.0,
#        "count" => 2,
#        "merchant_address" => "987 Fake St",
#        "merchant_name" => "Bob's Pizza"
#      },
#      # ...
#    ]

You can see it in action here: http://ideone.com/08wTPT

P.S. Consider creating the records in bulk rather than one at a time. Given the above, you can simply do Purchase.create!(data_out).

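Answer 1 (score: 0)

You could try shifting the merchant_address and merchant_name values over, then split the squashed price-and-count field on whitespace and assign the two values to price and count: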

# row.to_hash returns string keys (the header converter produces strings),
# so the hash must be indexed with strings rather than symbols.
purchase_hash = row.to_hash
purchase_hash["merchant_name"] = purchase_hash["merchant_address"]
purchase_hash["merchant_address"] = purchase_hash["count"]
splitted_price_count = purchase_hash["price"].split(" ")
purchase_hash["price"] = splitted_price_count.first
purchase_hash["count"] = splitted_price_count.last
Purchase.create!(purchase_hash)
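
The same shifting idea can be tried outside Rails on a plain hash. The following is only a sketch: `realign_purchase_hash` is a hypothetical helper name, the `broken` hash is copied from the row shown in the question, and the numeric conversion and whitespace stripping are additions that assume price and count should end up as a Float and an Integer.

```ruby
# Hypothetical helper (not part of either answer): returns a repaired copy of
# one misaligned row hash with string keys, leaving the input untouched.
def realign_purchase_hash(hash)
  price, count = hash["price"].split
  hash.merge(
    "price"            => Float(price),                   # raises if not numeric
    "count"            => Integer(count),
    "merchant_address" => hash["count"].strip,            # value that slid into count
    "merchant_name"    => hash["merchant_address"].strip  # value that slid into address
  )
end

broken = {
  "purchaser_name"   => "Alice Bob",
  "item_description" => "$10 off $20 of food",
  "price"            => "10.0 2",
  "count"            => " 987 Fake St",
  "merchant_address" => " Bob's Pizza",
  "merchant_name"    => nil
}

fixed = realign_purchase_hash(broken)
puts fixed.inspect
```

Using Float() and Integer() instead of to_f and to_i makes a malformed field raise an error rather than silently becoming zero, which may or may not be what an import should do.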