How can I import this data into my database?

Asked: 2013-08-07 10:07:57

Tags: ruby-on-rails ruby import

I have a database with thousands of records:

Code  | Name  | Price
00106 | Water | 9.99
00107 | Onion | 8.99

It is encoded in a GES file, as follows:

  • 00F marks a column header
  • 00I marks a row to insert

There are other markers as well (00D to delete a row, 00U to update one).

00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899

I want to write an importer that processes this file and pushes the data into my database. So I started implementing it:

class Importer
  CONN = ActiveRecord::Base.connection
  F = "00F"
  I = "00I"

  def extract_to_database(collection)
    add       = true
    tmp       = []
    type      = F
    inserts   = []

    collection.each_with_index do |line, i|
      _type    = line.strip
      _changed = [F,I].include? _type

      if _changed && i > 0
        case type
        when F then @f << tmp
        when I
          group_id = Group.find_by(code: tmp[1]).id
          inserts.push "(group_id,'#{tmp[2]}','#{tmp[3]}')"
        end

        tmp  = []
        type = _type
      end

      tmp << line
    end
    sql = "INSERT INTO products (`group_id`, `name`, `price`) VALUES #{inserts.join(", ")}"
    CONN.execute sql
  end
end

There is one problem: I would like to refactor this using functional programming.

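I also have to look up another model by its code and store its id in the matching some_model_id column of the products table, which may complicate the whole process, because right now importing this data takes several hours.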

Maybe Ruby is not the best choice for this.

1 answer:

Answer 0 (score: 2)

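There is nothing here that Ruby can't handle. It is not clear how "functional programming" would help, though, since this is a classic state-machine problem with some simple data transformation going on.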

An example scaffold:

class SomethingImporter
  FIELD_MARKER = "00F"
  INSERT_MARKER = "00I"

  COLUMNS = %w[ group_id name price ]

  # Performs the insert into a given model. This should probably be a class
  # method on the model itself.
  def bulk_insert(model, rows)
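    # Build "INSERT INTO `table` (`col`,`col`,...) VALUES " with quoted names.
    sql = [
      "INSERT INTO `#{model.table_name}` (#{COLUMNS.collect { |c| "`#{c}`" }.join(',')}) VALUES "
    ]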

    # Append the placeholders: (?,?,?),(?,?,?),...
    sql[0] += ([ '(%s)' % ([ '?' ] * COLUMNS.length).join(',') ] * rows.length).join(',')

    sql += rows.flatten

    model.connection.execute(model.send(:sanitize_sql, sql))
  end

  # Resolve a group code to a group_id value, and cache the result so that
  # subsequent look-ups for the same code do not hit the database again.
  def group_id(group_code)
    @find_group ||= { }

    # This tests if any value has been cached for this code, including one
    # that might be nil.
    if (@find_group.key?(group_code))
      return @find_group[group_code]
    end

    group = Group.find_by(code: group_code)

    @find_group[group_code] = group && group.id
  end

  # Call this with the actual collection, lines stripped, and with any header
  # lines removed (e.g. collection.shift)
  def extract_rows(collection)
    state = nil
    rows = [ ]
    row = [ ]

    collection.each do |line|
      case (line)
      when FIELD_MARKER
        # Indicates field data to follow
        state = :field
      when INSERT_MARKER
        case (state)
        when :insert
          rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
        end

        state = :insert
        row = [ ]
      else
        case (state)
        when :field
          # Presumably you'd pay attention to the data here and establish
          # a mapping table.
        when :insert
          row << line.sub(/^\d\d/, '')
          # puts row.inspect
        end
      end
    end

    case (state)
    when :insert
      rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
    end

    rows
  end
end


data = <<END
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
END

importer = SomethingImporter.new

puts importer.extract_rows(data.split(/\n/)).inspect

With your data, this example produces the following output:

[["00106", "Water", 9.99], ["00107", "Onion", 8.99]]

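When writing code like this, be sure to expose intermediate results so that you can test what is happening. Your implementation takes the data and dumps it straight into the database in one shot, which makes it very hard to tell what went wrong if something doesn't work out. This version is made up of several methods, each with a more specific purpose.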
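
As a rough illustration of testing an intermediate step, the per-record conversion could be pulled into its own small method and checked without touching the database. This is only a sketch: build_row is a hypothetical helper that is not part of the class above, and it assumes the price field holds a whole number of hundredths once the two-digit field prefix has been stripped.

```ruby
# Hypothetical helper (not part of SomethingImporter): turns the three raw
# field strings of one insert record into [code, name, price].
# Assumption: the price is stored as a whole number of hundredths.
def build_row(fields)
  code, name, raw_price = fields
  # String#to_i parses "0999" as decimal 999, so no zero-stripping is needed.
  [code, name, raw_price.to_i / 100.0]
end

# Check the intermediate result in isolation, with no database involved.
row = build_row(["00106", "Water", "0999"])
raise "unexpected row: #{row.inspect}" unless row == ["00106", "Water", 9.99]
puts row.inspect
```

If a check like this fails, the fault is in the parsing step rather than in the SQL or the group lookup.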

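It is not clear from your original example why you are resolving group_id at all, since your sample output has nothing to do with it. As an example, though, I have included a method that resolves the codes and caches the results, avoiding repeated look-ups of the same thing. For a larger-scale import you would probably load many rows, extract the distinct group codes, load the matching ids all at once, and remap them before inserting.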
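
The batch approach might look roughly like the sketch below. The names remap_group_codes and lookup are hypothetical and not part of the code above. The lookup callable stands in for a single batched query; in Rails that could be something along the lines of Group.where(code: codes).pluck(:code, :id).to_h, which is an assumption that has not been tested against your schema.

```ruby
# Hypothetical helper: replaces the group code in each row with its id,
# using one batched look-up instead of one query per row.
# `lookup` is any callable that takes an array of codes and returns a
# Hash of code => id (assumed to wrap a single database query).
def remap_group_codes(rows, lookup)
  codes = rows.map(&:first).uniq      # distinct codes only
  ids_by_code = lookup.call(codes)    # one round trip for all codes
  # Unknown codes map to nil, matching the behavior of the cached version.
  rows.map { |code, name, price| [ids_by_code[code], name, price] }
end

# In-memory stand-in for the groups table, for demonstration only.
fake_groups = { "00106" => 1, "00107" => 2 }
lookup = ->(codes) { fake_groups.select { |code, _id| codes.include?(code) } }

rows = [["00106", "Water", 9.99], ["00107", "Onion", 8.99]]
puts remap_group_codes(rows, lookup).inspect
```

The remapped rows can then be handed to a bulk insert in one go, so the number of queries no longer grows with the number of rows.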