I have a database with thousands of records:
Code | Name | Price
00106 | Water | 9.99
00107 | Onion | 8.99
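It is encoded in a GES file as shown below. 00F marks a column header and 00I marks a row to insert. There are other markers as well (00D for deleting a row and 00U for updating one).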
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
I want to create an importer that processes this file and pushes it into my database, so I started implementing it:
class Importer
  CONN = ActiveRecord::Base.connection
  F = "00F"
  I = "00I"

  def extract_to_database(collection)
    add = true
    tmp = []
    type = F
    inserts = []
    collection.each_with_index do |line, i|
      _type = line.strip
      _changed = [F,I].include? _type
      if _changed && i > 0
        case type
        when F then @f << tmp
        when I
          group_id = Group.find_by(code: tmp[1]).id
          inserts.push "(group_id,'#{tmp[2]}','#{tmp[3]}')"
        end
        tmp = []
        type = _type
      end
      tmp << line
    end
    sql = "INSERT INTO products (`group_id`, `name`, `price`) VALUES #{inserts.join(", ")}"
    CONN.execute sql
  end
end
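There is one problem: I would like to refactor this using functional programming.
I also have to look up other models by code and put their ids into the related some_model_id columns of the products table, which may complicate the whole process, because importing this data already takes a few hours.
Maybe Ruby is not the best choice for this.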
Answer 0 (score: 2)
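There is nothing here that Ruby can't handle. It's not clear how "functional programming" would help, since this is a classic state-machine problem with some simple data transformation going on.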
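That said, if a more functional style is what you are after, the grouping step can be expressed with Enumerable#slice_before. The following is only a minimal sketch, assuming every record starts with a 00X marker line and every other line is a two-digit field number followed by its value. The names parse_records and to_row are hypothetical helpers invented for this illustration.

```ruby
# Hypothetical helper: splits raw GES lines into one hash per record.
# Assumes every record begins with a "00X" marker line (00F, 00I, ...).
def parse_records(lines)
  lines
    .map(&:strip)
    .reject(&:empty?)
    .slice_before { |line| line.match?(/\A00[A-Z]\z/) }
    .map do |marker, *fields|
      # Drop the two-digit field number from each field line.
      { type: marker, values: fields.map { |f| f[2..-1] } }
    end
end

# Converts the values of an insert record into [code, name, price].
# Assumes the price is stored as a whole number of cents.
def to_row(values)
  code, name, price = values
  [code, name, price.to_i / 100.0]
end

lines = %w[00F 0101 02Code 031 00I 0100106 02Water 030999 00I 0100107 02Onion 030899]

records = parse_records(lines)
rows = records.select { |r| r[:type] == "00I" }.map { |r| to_row(r[:values]) }
p rows
```

Each step returns a plain value, so records and rows can be inspected separately before anything is written to the database.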
Example scaffold:
class SomethingImporter
  FIELD_MARKER = "00F"
  INSERT_MARKER = "00I"
  COLUMNS = %w[ group_id name price ]

  # Performs the insert into a given model. This should probably be a class
  # method on the model itself.
  def bulk_insert(model, rows)
    # Nothing to do, and an INSERT without any VALUES would be invalid SQL.
    return if rows.empty?

    sql = [
      "INSERT INTO `#{model.table_name}` (#{COLUMNS.collect { |c| "`#{c}`" }.join(',')}) VALUES "
    ]

    # Append the placeholders: (?,?,?),(?,?,?),...
    sql[0] += ([ '(%s)' % ([ '?' ] * COLUMNS.length).join(',') ] * rows.length).join(',')

    # The bind values follow the statement, one per placeholder.
    sql += rows.flatten

    model.connection.execute(model.send(:sanitize_sql, sql))
  end

  # Resolve a group code to a group_id value, and cache the result so that
  # subsequent look-ups for the same code do not hit the database again.
  def group_id(group_code)
    @find_group ||= { }

    # This tests if any value has been cached for this code, including one
    # that might be nil.
    if (@find_group.key?(group_code))
      return @find_group[group_code]
    end

    group = Group.find_by(code: group_code)

    @find_group[group_code] = group && group.id
  end

  # Call this with the actual collection, lines stripped, and with any header
  # lines removed (e.g. collection.shift)
  def extract_rows(collection)
    state = nil
    rows = [ ]
    row = [ ]

    collection.each do |line|
      case (line)
      when FIELD_MARKER
        # Indicates field data to follow
        state = :field
      when INSERT_MARKER
        case (state)
        when :insert
          # A new insert marker closes the previous insert record.
          rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
        end

        state = :insert
        row = [ ]
      else
        case (state)
        when :field
          # Presumably you'd pay attention to the data here and establish
          # a mapping table.
        when :insert
          # Strip the two-digit field number and keep the value.
          row << line.sub(/^\d\d/, '')
          # puts row.inspect
        end
      end
    end

    # Flush the last insert record, which has no marker following it.
    case (state)
    when :insert
      rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
    end

    rows
  end
end
data = <<END
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
END
importer = SomethingImporter.new
puts importer.extract_rows(data.split(/\n/)).inspect
With your data, this example produces the following output:
[["00106", "Water", 9.99], ["00107", "Onion", 8.99]]
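When writing code like this, be sure to expose intermediate results so you can test what is happening. Your implementation takes the data and dumps it straight into the database in one shot, which makes it very hard to tell what went wrong if it doesn't work out properly. This version is composed of several methods, each with a more specific purpose.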
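The same idea can be applied to the insert step: build the statement and its bind values first, and only then hand them to the database. Below is a rough sketch of that, not a drop-in replacement. The helper name build_insert is made up for this example, and the group ids in the sample rows are invented values.

```ruby
# Hypothetical helper: returns the SQL text and the flat list of bind values
# for a multi-row INSERT, without touching the database.
def build_insert(table, columns, rows)
  bad = rows.reject { |row| row.length == columns.length }
  raise ArgumentError, "rows with wrong column count: #{bad.inspect}" unless bad.empty?

  column_list = columns.map { |c| "`#{c}`" }.join(", ")
  row_placeholder = "(" + (["?"] * columns.length).join(", ") + ")"
  placeholders = ([row_placeholder] * rows.length).join(", ")

  ["INSERT INTO `#{table}` (#{column_list}) VALUES #{placeholders}", rows.flatten(1)]
end

# Sample rows; the group ids (1) are invented for illustration.
sql, binds = build_insert("products", %w[group_id name price],
                          [[1, "Water", 9.99], [1, "Onion", 8.99]])
puts sql
p binds
```

Because nothing is executed here, the statement and the values can be printed or asserted on in a test before a real import is attempted.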
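It's not clear from your original example why you're trying to resolve group_id, as your sample output has nothing to do with it, but as an example I've included a method that resolves them and keeps them cached, avoiding repeated look-ups of the same thing. For a larger-scale import you'd probably load in many rows, extract the distinct group_id values, load them all at once, and remap them before inserting.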
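A sketch of that batch approach, under a few assumptions: the GROUPS hash stands in for the groups table so the example runs without a database, fetch_group_ids and remap_rows are hypothetical names, and the "Salt" row is invented to show what happens with an unknown code. In a Rails app the lookup could probably be a single query along the lines of Group.where(code: codes).pluck(:code, :id).to_h.

```ruby
# Stand-in for the groups table (assumption: code => id). In Rails this data
# would come from one query instead of one find_by per row.
GROUPS = { "00106" => 1, "00107" => 2 }.freeze

# Hypothetical helper: resolves many codes in one call.
def fetch_group_ids(codes)
  GROUPS.select { |code, _id| codes.include?(code) }
end

# Replaces the leading code of each row with its group id. Rows whose code is
# unknown are returned separately instead of being silently dropped.
def remap_rows(rows)
  ids = fetch_group_ids(rows.map(&:first).uniq)
  known, unknown = rows.partition { |code, *_rest| ids.key?(code) }
  [known.map { |code, *rest| [ids[code], *rest] }, unknown]
end

rows = [["00106", "Water", 9.99], ["00107", "Onion", 8.99], ["00999", "Salt", 1.5]]
resolved, unresolved = remap_rows(rows)
p resolved
p unresolved
```

The lookup runs once per batch rather than once per row, which is usually where most of the time goes in an import like this.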