I'm trying to join data from multiple tables stored in separate tab-delimited files. As an example,
I have these tables:
file1.txt
a 3
b 4
c 8
d 22
e 4
file2.txt
a 10.3 -2
b 4.7 -1
c 8.9 -2
e 22.1 -1
file3.txt
b T
c F
d T
f F
g T
I want to join them on their common key, which is the first column, to produce the following table:
a 3 10.3 -2
b 4 4.7 -1 T
c 8 8.9 -2 F
d 22 T
e 4 22.1 -1
f F
g T
How can I do this in Ruby?
Ted
Answer 0 (score: 3)
I don't know of another way, but this will build a hash containing everything:
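files = ['file1.txt', 'file2.txt', 'file3.txt']
result = Hash.new
files.each_with_index do |file, i|
  File.foreach(file) do |line|
    # Key = first column (one or more word characters); value = rest of the line, tabs kept.
    m = /\A(\w+)\s+(.*)/.match(line)
    next unless m # skip blank or malformed lines instead of calling captures on nil
    key, value = m.captures
    # One slot per file, so a missing row in file i stays nil at index i.
    result[key] = Array.new(files.size) unless result.has_key?(key)
    result[key][i] = value
  end
end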
The result hash looks like this:
{"a" => ["3", "10.3\t-2", nil],
 "b" => ["4", "4.7\t-1", "T"],
 "c" => ["8", "8.9\t-2", "F"],
 "d" => ["22", nil, "T"],
 "e" => ["4", "22.1\t-1", nil],
 "f" => [nil, nil, "F"],
 "g" => [nil, nil, "T"]}
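To print this hash in the layout the question asks for, one option is to drop the missing (nil) entries and join what is left with tabs, which matches the question's table where missing columns are simply omitted. This is a minimal sketch: the literal hash below stands in for the result built above (with the tab kept inside the middle values), and the variable name lines is only illustrative.

```ruby
# Stand-in for the hash built above; the middle values keep their inner tab.
result = {
  "a" => ["3", "10.3\t-2", nil],
  "b" => ["4", "4.7\t-1", "T"],
  "c" => ["8", "8.9\t-2", "F"],
  "d" => ["22", nil, "T"],
  "e" => ["4", "22.1\t-1", nil],
  "f" => [nil, nil, "F"],
  "g" => [nil, nil, "T"]
}

# Drop missing (nil) values and join the rest with tabs, one output line per key.
lines = result.map { |key, values| [key, *values.compact].join("\t") }
puts lines
```

Dropping the nils reproduces the question's table exactly, but the columns no longer line up. If the output has to stay a well-formed TSV, keep empty fields for the missing values instead, which is what the nil padding in the next answer achieves.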
Answer 1 (score: 2)
You could do something like this:
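require 'csv'

# Read a tab-separated file into { first column => [remaining columns] }.
# Named load_table so it doesn't shadow Kernel#load; CSV.read also closes the file.
def load_table(file)
  CSV.read(file, col_sep: "\t", skip_blanks: true).
    each_with_object({}) { |r, h| h[r.shift] = r }
end

# Load it all into hashes with a convenient format.
# The PK will be the key, the rest of the row will be the value as an array.
file1 = load_table('file1.txt')
file2 = load_table('file2.txt')
file3 = load_table('file3.txt')

# Figure out the rows in the final table (union of all keys, in first-seen order).
rows = (file1.keys | file2.keys | file3.keys).each_with_object({}) { |k, h| h[k] = [] }

# Cumulative column counts after each file, taken from each file's first row
# (assumes every row in a file has the same width); here this gives [1, 3, 4].
cols = [file1, file2, file3].inject([]) { |a, f| a.push(a.last.to_i + f.first.last.length) }

# Use the block form of Hash#merge to join: pad the existing row with nils up to
# the expected width, then append the new file's columns.
joined = rows.merge(file1).
  merge(file2) { |k, o, n| (o + [nil] * (cols[0] - o.length)) + n }.
  merge(file3) { |k, o, n| (o + [nil] * (cols[1] - o.length)) + n }

# Patch any missing values in the last column.
joined.each_value { |v| v.concat([nil] * (cols[2] - v.length)) }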
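The padding steps are needed because the block passed to Hash#merge only runs for keys that exist in both hashes; every other key is copied over unchanged. A small demonstration with two hypothetical mini-tables:

```ruby
# Hypothetical mini-tables: key => remaining columns.
left  = { "a" => ["3"], "d" => ["22"] }
right = { "a" => ["10.3", "-2"], "e" => ["22.1", "-1"] }

calls = []
merged = left.merge(right) do |key, old_val, new_val|
  calls << key        # only keys present in both hashes reach this block
  old_val + new_val   # concatenate the two rows
end

puts "block ran for: #{calls.inspect}"
merged.each { |key, row| puts "#{key}: #{row.inspect}" }
# "d" keeps only its left-hand column and "e" only its right-hand ones,
# so rows end up with different lengths unless they are padded with nil.
```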
The result is a hash like this:
{"a"=>["3", "10.3", "-2", nil],
"b"=>["4", "4.7", "-1", "T"],
"c"=>["8", "8.9", "-2", "F"],
"d"=>["22", nil, nil, "T"],
"e"=>["4", "22.1", "-1", nil],
"f"=>[nil, nil, nil, "F"],
"g"=>[nil, nil, nil, "T"]}
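You can easily turn this into an array of arrays if you need to. Generalizing it to more files should also be fairly straightforward. There are, of course, other ways to implement the individual steps, but I'll leave those refinements as an exercise.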
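As a sketch of that generalization, the helpers below (parse_table and outer_join are hypothetical names, not part of any library) join any number of tables on their first column and return an array of arrays. They assume plain tab-separated text without quoted fields, so they split on tabs instead of using CSV, and they take each table's width from its longest row rather than its first row. The sample strings reproduce the three files from the question.

```ruby
# Hypothetical helper: parse tab-separated text into { key => [rest of columns] }.
# Assumes plain TSV with no quoted fields; use CSV with col_sep: "\t" otherwise.
def parse_table(text)
  text.each_line.with_object({}) do |line, h|
    fields = line.chomp.split("\t")
    next if fields.empty? # skip blank lines
    h[fields.shift] = fields
  end
end

# Full outer join of any number of tables on their first column.
# Missing rows are padded with nils so every output row has the same width.
def outer_join(tables)
  widths = tables.map { |t| t.values.map(&:length).max || 0 }
  keys = tables.flat_map(&:keys).uniq # first-seen order across all tables
  keys.map do |key|
    cells = tables.zip(widths).flat_map do |table, width|
      row = table.fetch(key, [])
      row + [nil] * (width - row.length)
    end
    [key, *cells]
  end
end

tables = [
  "a\t3\nb\t4\nc\t8\nd\t22\ne\t4\n",
  "a\t10.3\t-2\nb\t4.7\t-1\nc\t8.9\t-2\ne\t22.1\t-1\n",
  "b\tT\nc\tF\nd\tT\nf\tF\ng\tT\n"
].map { |text| parse_table(text) }

joined = outer_join(tables)
joined.each { |row| puts row.map(&:to_s).join("\t") } # nil prints as an empty field
```

With real files you would pass File.read('file1.txt') and so on to parse_table. Printing nil as an empty field keeps the columns aligned, unlike the question's sample table.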
If the files are large, you may be better off loading them into an SQLite database and doing the join in SQL.
Answer 2 (score: 1)