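I have a 'master' file with a number of columns: 1 2 3 4 5. I have several other files, each with fewer rows than the master file and with columns: 1 6. I'd like to merge these files by matching on the column 1 field, and add column 6 to the master. I've seen some Python/UNIX solutions, but I'd prefer to use Ruby/FasterCSV if it's a good fit. I would appreciate any help.
Answer 0 (score: 2)
FasterCSV is the standard library's CSV implementation as of Ruby 1.9. This code is untested, but should work.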
require 'csv'

master = CSV.read('master.csv')        # Reads in master as an array of rows
master.each { |row| row.push('') }     # Adds another (empty) column to all rows
Dir.glob('*.csv').each do |name|       # Goes through all CSV files
  # Skips the master file and the output of any previous run
  next if %w[master.csv output.csv].include?(name)
  file = CSV.read(name)                # Reads in each one
  file.each do |line|                  # Goes through each line of the file
    row = master.assoc(line[0])        # Finds the matching row in master by column 1
    row[-1] = line[1] if row           # Updates the last column if a row is found
  end
end
# The block form closes the output file, so all rows are flushed to disk
CSV.open('output.csv', 'wb') do |csv|
  master.each { |row| csv << row }     # Saves the modified master to the file
end
Answer 1 (score: 1)
$ cat j4.csv
how, now, brown, cow, f1
now, is, the, time, f2
one, two, three, four, five
xhow, now, brown, cow, f1
xnow, is, the, time, f2
xone, two, three, four, five
$ cat j4a.csv
how, b
one, d
$ cat hj.rb
require 'pp'
require 'rubygems'
require 'fastercsv'
pp(
FasterCSV.read('j4a.csv').inject(
FasterCSV.read('j4.csv').inject({}) do |m, e|
m[e[0]] = e
m
end) do |m, e|
k = e[0]
m[k] << e.last if m[k]
m
end.values)
$ ruby hj.rb
[["now", " is", " the", " time", " f2"],
["xhow", " now", " brown", " cow", " f1"],
["xone", " two", " three", " four", " five"],
["how", " now", " brown", " cow", " f1", " b"],
["one", " two", " three", " four", " five", " d"],
["xnow", " is", " the", " time", " f2"]]
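This works by mapping the master file to a hash with the first column as the key, and then it just looks up the keys from the other files. As written, the code appends the last column when the keys match. Since you have multiple non-master files, you can adapt the concept by replacing FasterCSV.read('j4a.csv') with a method that reads each file and concatenates them all into a single array of arrays, or you can save the result from the inner inject (the master hash) and apply each of the other files to it in a loop.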
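The second adaptation (keep the master hash and apply each other file to it in a loop) might look roughly like the sketch below. It assumes Ruby 1.9 or later, where the standard csv library replaces FasterCSV and hashes keep insertion order. The helper name merge_into_master and its arguments are made up for illustration. If the master file repeats a key, only the last row with that key survives in the hash.

```ruby
require 'csv'

# Hypothetical helper: index the master rows by their first column, then
# append the last column of every matching row from each secondary file.
def merge_into_master(master_path, other_paths)
  # Hashes keep insertion order (Ruby 1.9+), so master row order is preserved.
  by_key = CSV.read(master_path).each_with_object({}) do |row, index|
    index[row[0]] = row
  end

  other_paths.each do |path|
    CSV.foreach(path) do |row|
      target = by_key[row[0]]
      target << row.last if target   # rows with no match in master are ignored
    end
  end

  by_key.values
end
```

It could be called as merge_into_master('j4.csv', ['j4a.csv']), and the returned rows written out with CSV.open in block form.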
Answer 2 (score: 0)
temp = master.assoc(line[0])
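The above is a very slow operation: assoc scans the master array from the start for every line of every other file, so the overall complexity is at least O(n^2).
I would use the following process:
It would greatly reduce the complexity, to O(n).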
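One way to get linear behavior is to build a hash index over the master rows once and use it in place of assoc. The sketch below is only an illustration under that assumption, not necessarily the process this answer had in mind. The name merge_with_index is hypothetical, and it works on arrays of rows that have already been read (for example with CSV.read), where each master row already has a trailing empty column as in Answer 0.

```ruby
# Hypothetical helper operating on arrays of rows already read from CSV.
def merge_with_index(master_rows, other_rows)
  # Build the index once, in a single pass over the master rows.
  # `||=` keeps the first row for a duplicated key, as Array#assoc would.
  index = {}
  master_rows.each { |row| index[row[0]] ||= row }

  # Each lookup is now a hash access instead of a scan of the master array.
  other_rows.each do |line|
    row = index[line[0]]
    row[-1] = line[1] if row   # update the last column only when a match exists
  end

  master_rows
end
```

With n master rows and m rows across the other files, this does one pass to build the index and one hash lookup per other row, so the work grows roughly as n + m rather than n * m.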