How do I extract data from multiple CSV files without multiple loops?

Date: 2013-07-09 03:29:58

Tags: ruby csv

I have two CSV files, "users" and "enrollments":

001.csv:

user_id,user_name,state
12345,test_account,active

002.csv:

course_id,user_id,state
67890,12345,active

I need to create a file like active_enrollments.csv:

course_id,user_name
67890,test_account

How can I parse these files to produce active_enrollments.csv without looping over the files multiple times?

Here is what I have so far, but it produces a lot of duplicates:

require 'csv'

CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ["course_id", "user_name", "user_id", "course_name", "status"]
end
Dir["csvs/*.csv"].each do |file|
  #puts file
  CSV.foreach(file, :headers => true) do |row|
    if row['user_id'] && row['course_id'] #finds enrollment csvs
      if row['state'] == "active" #checks for active enrollments
        state = row['state']
        course_id = row['course_id']
        user_id = row['user_id']
        Dir["csvs/*.csv"].each do |files|
          CSV.foreach(files, :headers => true) do |user|
            if user['user_name']
              if user_id == user['user_id']
                user_name = user['user_name']
                Dir["csvs/*.csv"].each do |file|
                  CSV.foreach(file, :headers => true) do |courses|
                    if course_id == courses['course_id']
                      course_name = courses['course_name']
                      CSV.open("active-enrollments.csv", "a") do |csv|
                        csv << [course_id, user_name, user_id, course_name, state]
                      end
                    end
                  end
                end
              end
            end
          end
        end
      end
    end
  end
end

I know this is simple, but I can't seem to get it working without looping over the files multiple times and producing lots of duplicates.

3 answers:

Answer 0 (score: 2)

Short of using a database or a full set of models, I'd suggest using a simple Hash as a lookup table.

The following is untested, and I've left out all the filters.

Separate the user CSVs from the enrollment CSVs by file name, then iterate over the user CSVs once to build a lookup keyed by user_id.
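Separating by file name assumes a naming convention such as users-*.csv and enrollment-*.csv. If the files are named like the question's 001.csv and 002.csv, one alternative is to classify each file by its header row, which costs only one line read per file. This is a sketch under assumptions: the csv_kind helper is hypothetical, and the demo writes the question's sample rows to a temporary directory so it runs on its own. The resulting users_csvs and enrollment_csvs arrays could stand in for the two Dir[...] lines in the answer's code.

```ruby
require 'csv'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: decide what a CSV file contains by reading only its
# header row. Returns :users, :enrollments, or :unknown.
def csv_kind(path)
  headers = CSV.open(path) { |csv| csv.shift } || []
  if headers.include?('user_name')
    :users
  elsif headers.include?('course_id') && headers.include?('user_id')
    :enrollments
  else
    :unknown
  end
end

# Demo: recreate the question's two sample files in a temporary directory.
dir = Dir.mktmpdir
at_exit { FileUtils.remove_entry(dir) }
File.write(File.join(dir, '001.csv'), "user_id,user_name,state\n12345,test_account,active\n")
File.write(File.join(dir, '002.csv'), "course_id,user_id,state\n67890,12345,active\n")

# Group the paths by kind; each file is opened once and only its header is read.
groups = Dir[File.join(dir, '*.csv')].group_by { |path| csv_kind(path) }
users_csvs      = groups.fetch(:users, [])
enrollment_csvs = groups.fetch(:enrollments, [])
```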

require 'csv'

users_csvs = Dir['csvs/users-*.csv']
enrollment_csvs = Dir['csvs/enrollment-*.csv']

# Pass 1: build a user_id => user data lookup from the user CSVs
users = {}
users_csvs.each do |user_file|
  CSV.foreach(user_file, :headers => true) do |row|
    # Put in whatever data you will need later
    users[row['user_id']] = {:user_name => row['user_name'], :state => row['state']}
  end
end

# Pass 2: walk the enrollment CSVs once, looking each user up by user_id
consolidated_csv = []
enrollment_csvs.each do |enrollment_file|
  CSV.foreach(enrollment_file, :headers => true) do |row|
    user_id = row['user_id']
    if (user = users[user_id])
      # Put in whatever you want from the two objects;
      # user_name comes from the users lookup, not from the enrollment row
      consolidated_csv << {:course_id => row['course_id'], :user_name => user[:user_name]}
    end
  end
end

CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ['course_id', 'user_name']
  consolidated_csv.each { |row| csv << [row[:course_id], row[:user_name]] }
end
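Because the answer leaves out the filters, here is a self-contained sketch of the same two-pass Hash lookup with "active" checks added for both users and enrollments. It makes some assumptions: it parses inline strings with CSV.parse instead of reading files (for real files, swap in CSV.foreach(path, :headers => true)), and the extra rows for user 22222 and course 11111 are hypothetical, added only to exercise the filters.

```ruby
require 'csv'

# Sample data: the first data row of each string is from the question; the
# other rows are hypothetical and exist only to exercise the filters.
USERS_TEXT = <<~TEXT
  user_id,user_name,state
  12345,test_account,active
  22222,old_account,deleted
TEXT

ENROLLMENTS_TEXT = <<~TEXT
  course_id,user_id,state
  67890,12345,active
  11111,12345,completed
  67890,22222,active
TEXT

# Pass 1: build a user_id => user_name lookup, keeping only active users.
active_users = {}
CSV.parse(USERS_TEXT, headers: true) do |row|
  active_users[row['user_id']] = row['user_name'] if row['state'] == 'active'
end

# Pass 2: read the enrollments once; each row costs one Hash lookup.
output = CSV.generate do |csv|
  csv << %w[course_id user_name]
  CSV.parse(ENROLLMENTS_TEXT, headers: true) do |row|
    next unless row['state'] == 'active'          # skip inactive enrollments
    user_name = active_users[row['user_id']]
    csv << [row['course_id'], user_name] if user_name  # skip inactive/unknown users
  end
end

puts output
```

Each input is read exactly once, so the work grows linearly with the number of rows instead of multiplying the way the nested loops in the question do.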

Answer 1 (score: 1)

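It might be easier to use SQLite: pull the data out of the CSV files, load it into a temporary database, and then query the database to generate the final output.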

Answer 2 (score: 0)

Here is some sample code showing how to do this with a simple SQLite database and the Sequel ORM:

require 'csv'
require 'sequel'

DB = Sequel.sqlite(File.dirname(__FILE__) + '/temp.db')

# user_id,user_name,state
# 12345,test_account,active
DB.create_table :csv1 do
  primary_key :id
  Integer :user_id
  String :user_name
  String :state
end

TABLE_001 = DB[:csv1]
CSV.foreach('001.csv', :headers => :first_row) do |row|
  TABLE_001.insert(
    :user_id   => row['user_id'],
    :user_name => row['user_name'],
    :state     => row['state']
  )
end

# course_id,user_id,state
# 67890,12345,active
DB.create_table :csv2 do
  primary_key :id
  Integer :course_id
  Integer :user_id
  String :state
end

# I need to create one file like active_enrollments.csv:
#
#     course_id,user_name
#     67890,test_account
TABLE_002 = DB[:csv2]
CSV.foreach('002.csv', :headers => :first_row) do |row|
  TABLE_002.insert(
    :course_id => row['course_id'],
    :user_id   => row['user_id'],
    :state     => row['state']
  )
end

CSV.open('active_enrollments.csv', 'w') do |csv_out|
  TABLE_001.each do |row_001|
    row_002 = TABLE_002.where(:user_id => row_001[:user_id]).first
    next unless row_002 # skip users with no enrollment instead of raising NoMethodError on nil
    csv_out << [row_002[:course_id], row_001[:user_name]]
  end
end

After running it, "active_enrollments.csv" contains:

67890,test_account

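This approach scales much better than nested loops over the files: each CSV is read only once, and the matching is handed off to the database.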

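Running this twice raises an error, because Sequel will try to create the tables again in the existing database. Delete the database file, or add an exception handler around the create_table blocks.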
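A minimal sketch of the "delete the file" option, in plain Ruby with no Sequel calls. The reset_db_file helper is hypothetical; in the script above it would run before Sequel.sqlite, using the same File.dirname(__FILE__) + '/temp.db' path. Within Sequel itself, create_table! (drop, then create) is another alternative to an exception handler; create_table? (create only if missing) also avoids the error, but it keeps the old rows, so rerunning the inserts would add duplicates.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: delete a leftover SQLite file so that the
# create_table calls start from an empty database on every run.
# Returns true if a file was removed, false if there was nothing to delete.
def reset_db_file(path)
  return false unless File.exist?(path)
  File.delete(path)
  true
end

# Demo in a throwaway directory instead of next to the real script.
dir = Dir.mktmpdir
at_exit { FileUtils.remove_entry(dir) }
db_path = File.join(dir, 'temp.db')

File.write(db_path, '')              # stand-in for the file left by a previous run
first_run  = reset_db_file(db_path)  # removes the stale file
second_run = reset_db_file(db_path)  # nothing left to remove
puts "#{first_run} #{second_run} #{File.exist?(db_path)}"
```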