I have two CSV files, "users" and "enrollments":
001.csv:
user_id,user_name,state
12345,test_account,active
002.csv:
course_id,user_id,state
67890,12345,active
I need to create a file like active_enrollments.csv:
course_id,user_name
67890,test_account
How can I parse these files to generate active_enrollments.csv without looping over the files multiple times?
Here is what I have so far, but I get a lot of duplicates:
require 'csv'
CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ["course_id", "user_name", "user_id","course_name", "status"]
end
Dir["csvs/*.csv"].each do |file|
  #puts file
  CSV.foreach(file, :headers => true) do |row|
    if row['user_id'] && row ['course_id'] #finds enrollment csvs
      if row['state'] == "active" #checks for active enrollments
        state = row['state']
        course_id = row['course_id']
        user_id = row['user_id']
        Dir["csvs/*.csv"].each do |files|
          CSV.foreach(files, :headers => true) do |user|
            if user['user_name']
              if user_id == user['user_id']
                user_name = user['user_name']
                Dir["csvs/*.csv"].each do |file|
                  CSV.foreach(file, :headers => true) do |courses|
                    if course_id == courses['course_id']
                      course_name = courses['course_name']
                      CSV.open("active-enrollments.csv", "a") do |csv|
                        csv << [course_id, user_name, user_id, course_name, state]
                      end
                    end
                  end
                end
              end
            end
          end
        end
      end
    end
  end
end
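I know this is simple, but I can't seem to get it working without looping over the files multiple times and producing a lot of duplicates.
Answer 0 (score: 2)
Short of using a database or a full set of models, I would suggest using a simple Hash as a lookup.
The following is untested, and I have left out all the filters.
Separate the user CSVs from the enrollment CSVs by file name, then iterate over the user CSVs once to build a lookup keyed by user_id.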
require 'csv'
users_csvs = Dir['csvs/users-*.csv']
enrollment_csvs = Dir['csvs/enrollment-*.csv']
users = {}
users_csvs.each do |user_file|
  CSV.foreach(user_file, :headers => true) do |row|
    # Put in whatever data you will need later
    users[row['user_id']] = {:user_name => row['user_name'], :state => row['state']}
  end
end
consolidated_csv = []
enrollment_csvs.each do |enrollment_file|
  CSV.foreach(enrollment_file, :headers => true) do |row|
    user_id = row['user_id']
    if (user = users[user_id])
      # Put in whatever you want from the two objects.
      # user_name comes from the lookup; enrollment rows have no such column.
      consolidated_csv << {:course_id => row['course_id'], :user_name => user[:user_name]}
    end
  end
end
CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ['course_id', 'user_name']
  consolidated_csv.each { |row| csv << [row[:course_id], row[:user_name]] }
end
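For what it's worth, here is a minimal sketch of the same Hash-lookup idea with the "active" filters filled in. It parses in-memory strings rather than files so that it runs on its own. The first data row of each sample is from the question; the extra rows and the helper names (`build_user_lookup`, `active_enrollments`, `to_csv`) are made up for illustration, and treating `state == 'active'` as the filter on both files is an assumption about what the asker wants.

```ruby
require 'csv'

# Sample data: the first row of each is from the question,
# the remaining rows are hypothetical and only exercise the filters.
USERS_CSV = <<~EOS
  user_id,user_name,state
  12345,test_account,active
  54321,old_account,inactive
EOS

ENROLLMENTS_CSV = <<~EOS
  course_id,user_id,state
  67890,12345,active
  67891,12345,deleted
  67892,54321,active
EOS

# One pass over the users: user_id => user_name, active users only.
def build_user_lookup(users_csv)
  lookup = {}
  CSV.parse(users_csv, headers: true) do |row|
    lookup[row['user_id']] = row['user_name'] if row['state'] == 'active'
  end
  lookup
end

# One pass over the enrollments: keep active enrollments of active users.
def active_enrollments(users_csv, enrollments_csv)
  users = build_user_lookup(users_csv)
  rows = []
  CSV.parse(enrollments_csv, headers: true) do |row|
    next unless row['state'] == 'active'
    user_name = users[row['user_id']]
    next unless user_name
    rows << [row['course_id'], user_name]
  end
  rows
end

# Serialize the joined rows with a header line.
def to_csv(rows)
  CSV.generate do |csv|
    csv << %w[course_id user_name]
    rows.each { |r| csv << r }
  end
end

puts to_csv(active_enrollments(USERS_CSV, ENROLLMENTS_CSV))
```

Each file is read exactly once, and every enrollment row produces at most one output row, so the duplicates from the nested loops go away. To work with real files, `CSV.parse(string, ...)` could be swapped for `CSV.foreach(path, ...)`.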
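Answer 1 (score: 1)
It would probably be easier to use SQLite: extract the data from the CSV files, load it into a temporary database, then query the database to generate the final output.
Answer 2 (score: 0)
Here is some sample code showing how to do this with a simple SQLite database and the Sequel ORM: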
require 'csv'
require 'sequel'
DB = Sequel.sqlite(File.dirname(__FILE__) + '/temp.db')
# user_id,user_name,state
# 12345,test_account,active
DB.create_table :csv1 do
  primary_key :id
  Integer :user_id
  String :user_name
  String :state
end
TABLE_001 = DB[:csv1]
CSV.foreach('001.csv', :headers => :first_row) do |row|
  TABLE_001.insert(
    :user_id => row['user_id'],
    :user_name => row['user_name'],
    :state => row['state']
  )
end
# course_id,user_id,state
# 67890,12345,active
DB.create_table :csv2 do
  primary_key :id
  Integer :course_id
  Integer :user_id
  String :state
end
# I need to create one file like active_enrollments.csv:
#
# course_id,user_name
# 67890,test_account
TABLE_002 = DB[:csv2]
CSV.foreach('002.csv', :headers => :first_row) do |row|
  TABLE_002.insert(
    :course_id => row['course_id'],
    :user_id => row['user_id'],
    :state => row['state']
  )
end
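CSV.open('active_enrollments.csv', 'w') do |csv_out|
  TABLE_001.each do |row_001|
    # A user may have zero or many enrollments, so iterate over every match
    # instead of taking only the first (which is nil when there is none).
    TABLE_002.where(:user_id => row_001[:user_id]).each do |row_002|
      csv_out << [row_002[:course_id], row_001[:user_name]]
    end
  end
end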
After running it, "active_enrollments.csv" contains:
67890,test_account
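This approach scales well.
Running this twice will raise an error, because Sequel will try to create the tables again in a database that already has them. Delete the database file, or add an exception handler around the create_table blocks.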