I have the following records in a tab-delimited text file:
sku title Product Type
19686940 This is test Title1 toys
19686941 This is test Title2 toys
19686942 This is test Title3 toys
20519300 This is test Title1 toys2
20519301 This is test Title2 toys2
20580987 This is test Title1 toys3
20580988 This is test Title2 toys3
20582176 This is test Title1 toys4
How can I group the items by Product Type and find all the unique words in title?
Output format:
Product Type Unique_words
------------ ------------
toys This is test Title1 Title2 Title3
toys2 This is test Title1 Title2
toys3 This is test Title1 Title2
toys4 This is test Title1
Update
So far I have written the code that reads the file and stores the records in an array:
class Product
  # attr_reader already defines the sku, title and productType getters
  attr_reader :sku, :title, :productType

  def initialize(sku, title, productType)
    @sku = sku
    @title = title
    @productType = productType
  end
end
class FileReader
  def ReadFile(m_FilePath)
    array = Array.new
    lines = IO.readlines(m_FilePath)
    lines.each_with_index do |line, i|
      # chomp removes the trailing newline before splitting on tabs
      current_row = line.chomp.split("\t")
      product = Product.new(current_row[0], current_row[1], current_row[2])
      array.push product
    end
    array # return the products; each_with_index alone returns the raw lines
  end
end

filereader_method = FileReader.new.method("ReadFile")
Reading = filereader_method.to_proc
puts Reading.call("Input.txt")
Answer 0 (score: 0)
To do the grouping, you can use Enumerable#group_by:
Product = Struct.new(:sku, :title, :product_type)

def products_by_type(file_path)
  File.foreach(file_path) # reads line by line and closes the file when done
      .drop(1)            # skip the header row
      .map { |line| Product.new(*line.chomp.split("\t")) }
      .group_by { |product| product.product_type }
end
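To see what group_by returns, here is a minimal sketch on in-memory data. The Item struct and the three sample rows are stand-ins made up for illustration (the values are copied from the question); nothing is read from disk.

```ruby
# Hypothetical struct and sample rows, used only for this illustration.
Item = Struct.new(:sku, :title, :product_type)

items = [
  Item.new("19686940", "This is test Title1", "toys"),
  Item.new("19686941", "This is test Title2", "toys"),
  Item.new("20519300", "This is test Title1", "toys2")
]

# group_by returns a Hash that maps each block result to the array of
# elements that produced it, preserving the original order.
grouped = items.group_by(&:product_type)

grouped.each do |type, group|
  puts "#{type}: #{group.map(&:sku).join(', ')}"
end
```

Each key of the resulting hash is a product type and each value is the array of items with that type, which is the shape the output step needs.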
The beauty of Ruby is that you have plenty of options. You could also look at the CSV library and OpenStruct, since this is just a data object:
require 'csv'
require 'ostruct'

def products_by_type(file_path)
  csv_opts = { col_sep: "\t",
               headers: true,
               header_converters: [:downcase, :symbol] }
  # ** passes the hash as keyword options; foreach closes the file when done
  CSV.foreach(file_path, **csv_opts)
     .map { |row| OpenStruct.new row.to_hash }
     .group_by { |product| product.product_type }
end
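As a quick check of what the header converters do, the sketch below parses a small in-memory string instead of a file. The sample text is an assumption built from the question's data; the point is that the "Product Type" header ends up as the symbol :product_type, which is why product.product_type works.

```ruby
require 'csv'

# Hypothetical sample text; "\t" escapes stand for the tab separators.
data = "sku\ttitle\tProduct Type\n" \
       "19686940\tThis is test Title1\ttoys\n" \
       "20519300\tThis is test Title1\ttoys2\n"

rows = CSV.parse(data,
                 col_sep: "\t",
                 headers: true,
                 header_converters: [:downcase, :symbol])

# The converted headers become the keys of each row.
p rows.headers
rows.each { |row| puts "#{row[:sku]} -> #{row[:product_type]}" }
```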
Or use the hash-based initialization idiom to drop the #to_hash call on row above:
class Product
  attr_accessor :sku, :title, :product_type

  def initialize(data)
    # call the matching setter (sku=, title=, product_type=) for each pair
    data.each { |key, value| public_send("#{key}=", value) }
  end
end

def products_by_type(file_path)
  csv_opts = { #...
  }
  CSV.foreach(file_path, **csv_opts)
     .map { |row| Product.new row }
     .group_by { |product| product.product_type }
end
Then, given the resulting hash, format the output however you need:
# takes the array of products for one type (no splat, since an array is passed in)
def unique_title_words(products)
  products.flat_map { |product| product.title.scan(/\w+/) }
          .uniq
end

puts "Product Type\tUnique Words"
products_by_type("./file.txt").each do |type, products|
  puts "#{type}\t#{unique_title_words(products).join(' ')}"
end
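For reference, the whole pipeline can be sketched end to end without touching the file system. The INPUT constant and the unique_words_by_type helper are names invented for this example, and the rows are the ones listed in the question; treat it as one possible arrangement rather than the definitive solution.

```ruby
# Sample rows from the question; "\t" escapes are expanded inside the heredoc.
INPUT = <<~TSV
  sku\ttitle\tProduct Type
  19686940\tThis is test Title1\ttoys
  19686941\tThis is test Title2\ttoys
  19686942\tThis is test Title3\ttoys
  20519300\tThis is test Title1\ttoys2
  20519301\tThis is test Title2\ttoys2
  20580987\tThis is test Title1\ttoys3
  20580988\tThis is test Title2\ttoys3
  20582176\tThis is test Title1\ttoys4
TSV

# Hypothetical helper: maps each product type to the unique words in its titles.
def unique_words_by_type(text)
  text.each_line
      .drop(1)                                    # skip the header row
      .map { |line| line.chomp.split("\t") }      # => [sku, title, type]
      .group_by { |_sku, _title, type| type }
      .transform_values do |rows|
        rows.flat_map { |_sku, title, _type| title.split }.uniq
      end
end

result = unique_words_by_type(INPUT)

puts "Product Type\tUnique_words"
result.each { |type, words| puts "#{type}\t#{words.join(' ')}" }
```

Hash#transform_values requires Ruby 2.4 or later; on older versions the same result can be built with each_with_object.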