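I'm using Ruby's built-in CSV to generate some CSV output. Everything works correctly, but the customer wants the name field in the output to be wrapped in double quotes so the output looks like the input file. For instance, the input looks something like this: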
1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
CSV's output is correct and looks like this:
1,1.1.1.1,Firstname Lastname,more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
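I know CSV is doing the right thing by not quoting the third field just because it has embedded blanks, and by wrapping the field in double quotes when it has an embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.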
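For reference, the default behaviour described above can be reproduced with a minimal sketch like the following. It only uses CSV.generate_line from the standard library; the sample rows are the ones from the question, and the variable names are just for illustration.

```ruby
require 'csv'

# A field with embedded blanks is written unquoted.
plain_line = CSV.generate_line([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'])

# A field with an embedded comma is wrapped in double quotes.
comma_line = CSV.generate_line([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'])

puts plain_line  # 1,1.1.1.1,Firstname Lastname,more,fields
puts comma_line  # 2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
```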
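I tried wrapping the field in double quotes in my to_a method, which creates a "Firstname Lastname" field being passed to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do because it is escaping the double quotes, so that didn't work.
Then I tried setting CSV's :force_quotes => true option, which wrapped all fields in double quotes as expected, but the customer didn't like that, which I expected too. So that didn't work either.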
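Both failed attempts are easy to reproduce. The sketch below assumes nothing beyond the standard CSV.generate_line call and its force_quotes option; the shortened sample row is made up for brevity.

```ruby
require 'csv'

# Attempt 1: pre-quote the value yourself.
# CSV treats the quotes as data, so it doubles them and wraps the field again.
puts CSV.generate_line([1, '"Firstname Lastname"', 'more'])
# 1,"""Firstname Lastname""",more

# Attempt 2: force_quotes quotes every field, not only the third one.
puts CSV.generate_line([1, 'Firstname Lastname', 'more'], force_quotes: true)
# "1","Firstname Lastname","more"
```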
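I've looked through the Table and Row docs, and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.
I'm about to dive into the source to see if there are some super-secret tweaks, or if there's a way to monkey-patch CSV and bend it to do my will, but I wondered if anyone had some special knowledge or had run into this before.
And, yes, I know I could roll my own CSV output, but I prefer not to reinvent well-tested wheels. I'm also aware of FasterCSV; it is now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting this in Rails, so unless you have a cute way of implementing it using a small subset of Rails, don't bother. I'll downvote any recommendations to use any of those approaches, since it means you didn't bother to read this far.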
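Answer 0 (score: 9)
Well, there's a way to do it, but it isn't as clean as I'd hoped the CSV code would allow.
I had to subclass CSV, then override its << method and add another method, forced_quote_fields=, so that I can define the fields I want to force quoting on. I also had to pull in two lambdas from other methods. At least it works for what I want: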
require 'csv'
class MyCSV < CSV
  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers # won't read data for Array or String
      self << @headers if @write_headers
    end
    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash then @headers.map { |header| row[header] }
          else row
          end
    @headers = row if header_row?
    @lineno += 1
    @do_quote ||= lambda do |field|
      field = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
    end
    @quotable_chars ||= encode_str("\r\n", @col_sep, @quote_char)
    @forced_quote_fields ||= []
    @my_quote_lambda ||= lambda do |field, index|
      if field.nil? # represent +nil+ fields as empty unquoted fields
        ""
      else
        field = String(field) # Stringify fields
        # represent empty fields as empty quoted fields
        if (
          field.empty? or
          field.count(@quotable_chars).nonzero? or
          @forced_quote_fields.include?(index)
        )
          @do_quote.call(field)
        else
          field # unquoted field
        end
      end
    end
    output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep # quote and separate
    if (
      @io.is_a?(StringIO) and
      output.encoding != raw_encoding and
      (compatible_encoding = Encoding.compatible?(@io.string, output))
    )
      @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
      @io.seek(0, IO::SEEK_END)
    end
    @io << output
    self # for chaining
  end
  alias_method :add_row, :<<
  alias_method :puts, :<<
  def forced_quote_fields=(indexes=[])
    @forced_quote_fields = indexes
  end
end
That's the code. Calling it:
data = [
  %w[1 2 3],
  [ 2, 'two too', 3 ],
  [ 3, 'two, too', 3 ]
]
quote_fields = [1]
puts "Ruby version: #{ RUBY_VERSION }"
puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"
csv = MyCSV.generate do |_csv|
  _csv.forced_quote_fields = quote_fields
  data.each do |d|
    _csv << d
  end
end
puts csv
Results in:
# >> Ruby version: 1.9.2
# >> Quoting fields: 1
# >>
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3
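The quoting decision made inside MyCSV's lambda can also be looked at in isolation. What follows is a minimal standalone sketch of that rule, not a replacement for CSV and not the library's own code: format_row and its parameters are hypothetical names introduced here, and it deliberately ignores encodings, headers and row separators.

```ruby
# Hypothetical helper (not part of Ruby's CSV library): formats one row,
# quoting a field when it is empty, contains a special character, or when
# its index is listed in forced_quote_fields.
def format_row(row, forced_quote_fields = [], col_sep = ",", quote_char = '"')
  special = [col_sep, quote_char, "\r", "\n"]
  row.each_with_index.map do |field, index|
    next "" if field.nil? # nil fields become empty unquoted fields

    field = String(field)
    needs_quoting = field.empty? ||
                    special.any? { |s| field.include?(s) } ||
                    forced_quote_fields.include?(index)
    if needs_quoting
      # double any embedded quote characters, then wrap the field
      quote_char + field.gsub(quote_char, quote_char * 2) + quote_char
    else
      field
    end
  end.join(col_sep)
end

puts format_row(%w[1 2 3], [1])          # 1,"2",3
puts format_row([2, 'two too', 3], [1])  # 2,"two too",3
puts format_row([3, 'two, too', 3], [1]) # 3,"two, too",3
```

The three calls use the same data as the run above, so the output can be compared line by line.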
Answer 1 (score: 5)
This post is old, but I can't believe no one thought of this.
Why not do:
csv = CSV.generate :quote_char => "\0" do |csv|
where \0 is the null character, and then just add quotes to each field that needs them:
csv << [product.upc, "\"" + product.name + "\"" # ...
Then at the end you can do a
csv.gsub!(/\0/, '')
Answer 2 (score: 4)
I doubt this will help the customer feel warm and fuzzy after all this time, but this seems to work:
require 'csv'
# prepare a lambda which converts the field with index 2
quote_col2 = lambda do |field, fieldinfo|
  # fieldinfo has line, header and index methods
  if fieldinfo.index == 2 && !field.start_with?('"') then
    '"' + field + '"'
  else
    field
  end
end
# specify the above lambda as one of the converters
csv = CSV.read("test1.csv", :converters => [quote_col2])
p csv
# => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
File.open("test1.txt","w"){|out| csv.each{|line|out.puts line.join(",")}}
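The same converter idea can be tried without touching the file system. This is a sketch under a few assumptions: quote_column_converter and requote_lines are hypothetical helper names, the input is the sample data from the question, and a nil guard is added because an empty unquoted field may reach the converter as nil.

```ruby
require 'csv'

# Hypothetical helper: builds a converter that wraps the field at `index`
# in literal double quotes while the row is being parsed.
def quote_column_converter(index)
  lambda do |field, field_info|
    if field_info.index == index && !field.nil? && !field.start_with?('"')
      %("#{field}")
    else
      field
    end
  end
end

# Hypothetical helper: parses CSV text and re-joins each row with plain commas.
def requote_lines(input, index)
  CSV.parse(input, converters: [quote_column_converter(index)])
     .map { |row| row.join(',') }
end

input = "1,1.1.1.1,Firstname Lastname,more,fields\n" \
        "2,2.2.2.2,\"Firstname Lastname, Jr.\",more,fields\n"

puts requote_lines(input, 2)
# 1,1.1.1.1,"Firstname Lastname",more,fields
# 2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
```

Note that the output is assembled with a plain join, so CSV's own quoting is bypassed: a comma or double quote in any other column, or a double quote inside the name itself, would not be escaped.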
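Answer 3 (score: 0)
It doesn't look like there's any way to do this with the existing CSV implementation short of monkey-patching or rewriting it.
However, assuming you have full control over the source data, you could do this: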
csv.gsub!(/FORCE_COMMAS,/, "")
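Only the final cleanup step is shown above. One plausible reading, and it is an assumption, is that a marker containing a comma is appended to the target field so that CSV quotes it on its own, and the marker is stripped from the generated text afterwards. A minimal sketch of that reading follows; MARKER and generate_with_forced_quotes are hypothetical names, and the marker must never occur in real data.

```ruby
require 'csv'

# Assumed marker: contains a comma, so CSV is forced to quote the field.
MARKER = 'FORCE_COMMAS,'

# Hypothetical helper: appends the marker to the field at `index` in every
# row, lets CSV generate the output, then removes the marker again.
def generate_with_forced_quotes(rows, index)
  marked = rows.map do |row|
    row.each_with_index.map { |field, i| i == index ? "#{field}#{MARKER}" : field }
  end
  csv = CSV.generate { |out| marked.each { |row| out << row } }
  csv.gsub(MARKER, '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_forced_quotes(rows, 2)
# 1,1.1.1.1,"Firstname Lastname",more,fields
# 2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
```

Because CSV still does the quoting and escaping, embedded quotes in the name stay correctly doubled; the cost is a second pass over the whole output string.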
Answer 4 (score: 0)
CSV has changed in Ruby 2.1, as @jwadsack mentioned, but here is a working version of @the-tin-man's MyCSV. It is slightly modified so that you can set forced_quote_fields via options.
MyCSV.generate(forced_quote_fields: [1]) do |_csv|...
The modified code:
require 'csv'
class MyCSV < CSV
  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers # won't read data for Array or String
      self << @headers if @write_headers
    end
    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash then @headers.map { |header| row[header] }
          else row
          end
    @headers = row if header_row?
    @lineno += 1
    output = row.map.with_index(&@quote).join(@col_sep) + @row_sep # quote and separate
    if @io.is_a?(StringIO) and
       output.encoding != (encoding = raw_encoding)
      if @force_encoding
        output = output.encode(encoding)
      elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
        @io.set_encoding(compatible_encoding)
        @io.seek(0, IO::SEEK_END)
      end
    end
    @io << output
    self # for chaining
  end
  def init_separators(options)
    # store the selected separators
    @col_sep = options.delete(:col_sep).to_s.encode(@encoding)
    @row_sep = options.delete(:row_sep) # encode after resolving :auto
    @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
    @forced_quote_fields = options.delete(:forced_quote_fields) || []
    if @quote_char.length != 1
      raise ArgumentError, ":quote_char has to be a single character String"
    end
    #
    # automatically discover row separator when requested
    # (not fully encoding safe)
    #
    if @row_sep == :auto
      if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
         (defined?(Zlib) and @io.class == Zlib::GzipWriter)
        @row_sep = $INPUT_RECORD_SEPARATOR
      else
        begin
          #
          # remember where we were (pos() will raise an exception if @io is pipe
          # or not opened for reading)
          #
          saved_pos = @io.pos
          while @row_sep == :auto
            #
            # if we run out of data, it's probably a single line
            # (ensure will set default value)
            #
            break unless sample = @io.gets(nil, 1024)
            # extend sample if we're unsure of the line ending
            if sample.end_with? encode_str("\r")
              sample << (@io.gets(nil, 1) || "")
            end
            # try to find a standard separator
            if sample =~ encode_re("\r\n?|\n")
              @row_sep = $&
              break
            end
          end
          # tricky seek() clone to work around GzipReader's lack of seek()
          @io.rewind
          # reset back to the remembered position
          while saved_pos > 1024 # avoid loading a lot of data into memory
            @io.read(1024)
            saved_pos -= 1024
          end
          @io.read(saved_pos) if saved_pos.nonzero?
        rescue IOError # not opened for reading
          # do nothing: ensure will set default
        rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
          # do nothing: ensure will set default
        rescue SystemCallError # pipe
          # do nothing: ensure will set default
        ensure
          #
          # set default if we failed to detect
          # (stream not opened for reading, a pipe, or a single line of data)
          #
          @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
        end
      end
    end
    @row_sep = @row_sep.to_s.encode(@encoding)
    # establish quoting rules
    @force_quotes = options.delete(:force_quotes)
    do_quote = lambda do |field|
      field = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
    end
    quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
    @quote = if @force_quotes
      do_quote
    else
      lambda do |field, index|
        if field.nil? # represent +nil+ fields as empty unquoted fields
          ""
        else
          field = String(field) # Stringify fields
          # represent empty fields as empty quoted fields
          if field.empty? or
             field.count(quotable_chars).nonzero? or
             @forced_quote_fields.include?(index)
            do_quote.call(field)
          else
            field # unquoted field
          end
        end
      end
    end
  end
end
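Answer 5 (score: -1)
CSV has a force_quotes option that will force it to quote all fields (it may not have been there when you originally posted this). I realize this isn't exactly what you asked for, but it involves less monkey patching.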
2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
1,1.1.1.1,Firstname Lastname,more,fields
2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
"1","1.1.1.1","Firstname Lastname","more","fields"
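The drawback is that the first integer value ends up quoted like a string, which changes how it is treated when you import the file into Excel.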