How do I force one field in Ruby's CSV output to be wrapped in double quotes?

Time: 2011-01-31 19:01:42

Tags: ruby csv

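I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output wrapped in double quotes so that the output looks like the input file. For instance, the input looks something like this: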

1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

CSV's output, which is correct, looks like this:

1,1.1.1.1,Firstname Lastname,more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

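I know CSV is doing the right thing by not quoting the third field merely because it has embedded blanks, and by wrapping the field in double quotes when it has an embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.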
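
The default behavior described above can be reproduced with a minimal sketch. The helper name default_line is hypothetical and exists only for illustration; CSV.generate_line is the standard library call.

```ruby
require 'csv'

# Build one CSV line using the library's default quoting rules.
# (default_line is a hypothetical helper name used only for this sketch.)
def default_line(fields)
  CSV.generate_line(fields)
end

# Embedded blank only: the name is left unquoted.
puts default_line([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'])
# 1,1.1.1.1,Firstname Lastname,more,fields

# Embedded comma: the name is wrapped in double quotes.
puts default_line([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'])
# 2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
```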

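I tried wrapping the field in double quotes in my to_a method, which creates a "Firstname Lastname" field that is passed to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do, because it is escaping the double quotes, so that didn't work.

Then I tried setting CSV's :force_quotes => true option, which output double quotes wrapping all fields, as expected, but the customer didn't like that, which I also expected. So that didn't work either.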
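
Both failed attempts can be reproduced in a few lines. This is only a sketch: the helper names prequoted_line and all_quoted_line are hypothetical, and the row is the sample data from above.

```ruby
require 'csv'

# Attempt 1: pre-quote the name before handing it to CSV.
# CSV treats the added quotes as data and escapes them by doubling.
def prequoted_line(name)
  CSV.generate_line([1, '1.1.1.1', %("#{name}"), 'more', 'fields'])
end

# Attempt 2: ask CSV to quote every field with :force_quotes.
def all_quoted_line(name)
  CSV.generate_line([1, '1.1.1.1', name, 'more', 'fields'], force_quotes: true)
end

puts prequoted_line('Firstname Lastname')
# 1,1.1.1.1,"""Firstname Lastname""",more,fields

puts all_quoted_line('Firstname Lastname')
# "1","1.1.1.1","Firstname Lastname","more","fields"
```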

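I've looked through the Table and Row docs, and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.

I'm about to dive into the source to see if there are some super-secret tweaks, or if there's a way to monkey-patch CSV and bend it to do my will, but I wondered whether anyone has some special knowledge or has run into this before.

And yes, I know I could roll my own CSV output, but I prefer not to reinvent a well-tested wheel. I'm also aware of FasterCSV; it's part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting this in Rails, so unless you have a cute way of implementing it using a small subset of Rails, don't bother. I'll downvote any recommendation to use those approaches, because it means you didn't bother to read the question.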

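6 answers:

Answer 0: (score: 9)

Well, there is a way to do it, but it isn't as clean as I'd hoped the CSV code would allow.

I had to subclass CSV, then override the CSV#<< method and add another method, forced_quote_fields=, so that I can define the fields I want force-quoted. I also pulled two lambdas out of another method. At least it works for what I want: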

require 'csv'

class MyCSV < CSV
    def <<(row)
      # make sure headers have been assigned
      if header_row? and [Array, String].include? @use_headers.class
        parse_headers  # won't read data for Array or String
        self << @headers if @write_headers
      end

      # handle CSV::Row objects and Hashes
      row = case row
        when self.class::Row then row.fields
        when Hash            then @headers.map { |header| row[header] }
        else                      row
      end

      @headers = row if header_row?
      @lineno  += 1

      @do_quote ||= lambda do |field|
        field         = String(field)
        encoded_quote = @quote_char.encode(field.encoding)
        encoded_quote                                +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
      end

      @quotable_chars      ||= encode_str("\r\n", @col_sep, @quote_char)
      @forced_quote_fields ||= []

      @my_quote_lambda ||= lambda do |field, index|
        if field.nil?  # represent +nil+ fields as empty unquoted fields
          ""
        else
          field = String(field)  # Stringify fields
          # represent empty fields as empty quoted fields
          if (
            field.empty?                          or
            field.count(@quotable_chars).nonzero? or
            @forced_quote_fields.include?(index)
          )
            @do_quote.call(field)
          else
            field  # unquoted field
          end
        end
      end

      output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep  # quote and separate
      if (
        @io.is_a?(StringIO)             and
        output.encoding != raw_encoding and
        (compatible_encoding = Encoding.compatible?(@io.string, output))
      )
        @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
        @io.seek(0, IO::SEEK_END)
      end
      @io << output

      self  # for chaining
    end
    alias_method :add_row, :<<
    alias_method :puts,    :<<

    def forced_quote_fields=(indexes=[])
      @forced_quote_fields = indexes
    end
end

That's the code. Calling it:

data = [ 
  %w[1 2 3], 
  [ 2, 'two too',  3 ], 
  [ 3, 'two, too', 3 ] 
]

quote_fields = [1]

puts "Ruby version:   #{ RUBY_VERSION }"
puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"

csv = MyCSV.generate do |_csv|
  _csv.forced_quote_fields = quote_fields
  data.each do |d| 
    _csv << d
  end
end

puts csv

Which results in:

# >> Ruby version:   1.9.2
# >> Quoting fields: 1
# >> 
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3
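
The quoting rule inside @my_quote_lambda can also be sketched on its own, without touching CSV's instance variables (which differ between Ruby versions). This is not the subclass above, just the same decision logic as a standalone function; quote_field and build_line are hypothetical names, and encoding handling is deliberately left out.

```ruby
# Quote one field using the same rules as the lambda above:
# nil becomes an empty unquoted field; empty strings, fields containing
# a separator, quote or line break, and forced columns are quoted.
def quote_field(field, forced, col_sep: ',', quote_char: '"')
  return '' if field.nil?
  field = String(field)
  quotable = Regexp.union("\r", "\n", col_sep, quote_char)
  return field unless forced || field.empty? || field.match?(quotable)
  # Escape embedded quote characters by doubling them.
  quote_char + field.gsub(quote_char, quote_char * 2) + quote_char
end

# Build one line; forced_indexes lists the zero-based columns to always quote.
def build_line(row, forced_indexes, col_sep: ',', row_sep: "\n")
  row.each_with_index
     .map { |field, index| quote_field(field, forced_indexes.include?(index), col_sep: col_sep) }
     .join(col_sep) + row_sep
end

print build_line(%w[1 2 3], [1])            # 1,"2",3
print build_line([2, 'two too', 3], [1])    # 2,"two too",3
print build_line([3, 'two, too', 3], [1])   # 3,"two, too",3
```

This trades the well-tested CSV writer for a hand-rolled one, which is exactly what the question wanted to avoid, so it is best read as an illustration of the rule rather than a replacement.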

Answer 1: (score: 5)

This post is old, but I can't believe nobody thought of this.

Why not do this:

csv = CSV.generate :quote_char => "\0" do |csv|

where \0 is the null character, then just add quotes to each field that needs them:

csv << [product.upc, "\"" + product.name + "\"" # ...

Then at the end you can do a

csv.gsub!(/\0/, '')
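
Put together, the idea might look like the following sketch. The method name generate_with_manual_quotes and its quoted_indexes parameter are hypothetical, and the rows are the sample data from the question rather than the product fields above. Two caveats follow directly from the approach: double quotes embedded in a name are not doubled, and any other field that contains a comma loses its quoting once the null characters are stripped.

```ruby
require 'csv'

# Generate CSV with NUL as the quote character, wrap the chosen columns
# in literal double quotes, then strip the NULs from the result.
# (quoted_indexes is a hypothetical parameter: zero-based columns to quote.)
def generate_with_manual_quotes(rows, quoted_indexes)
  out = CSV.generate(quote_char: "\0") do |csv|
    rows.each do |row|
      csv << row.each_with_index.map do |field, i|
        quoted_indexes.include?(i) ? %("#{field}") : field
      end
    end
  end
  out.gsub("\0", '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_manual_quotes(rows, [2])
```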

Answer 2: (score: 4)

I doubt this will help the customer feel warm and fuzzy after all this time, but this seems to work:

require 'csv'
#prepare a lambda which converts field with index 2 
quote_col2 = lambda do |field, fieldinfo|
  # fieldinfo has a line- ,header- and index-method
  if fieldinfo.index == 2 && !field.start_with?('"') then 
    '"' + field + '"'
  else
    field
  end
end

# specify above lambda as one of the converters
csv =  CSV.read("test1.csv", :converters => [quote_col2])
p csv 
# => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
File.open("test1.txt","w"){|out| csv.each{|line|out.puts line.join(",")}}
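
The same converter idea can be tried without a test1.csv file by parsing an inline string. This is a sketch under that assumption; QUOTE_COL2 and requote_third_field are hypothetical names. Note that the rows are written with a plain join(","), so a comma in any column other than the converted one would not be quoted.

```ruby
require 'csv'

# Converter that wraps the field at index 2 in literal double quotes.
# A two-argument converter receives the field and a field-info object.
QUOTE_COL2 = lambda do |field, field_info|
  if field_info.index == 2 && !field.start_with?('"')
    %("#{field}")
  else
    field
  end
end

# Parse CSV text with the converter, then write the rows back out by hand.
def requote_third_field(csv_text)
  rows = CSV.parse(csv_text, converters: [QUOTE_COL2])
  rows.map { |row| row.join(',') }.join("\n") + "\n"
end

puts requote_third_field("aaa,bbb,ccc,ddd\nfff,ggg,hhh,iii\n")
# aaa,bbb,"ccc",ddd
# fff,ggg,"hhh",iii
```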

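Answer 3: (score: 0)

It doesn't look like there's any way to do this with the existing CSV implementation short of monkey-patching or rewriting it.

However, assuming you have full control over the source data, you could do this:

  1. Append a custom string that includes a comma (i.e. one that would never occur naturally in the data) to the end of the field in question on every row; perhaps something like "FORCE_COMMAS,".
  2. Generate the CSV output.
  3. Now that the CSV output has that field quoted on every row, remove the custom string: csv.gsub!(/FORCE_COMMAS,/, "")
  4. Customer feels warm and fuzzy.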
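
The steps above can be sketched as follows. The SENTINEL constant and the generate_with_sentinel helper are hypothetical names, and the sketch assumes the sentinel text never appears in real data.

```ruby
require 'csv'

# Marker containing a comma, so CSV is forced to quote the field.
# Assumption: this text never occurs naturally in the data.
SENTINEL = 'FORCE_COMMAS,'

# Append the sentinel to one column, generate the CSV, then strip it.
def generate_with_sentinel(rows, forced_index)
  out = CSV.generate do |csv|
    rows.each do |row|
      marked = row.dup
      marked[forced_index] = "#{marked[forced_index]}#{SENTINEL}"
      csv << marked
    end
  end
  out.gsub(SENTINEL, '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_sentinel(rows, 2)
# 1,1.1.1.1,"Firstname Lastname",more,fields
# 2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
```

Because CSV itself still does the quoting, embedded double quotes in the name are escaped correctly, which the manual-quoting variants do not guarantee.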

Answer 4: (score: 0)

CSV has changed in Ruby 2.1, as @jwadsack mentioned, but here is a working version of @the-tin-man's MyCSV. It is slightly modified so that you can set forced_quote_fields via the options.

MyCSV.generate(forced_quote_fields: [1]) do |_csv|...

Modified code:

require 'csv'

class MyCSV < CSV

  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers  # won't read data for Array or String
      self << @headers if @write_headers
    end

    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash            then @headers.map { |header| row[header] }
          else                      row
          end

    @headers =  row if header_row?
    @lineno  += 1

    output = row.map.with_index(&@quote).join(@col_sep) + @row_sep  # quote and separate
    if @io.is_a?(StringIO)             and
       output.encoding != (encoding = raw_encoding)
      if @force_encoding
        output = output.encode(encoding)
      elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
        @io.set_encoding(compatible_encoding)
        @io.seek(0, IO::SEEK_END)
      end
    end
    @io << output

    self  # for chaining
  end

  def init_separators(options)
    # store the selected separators
    @col_sep    = options.delete(:col_sep).to_s.encode(@encoding)
    @row_sep    = options.delete(:row_sep)  # encode after resolving :auto
    @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
    @forced_quote_fields = options.delete(:forced_quote_fields) || []

    if @quote_char.length != 1
      raise ArgumentError, ":quote_char has to be a single character String"
    end

    #
    # automatically discover row separator when requested
    # (not fully encoding safe)
    #
    if @row_sep == :auto
      if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
         (defined?(Zlib) and @io.class == Zlib::GzipWriter)
        @row_sep = $INPUT_RECORD_SEPARATOR
      else
        begin
          #
          # remember where we were (pos() will raise an exception if @io is pipe
          # or not opened for reading)
          #
          saved_pos = @io.pos
          while @row_sep == :auto
            #
            # if we run out of data, it's probably a single line
            # (ensure will set default value)
            #
            break unless sample = @io.gets(nil, 1024)
            # extend sample if we're unsure of the line ending
            if sample.end_with? encode_str("\r")
              sample << (@io.gets(nil, 1) || "")
            end

            # try to find a standard separator
            if sample =~ encode_re("\r\n?|\n")
              @row_sep = $&
              break
            end
          end

          # tricky seek() clone to work around GzipReader's lack of seek()
          @io.rewind
          # reset back to the remembered position
          while saved_pos > 1024  # avoid loading a lot of data into memory
            @io.read(1024)
            saved_pos -= 1024
          end
          @io.read(saved_pos) if saved_pos.nonzero?
        rescue IOError         # not opened for reading
          # do nothing:  ensure will set default
        rescue NoMethodError   # Zlib::GzipWriter doesn't have some IO methods
          # do nothing:  ensure will set default
        rescue SystemCallError # pipe
          # do nothing:  ensure will set default
        ensure
          #
          # set default if we failed to detect
          # (stream not opened for reading, a pipe, or a single line of data)
          #
          @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
        end
      end
    end
    @row_sep = @row_sep.to_s.encode(@encoding)

    # establish quoting rules
    @force_quotes   = options.delete(:force_quotes)
    do_quote        = lambda do |field|
      field         = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote                                +
      field.gsub(encoded_quote, encoded_quote * 2) +
      encoded_quote
    end
    quotable_chars = encode_str("\r\n", @col_sep, @quote_char)

    @quote         = if @force_quotes
      do_quote
    else
      lambda do |field, index|
        if field.nil?  # represent +nil+ fields as empty unquoted fields
          ""
        else
          field = String(field)  # Stringify fields
          # represent empty fields as empty quoted fields
          if field.empty? or
             field.count(quotable_chars).nonzero? or
             @forced_quote_fields.include?(index)
            do_quote.call(field)
          else
            field  # unquoted field
          end
        end
      end
    end
  end
end

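Answer 5: (score: -1)

CSV has a force_quotes option that forces it to quote all fields (it may not have existed when you originally posted this). I realize this isn't exactly what you asked for, but it involves less monkey patching.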

2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
1,1.1.1.1,Firstname Lastname,more,fields
2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
"1","1.1.1.1","Firstname Lastname","more","fields"

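The drawback is that the first integer value ends up quoted like a string, which changes how it is treated when you import the file into Excel.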