Cleaning up newlines between quotes with each_line

Time: 2016-07-22 18:45:29

Tags: ruby regex csv

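I have a CSV file that contains newlines inside quoted fields. I want to get rid of those (replacing each one with a space) so that I can run CSV.parse line by line.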

My input is a string containing:

"a","b",c,"d
e",f,g,"h
i",j
k,"l","m
n","o"

I would like to effectively parse a string containing the following:

"a","b",c,"d e",f,g,"h i",j
k,"l","m n","o" 

How can I do this in Ruby?

Thanks to user @sln for providing an efficient and practical solution:

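fichier = File.open("baz.csv")

# Matches a string in which the double quotes are balanced (regex by user @sln)
matchesBalancedLinesFromUser_sln = /^[^"]*(?:"[^"]*"[^"]*)*$/

# Placeholder for whatever should happen with each cleaned-up line
def doWhatYouWill(string)
  puts string
end

mem = ""
fichier.each_line do |ligne|
  mem += ligne.chomp # as long as the quotation marks are not balanced,
                     # we concatenate the lines
  if mem =~ matchesBalancedLinesFromUser_sln
    ligneReplaced = mem + "\n"
    doWhatYouWill(ligneReplaced)
    mem = ""
  else
    mem += " " # the newline was inside quotes: replace it with a space
  end
end

fichier.rewind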
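
To see what the regex-based loop does without needing baz.csv, here is a minimal sketch that runs the same idea on the sample string from the question. The method name `join_balanced_lines` and the constant `BALANCED` are hypothetical names introduced for this illustration, and the sketch assumes that a newline inside quotes should become a single space, as in the desired output.

```ruby
# Regex by user @sln: matches a string whose double quotes are balanced.
BALANCED = /^[^"]*(?:"[^"]*"[^"]*)*$/

# Hypothetical helper: joins physical lines until the quotes balance,
# replacing each newline found inside quotes with a single space.
def join_balanced_lines(text)
  records = []
  mem = ""
  text.each_line do |ligne|
    mem += ligne.chomp
    if mem =~ BALANCED
      records << mem # quotes are balanced: one complete record
      mem = ""
    else
      mem += " "     # still inside a quoted field
    end
  end
  records
end

sample = <<~TEXT
  "a","b",c,"d
  e",f,g,"h
  i",j
  k,"l","m
  n","o"
TEXT

puts join_balanced_lines(sample)
```

With the sample input this should print the two joined records shown in the question.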

Another approach without a regex, just counting the quotation marks:

fichier = File.open("baz.csv")

# Placeholder for whatever should happen with each cleaned-up line
def doWhatYouWill(string)
  puts string
end

mem = ""
fichier.each_line do |ligne|
  mem += ligne.strip # as long as the quotation marks are not balanced,
                     # we concatenate the lines
  if mem.count('"').even? # if mem has an even number of quotation marks
    ligneReplaced = mem + "\n"
    doWhatYouWill(ligneReplaced)
    mem = ""
  else
    mem += " " # the newline was inside quotes: replace it with a space
  end
end

fichier.rewind
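
The quote-counting idea can be checked end to end by feeding each joined record to `CSV.parse_line`. The following is only a sketch under assumptions: the helper name `logical_lines` is hypothetical, the input is the sample string from the question rather than baz.csv, and newlines inside quotes are replaced with a single space.

```ruby
require "csv"

# Hypothetical helper: accumulates physical lines until the number of
# double quotes seen so far is even, then emits one logical record.
def logical_lines(text)
  records = []
  mem = ""
  text.each_line do |ligne|
    mem += ligne.chomp
    if mem.count('"').even?
      records << mem
      mem = ""
    else
      mem += " " # the newline was inside a quoted field
    end
  end
  records
end

sample = <<~TEXT
  "a","b",c,"d
  e",f,g,"h
  i",j
  k,"l","m
  n","o"
TEXT

# Each joined record is now a single line that CSV.parse_line can handle.
rows = logical_lines(sample).map { |record| CSV.parse_line(record) }
rows.each { |row| p row }
```

Counting stays correct when a quoted field contains an escaped quote, because CSV writes it as two consecutive quotes (`""`), which leaves the parity unchanged.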

Note that this solution assumes the quotation marks in the CSV file are balanced. If that is not the case, see this comment by user @sln.
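
The referenced comment is not reproduced here. As a rough sketch only, one way to at least detect unbalanced input is to check whether anything is left in the buffer once the last line has been read. The helper name `join_or_raise` and the choice of `ArgumentError` are assumptions made for this illustration, not part of the original solution.

```ruby
# Hypothetical helper: same quote-counting loop as above, but it raises
# if the input ends while a quoted field is still open.
def join_or_raise(text)
  records = []
  mem = ""
  text.each_line do |ligne|
    mem += ligne.chomp
    if mem.count('"').even?
      records << mem
      mem = ""
    else
      mem += " "
    end
  end
  # A non-empty buffer here means an odd number of quotes was left over.
  raise ArgumentError, "unbalanced quotes near: #{mem[0, 40]}" unless mem.empty?
  records
end

begin
  # The quote opened before "b" is never closed.
  join_or_raise(%(a,"b\nc,d\n))
rescue ArgumentError => e
  puts e.message
end
```

This only reports that the quotes never balanced; it does not try to guess where the stray quote is or to repair the file.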

0 answers:

No answers