Converting a CSV column of strings to integers in Ruby 1.9 with the CSV lib

Asked: 2013-01-14 08:46:46

Tags: ruby csv sed

So I have a CSV with two columns that contain dollar amounts formatted as strings. head -n 5 file.csv shows the following:

Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
"=""Zero Dark Thirty""","=""Sony""",4,"24,000,000","29,480,807",2937,"8,172","=""66273"""
"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""
"=""Gangster Squad""","=""Warner Bros.""",1,"16,710,000","16,710,000",3103,"5,385","=""66556"""
"=""Django Unchained""","=""The Weinstein Company""",3,"11,065,000","125,399,122",3012,"3,674","=""66122"""

This continues for about 40 rows. You'll notice that two columns, "Estimated Weekend Gross" and "Cume", hold their values as strings.

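So my question is: is there a way to iterate over just those two columns, convert the string values to integers with something like row.to_s.gsub(',','').to_i, and then write those values back into the corresponding rows of the same CSV?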

I tried something like this, but I don't get a properly formatted CSV:

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv') do |row|
    csv << row[0].to_s.gsub('=','').gsub(', The','')
    csv << row[3].to_s.gsub(',','').to_i
    csv << row[4].to_s.gsub(',','').to_i
  end
end

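I also tried using :headers => :integer when running the block, but it won't let me convert the values from strings to integers. So, what am I missing? Should I store the values and then write a new CSV, or is there an easier way?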

3 Answers:

Answer 0 (score: 3)

Aaron, just change the row and write it to a new file, like this:

require 'csv'

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end
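
Edit: if you want to keep the headers in modified.csv, you can do it like this. There must be a shorter way that doesn't open the file twice, though, if anyone has a better solution?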

headers = CSV.open('original.csv', 'r', :headers => true).read.headers
CSV.open('modified.csv', 'w') do |csv|
  csv << headers
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end
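
One possible way to avoid the second open, offered as a sketch rather than a definitive answer: since CSV.foreach with :headers => true yields CSV::Row objects, the header line can be taken from the first row and written once. The method name convert_amounts and its parameters are hypothetical, introduced only for illustration.

```ruby
require 'csv'

# Hypothetical helper (not from the answer above): copies input_path to
# output_path, converting the named columns from "24,000,000" to 24000000.
def convert_amounts(input_path, output_path, columns)
  CSV.open(output_path, 'w') do |out|
    header_written = false
    CSV.foreach(input_path, :headers => true) do |row|
      # Emit the header line once, taken from the first parsed row.
      unless header_written
        out << row.headers
        header_written = true
      end
      columns.each do |name|
        # to_s guards against empty (nil) cells, which become 0.
        row[name] = row[name].to_s.delete(',').to_i
      end
      out << row
    end
  end
end

# Usage, assuming the file names from the question:
# convert_amounts('original.csv', 'modified.csv', ['Estimated Weekend Gross', 'Cume'])
```

One caveat of this sketch: if the input contains a header line but no data rows, the block never runs and the output file stays empty.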

Answer 1 (score: 0)

You can get it with this:

sed 's/,\("[^"]*"\)*/|\1/g' file.csv | awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

I got this output:

"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""

I know this is hard to follow, so I'll explain it step by step:

  1. First, set a separator for each field, taking the quotes into account:

    sed 's/,\("[^"]*"\)*/|\1/g' file.csv

  2. You get a pipe separator "|" between each field:

    "=""Zero Dark Thirty"""|""|4|"24,000,000"|"29,480,807"|2937|"8,172"|""
    "=""Haunted House| A"""|""|1|"18,817,000"|"18,817,000"|2160|"8,712"|""
    "=""Gangster Squad"""|""|1|"16,710,000"|"16,710,000"|3103|"5,385"|""
    "=""Django Unchained"""|""|3|"11,065,000"|"125,399,122"|3012|"3,674"|""
    
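    1. Once you have this output with the pipe as the field separator, you can use awk to apply the described filter to fields 4 and 5 (it should run after the sed command, since it takes sed's output as input):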

      awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

    2. This removes the quotes and commas from each of those fields (so they read as integers) and gives the desired output:

      "=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
      "=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
      "=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
      "=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""
      
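Note that the intermediate output above splits "Haunted House, A" at its embedded comma, which shifts the fields of that row and leaves one of its amounts unconverted. For comparison, here is a minimal Ruby sketch of the same per-field cleanup done with a CSV parser, which keeps quoted commas inside their field. The helper name clean_line is hypothetical, and the sample line is taken from the question.

```ruby
require 'csv'

# Hypothetical helper: parse one CSV line, convert fields 4 and 5
# (indexes 3 and 4) to integers, and return the fields as an array.
def clean_line(line)
  fields = CSV.parse_line(line)
  [3, 4].each { |i| fields[i] = fields[i].to_s.delete(',').to_i }
  fields
end

line = '"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""'
fields = clean_line(line)
puts fields[0]      # the title keeps its embedded comma
puts fields.to_csv  # re-quotes only the fields that need it
```

This is a different technique from the sed and awk pipeline, not a translation of it; it relies on the parser to handle quoting instead of a regular expression.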

Answer 2 (score: 0)

You can try this:

require 'csv'

CSV.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv') do |row|
    modified_row = row.clone
    modified_row[0] = row[0].to_s.gsub('=','').gsub(', The','')
    modified_row[3] = row[3].to_s.gsub(',','').to_i
    modified_row[4] = row[4].to_s.gsub(',','').to_i
    csv << modified_row
  end
end

I changed the output file to be opened with CSV, then fixed the appends so that a whole row array is appended instead of individual values.
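
To illustrate why that change matters, here is a small sketch comparing the two kinds of append. StringIO is used as a stand-in for the File in the question (an assumption made so the example runs without touching disk), and the helper names raw_append and csv_append are hypothetical.

```ruby
require 'csv'
require 'stringio'

# StringIO stands in for the File from the question; both write obj.to_s as-is.
def raw_append(values)
  io = StringIO.new
  values.each { |v| io << v }
  io.string
end

# CSV#<< takes the whole row and adds separators, quoting, and the line ending.
def csv_append(values)
  CSV.generate { |csv| csv << values }
end

row = ['Zero Dark Thirty', 24000000, 29480807]
puts raw_append(row)  # Zero Dark Thirty2400000029480807
puts csv_append(row)  # Zero Dark Thirty,24000000,29480807
```

The raw append runs every value together with no commas or newlines, which matches the malformed output described in the question.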