I have a CSV in which two columns contain dollar amounts stored as strings. Running head -n 5 file.csv shows the following:
Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
"=""Zero Dark Thirty""","=""Sony""",4,"24,000,000","29,480,807",2937,"8,172","=""66273"""
"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""
"=""Gangster Squad""","=""Warner Bros.""",1,"16,710,000","16,710,000",3103,"5,385","=""66556"""
"=""Django Unchained""","=""The Weinstein Company""",3,"11,065,000","125,399,122",3012,"3,674","=""66122"""
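This continues for about 40 rows. You'll notice that the values in two columns, "Estimated Weekend Gross" and "Cume", are strings.
So my question is: is there a way to iterate over just those two columns, convert the string values to integers with something like row.to_s.gsub(',','').to_i, and then write those values back into the corresponding rows of the same CSV?
I tried something like this, but I don't get a properly formatted CSV: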
File.open('modified.csv', 'w') do |csv|
CSV.foreach('original.csv') do |row|
csv << row[0].to_s.gsub('=','').gsub(', The','')
csv << row[3].to_s.gsub(',','').to_i
csv << row[4].to_s.gsub(',','').to_i
end
end
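I also tried using :headers => :integer when executing the block, but it won't let me convert the values from strings to integers. So what am I missing? Should I store the values and then write a new CSV, or is there a simpler way?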
Answer 0 (score: 3)
Aaron, just modify each row and write it to a new file, like this:
require 'csv'
File.open('modified.csv', 'w') do |csv|
CSV.foreach('original.csv', :headers => true) do |row|
row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
row['Cume'] = row['Cume'].delete(',').to_i
csv << row
end
end
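EDIT: If you want to keep the headers in modified.csv you can do it like this, although there must be a shorter way that doesn't open the file twice, if anyone has a better solution: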
headers = CSV.open('original.csv', 'r', :headers => true).read.headers
CSV.open('modified.csv', 'w') do |csv|
csv << headers
CSV.foreach('original.csv', :headers => true) do |row|
row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
row['Cume'] = row['Cume'].delete(',').to_i
csv << row
end
end
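One possible way to avoid opening the input twice is to load the whole file as a CSV::Table, change the two columns in place, and serialize the table, which emits the header line by itself. This is only a sketch: the helper name convert_amounts is made up for illustration, and it assumes the file is small enough to hold in memory (about 40 rows here).

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): strips the thousands
# separators from the named columns and returns the whole table, header
# line included, as CSV text.
def convert_amounts(csv_text, columns)
  table = CSV.parse(csv_text, headers: true)
  table.each do |row|
    columns.each do |name|
      # Leave empty cells alone; otherwise "24,000,000" becomes 24000000.
      row[name] = row[name].delete(',').to_i unless row[name].nil?
    end
  end
  table.to_csv
end
```

It could then be used as File.write('modified.csv', convert_amounts(File.read('original.csv'), ['Estimated Weekend Gross', 'Cume'])), which reads the input once and writes the output once.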
Answer 1 (score: 0)
You can do it with this:
sed 's/,\("[^"]*"\)*/|\1/g' file.csv | awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -
I got this output:
"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""
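I know this is hard to read, so I'll explain it step by step.
First, set a separator for each field, taking the quotes into account:
sed 's/,\("[^"]*"\)*/|\1/g' file.csv
You will get a pipe separator "|" between the fields: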
"=""Zero Dark Thirty"""|""|4|"24,000,000"|"29,480,807"|2937|"8,172"|""
"=""Haunted House| A"""|""|1|"18,817,000"|"18,817,000"|2160|"8,712"|""
"=""Gangster Squad"""|""|1|"16,710,000"|"16,710,000"|3103|"5,385"|""
"=""Django Unchained"""|""|3|"11,065,000"|"125,399,122"|3012|"3,674"|""
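Once you have this output with the pipe as the field separator, you can use awk to apply the described filter to fields 4 and 5 (it must run after the sed command, because it takes sed's output as its input):
awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -
This removes the quotes and commas from each of those fields (so they are represented as integers) and gives the desired output: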
"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""
Answer 2 (score: 0)
You can try this:
CSV.open('modified.csv', 'w') do |csv|
CSV.foreach('original.csv') do |row|
modified_row = row.clone
modified_row[0] = row[0].to_s.gsub('=','').gsub(', The','')
modified_row[3] = row[3].to_s.gsub(',','').to_i
modified_row[4] = row[4].to_s.gsub(',','').to_i
csv << modified_row
end
end
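I changed the output file to be opened with CSV instead of File, and fixed the appends so that a whole row array is appended rather than individual values.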
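To show what appending a whole row array does, here is a minimal in-memory sketch of the same cleanup using CSV.generate. The helper name clean_rows is hypothetical, and it assumes the first row is the header, which it copies unchanged, because calling to_i on header text such as "Cume" would give 0.

```ruby
require 'csv'

# Hypothetical helper: cleans columns 0, 3 and 4 of every data row and
# returns the result as CSV text. The first row is assumed to be a header
# and is passed through untouched.
def clean_rows(csv_text)
  CSV.generate do |out|
    CSV.parse(csv_text).each_with_index do |row, index|
      if index.zero?
        out << row
        next
      end
      cleaned = row.dup
      cleaned[0] = row[0].to_s.gsub('=', '').gsub(', The', '')
      cleaned[3] = row[3].to_s.gsub(',', '').to_i
      cleaned[4] = row[4].to_s.gsub(',', '').to_i
      # Appending the array lets the CSV writer handle quoting and commas.
      out << cleaned
    end
  end
end
```

Because each append is one array, the writer produces one correctly quoted line per input row, and a title that contains a comma stays in a single field.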