I have a CSV file and I only want to change the headers of certain columns (about 20 of them in my real file). Here is a sample CSV file:
CSV file
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
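I have been trying to use the File/IO classes, but when the file is read in to perform the gsub, the quotes around every comma-separated string are removed. Here is the code I am using:
Ruby code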
file = 'file.csv'
replacements = {
'blah_01_blah' => 'newblah1',
'foo_01_foo' => 'coolfoo1',
'bacon_01_bacon' => 'goodpig1',
'bacon_02_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/
outdata = File.read(file).gsub(matcher, replacements)
File.open(file, 'w') do |out|
out << outdata
end
What I end up with is this CSV file:
New CSV file
name,blah_01_blah,foo_1_01_foo,bacon_01_bacon,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
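It keeps the quotes on the empty fields but strips them from the strings everywhere else. I want to keep those quotes in case a rogue comma ends up inside one of the strings for some reason, so that it does not throw the columns off. How can I change the headers without losing the quotes around the strings?
Edit: this is what I want the file to look like in the end.
Expected result CSV file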
"name","newblah1","coolfoo1","goodpig1","goodpig2"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
Thanks!
Answer 0 (score: 1)
You don't need to parse the file as CSV at all.
The trick here is that only the first line actually needs to be processed, and it can be treated as plain text.
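The idea of treating the first line as plain text can be sketched as follows. This is a minimal illustration, not necessarily the answerer's original code: the helper name `rename_header_line` is hypothetical, it works on a string rather than a file, and the keys `foo_1_01_foo` and `bacon_02_bacon` are assumptions chosen to match the sample header. Because the data lines are never parsed, their quotes pass through untouched.

```ruby
# Hypothetical helper: rewrites header names in the first line only,
# leaving every other line (and all of its quoting) exactly as it was.
def rename_header_line(text, replacements)
  lines = text.lines
  return text if lines.empty?

  # Regexp.union escapes each key and joins them with "|".
  matcher = Regexp.union(replacements.keys)
  # String#gsub with a hash replaces each match by the hash value for it.
  lines[0] = lines[0].gsub(matcher, replacements)
  lines.join
end

text = <<~CSV_TEXT
  "name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
  "John","yucky","summer","yum","food"
  "Mary","","","cool","sundae"
CSV_TEXT

# Assumed mapping: keys adjusted to match the sample header exactly.
replacements = {
  'blah_01_blah'   => 'newblah1',
  'foo_1_01_foo'   => 'coolfoo1',
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}

result = rename_header_line(text, replacements)
puts result
```

To apply this to a file, the string could come from `File.read(file)` and be written back with `File.write(file, result)`.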
Answer 1 (score: 0)
Let's first create the input CSV file.
text =<<_
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
_
file_in = 'file_in.csv'
file_out = 'file_out.csv'
File.write(file_in, text)
#=> 137
Here is the replacements hash, which I have simplified slightly.
replacements = {'blah_01_blah'=>'newblah1', 'foo_01_foo'=>'coolfoo1',
'bacon_01_bacon'=>'goodpig1'}
The first task is to modify this hash so that, if it does not have a key k, replacements[k] returns k. To do that we use the method Hash#default_proc=.
replacements.default_proc = ->(_,k) { k }
Here are two examples of how this hash is used.
replacements['bacon_01_bacon']
#=> "goodpig1"
replacements['name']
#=> "name"
The latter is because replacements has no key 'name'.
The code is as follows.
require 'csv'
f_in = CSV.read(file_in, headers:true)
CSV.open(file_out, 'w') do |csv_out|
csv_out << replacements.values_at(*f_in.headers)
f_in.each { |row| csv_out << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that
f_in.headers
#=> ["name", "blah_01_blah", "foo_1_01_foo", "bacon_01_bacon", "bacon_02_bacon"]
Let's look at the output file.
puts File.read(file_out)
prints
name,newblah1,foo_1_01_foo,goodpig1,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
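Note that the output above has lost the quotes around the non-empty fields, which is what the question wants to keep. One way to get them back, assuming it is acceptable to quote every field, is the CSV library's `force_quotes: true` option. The sketch below is an illustration under that assumption: the helper name `rename_headers` is hypothetical, it works on strings instead of files, and it uses `Hash#fetch` with a fallback in place of the default proc. The `foo_1_01_foo` header stays unchanged because the hash key is `foo_01_foo`.

```ruby
require 'csv'

# Hypothetical helper: returns csv_text with its header names replaced
# according to `replacements`, quoting every field in the output.
def rename_headers(csv_text, replacements)
  table = CSV.parse(csv_text, headers: true)
  CSV.generate(force_quotes: true) do |csv_out|
    # Headers without an entry in `replacements` are kept as they are.
    csv_out << table.headers.map { |h| replacements.fetch(h, h) }
    table.each { |row| csv_out << row.fields }
  end
end

text = <<~CSV_TEXT
  "name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
  "John","yucky","summer","yum","food"
  "Mary","","","cool","sundae"
CSV_TEXT

replacements = { 'blah_01_blah' => 'newblah1', 'foo_01_foo' => 'coolfoo1',
                 'bacon_01_bacon' => 'goodpig1' }

result = rename_headers(text, replacements)
puts result
```

The same option can be passed when writing to a file, as in `CSV.open(file_out, 'w', force_quotes: true)`.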