I'm new to processing data with the csv module. I read in a file and use this code:
import csv
path1 = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\charity.a.data"
csv_file_path = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\output.csv.bak"
with open(path1, 'r') as in_file:
    in_file.__next__()
    stripped = (line.strip() for line in in_file)
    lines = (line.split(":$%:") for line in stripped if line)
    with open(csv_file_path, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('id', 'donor_id','last_name','first_name','year','city','state','postal_code','gift_amount'))
        writer.writerows(lines)
Answer 0 (score: 1)
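If you only want to remove the ':' from the first and last columns, this should work. Keep in mind that the dataset you read should be delimited by tabs (or by something other than commas), because, as I commented on your question, there are commas ',' inside your data.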
path1 = '/path/input.csv'
path2 = '/path/output.csv'
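with open(path1, 'r') as in_file, open(path2, 'w') as out_file:
    rows = iter(in_file.readlines())
    # Copy the header line unchanged.
    out_file.write(next(rows))
    for row in rows:
        # Drop the first character and the last two (the trailing ':' and the newline).
        out_file.write(row[1:][:-2] + '\n')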
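To make the slice above concrete, here is a minimal sketch of what `row[1:][:-2]` does to a single line. The sample lines and the `trim_line` helper are hypothetical, since the real file contents are not shown in the question; they assume each line starts with ':' and ends with ':' followed by a newline.

```python
def trim_line(row: str) -> str:
    # Drop the first character, then the last two (the trailing ':' and '\n').
    return row[1:][:-2]

# Hypothetical tab-delimited lines; the real data layout is an assumption.
sample = [":1\tSmith, John\t25:\n", ":2\tDoe, Jane\t40:\n"]
cleaned = [trim_line(row) for row in sample]

for line in cleaned:
    print(repr(line))
```

Because the slice counts the newline as one of the two trailing characters, a final line without a trailing newline would lose one character of real data.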
<strong>Update</strong>
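Now that you have posted your code, I have added a small change so that the whole process runs from the initial file. The idea is the same: you only need to exclude the first and last character of each line. So instead of line.strip() you should use line.strip()[1:][:-2].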
import csv
path1 = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\charity.a.data"
csv_file_path = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\output.csv.bak"
with open(path1, 'r') as in_file:
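    next(in_file)  # skip the header line of the raw file
    # Drop the first character and the last two. strip() has already removed
    # the newline, so use [1:-1] instead if each line ends in a single ':'.
    stripped = (line.strip()[1:][:-2] for line in in_file)
    lines = (line.split(":$%:") for line in stripped if line)
    # newline='' keeps csv.writer from emitting blank rows on Windows.
    with open(csv_file_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('id', 'donor_id','last_name','first_name','year','city','state','postal_code','gift_amount'))
        writer.writerows(lines)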
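The same pipeline can be tried on in-memory data before touching the real file. This is only a sketch under stated assumptions: the `convert` function, the sample input, and the three-column header are hypothetical, and it slices with `[1:-1]` rather than the `[1:][:-2]` used above, assuming each stripped line begins and ends with exactly one ':'.

```python
import csv
import io

DELIMITER = ":$%:"  # field separator taken from the question's code


def convert(in_file, out_file, header):
    """Skip the first input line, trim each line, split it, and write CSV rows."""
    next(in_file)  # skip the original header line
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(header)
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: exactly one leading and one trailing ':' per line.
        writer.writerow(line[1:-1].split(DELIMITER))


# Hypothetical input; the real charity.a.data layout is not shown.
raw = io.StringIO(
    "header to skip\n"
    ":1:$%:Smith, John:$%:25:\n"
    "\n"
    ":2:$%:Doe:$%:40:\n"
)
out = io.StringIO()
convert(raw, out, ("id", "name", "gift_amount"))
result = out.getvalue()
print(result)
```

Note that `csv.writer` quotes the field containing a comma on its own, so commas inside the data do not break the output columns.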