I have a file, data.csv, that looks like this (two columns, A and B):
A B
01 a
'b'
0101 a
b
010101 a
'b'
'c'
d
'e'
f
010102 a
b
'd'
'e'
010201 a
b
'c'
d
02 a
b
0201 a
b
020101 a
b
'd'
'e'
020102 a
'b'
c
020201 a
b
c
d
'e'
020301 a
'b'
c
d
I would like it to look like this (five columns: A, B, C, D and E):
A B C D E
01 a b
0101 a b
010101 a b c d, e, f
010102 a b d, e
010201 a b c d
02 a
0201 a b
020101 a b d, e
020102 a b c
020201 a b c d, e
020301 a b c d
This is what I know about data.csv:
foo is part of its string
Treating data.csv as a text file, I put together this script:
Code:
#!/usr/bin/python3
f = open('data.csv')
c = f.read()
f.close()
c = c.replace('\n\n', '\n')
c = c.replace('\n\t', '\t')
c = c.replace("'", "")
f = open('output.csv', 'w')
f.write(c)
f.close()
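...and then I got stuck. Perhaps the csv module would let me do this, along with the other adjustments, in a more uniform way. How can I solve this with Python 3.3 (I assume any 3.x solution is compatible)?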
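Update
Based on Martijn Pieters' answer I came up with this, which seems to be working, although I am not sure that the 'a', 'b' and 'c' text values always end up in the proper columns. Also, the last row is skipped/left blank.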
#!/usr/bin/python3
import csv
with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    write_this_row = None
    for row in reader:
        # If there is a row with content...
        if row:
            # If the first cell has content...
            if row[0]:
                if write_this_row != None:
                    writer.writerow(write_this_row)
                write_this_row = row
            elif 'foo' in row[1]:
                if len(write_this_row) < 5:
                    write_this_row.extend([''] * (5 - len(row)))
                if write_this_row[4]:
                    write_this_row[4] += ';' + row[1]
                else:
                    write_this_row[4] = row[1]
            else:
                write_this_row.insert(3, row[1])
Answer 0 (score: 2)
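Just use the csv module to read the data, massage it a little per row, and write it out again.
You can create "empty" columns by using None or an empty string '' as the value for that column. Vice versa, reading empty columns (that is, nothing between two consecutive tabs) gives you empty strings.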
import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    for row in reader:
        if len(row) > 3:
            # detect if `c` is missing (insert your own test here)
            # sample test looks for 3 consecutive columns with values f, o and o
            if row[3:6] == ['f', 'o', 'o']:
                # insert an empty `c`
                row.insert(3, '')
        if len(row) < 5:
            # make row at least 5 columns long
            row.extend([''] * (5 - len(row)))
        if len(row) > 5:
            # merge any excess columns into the 5th column
            row[4] = ','.join(row[4:])
            del row[5:]
        writer.writerow(row)
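As a quick check of the point about empty columns, the sketch below writes a row containing both None and '' and then reads it back. It uses io.StringIO in place of real files, and the sample values are made up for illustration.

```python
import csv
import io

# Write one row whose third cell is None and whose fourth cell is ''.
buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
writer.writerow(['01', 'a', None, '', 'e'])
written = buf.getvalue()

# Read the same text back: both empty cells come out as empty strings.
rows = list(csv.reader(io.StringIO(written), delimiter='\t'))

print(repr(written))  # '01\ta\t\t\te\n'
print(rows)           # [['01', 'a', '', '', 'e']]
```

Both None and '' produce nothing between the tabs, so the two are indistinguishable once written.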
Update
Instead of using a flag, use the reader as an iterator (call next() on it to get the next row, rather than using a for loop):
import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    row = None
    try:
        next(reader)  # skip the `A B` headers.
        line = next(reader)  # prime our loop
        while True:
            while not line or not line[0]:
                # advance to the first line with a column 0 value
                # (blank lines are read as [] and are skipped as well)
                line = next(reader)
            row = line  # start off with the first number and column
            line = next(reader)  # prime the subsequent lines loop
            while line and not line[0]:
                # process subsequent lines until we find one with a value in col 0 again
                cell = line[1]
                if cell == 'foo':  # detect column d
                    row.append('')  # and insert empty value
                row.append(cell)
                line = next(reader)
            # consolidate, write
            if len(row) < 5:
                # make row at least 5 columns long
                row.extend([''] * (5 - len(row)))
            if len(row) > 5:
                # merge any excess columns into the 5th column
                row[4] = ','.join(row[4:])
                del row[5:]
            writer.writerow(row)
            row = None
    except StopIteration:
        # reader is done, no more lines to come
        # process the last row if there was one
        if row is not None:
            # consolidate, write
            if len(row) < 5:
                # make row at least 5 columns long
                row.extend([''] * (5 - len(row)))
            if len(row) > 5:
                # merge any excess columns into the 5th column
                row[4] = ','.join(row[4:])
                del row[5:]
            writer.writerow(row)
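The pad-or-merge step appears twice in the loop above, and a plain for loop can do the same grouping as long as the pending row is flushed once more after the loop ends. The following is only a sketch under assumptions: consolidate and regroup are hypothetical helper names, continuation lines are assumed to have an empty first cell and their value in the second cell, and the foo-based column detection is left out. The sample input is made up and read from io.StringIO rather than a file.

```python
import csv
import io


def consolidate(row, width=5, sep=','):
    """Pad row to `width` cells, or merge the surplus into the last cell."""
    row = list(row)
    if len(row) < width:
        row.extend([''] * (width - len(row)))
    elif len(row) > width:
        row[width - 1:] = [sep.join(row[width - 1:])]
    return row


def regroup(rows):
    """Yield one consolidated row per group of input rows."""
    current = None
    for line in rows:
        if not line:
            continue  # blank lines are read as [] and carry no data
        if line[0]:
            # a value in column 0 starts a new group; emit the previous one
            if current is not None:
                yield consolidate(current)
            current = list(line)
        elif current is not None and len(line) > 1:
            # continuation line: its value sits in the second cell
            current.append(line[1])
    if current is not None:
        yield consolidate(current)  # flush the final group


# Hypothetical sample input, tab-separated, with a header row.
sample = "A\tB\n01\ta\n\tb\n0101\ta\n\tb\n\tc\n\td\n\te\n\tf\n"
reader = csv.reader(io.StringIO(sample), delimiter='\t')
next(reader)  # skip the header row
result = list(regroup(reader))

for row in result:
    print(row)
```

The final yield after the loop is what keeps the last group from being dropped; without it the last row would only be written when another row with a column 0 value followed it.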