I have a comma-delimited file. The lines look like this:
1,2,3,4,5
6,7,8
9,10
11,12,13,14,15
I need every line to have exactly 5 columns, so the new file would be:
1,2,3,4,5
6,7,8,,
9,10,,,
11,12,13,14,15
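In other words, if a line has fewer than 4 commas, add the missing ones at the end. I've been told there is a Python module that does exactly this. Where can I find such a module? Would awk be better suited to this kind of task?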
Answer 0 (score: 2)
The module you are looking for is the csv module. You still need to make sure each row meets the minimum length:
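import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('faultyfile.csv', newline='') as src, open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, dialect=reader.dialect)
    for row in reader:
        # pad short rows with empty fields up to 5 columns
        if len(row) < 5:
            row.extend([''] * (5 - len(row)))
        writer.writerow(row)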
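One reason to go through the csv module rather than counting commas by hand is that a quoted field may itself contain a comma. The sketch below pads rows held in memory so the effect is easy to see; the helper name `pad_rows` and the sample data are made up for this illustration and are not part of the answer above.

```python
import csv
import io

def pad_rows(text, width=5):
    """Return CSV text with every row padded to at least `width` fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Multiplying a list by a negative number gives [], so long rows are left alone
        row.extend([""] * (width - len(row)))
        writer.writerow(row)
    return out.getvalue()

# The second row has only two fields, even though it contains two commas
sample = '1,2,3,4,5\n"6,7",8\n9,10\n'
padded = pad_rows(sample)
print(padded, end="")
```

A comma-counting approach would treat the second sample row as three fields, whereas the csv reader sees two and pads accordingly.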
Answer 1 (score: 2)
If you don't mind using awk, it's easy:
$ cat data.txt
1,2,3,4,5
6,7,8
9,10
11,12,13,14,15
$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$3,$4,$5}' data.txt
1,2,3,4,5
6,7,8,,
9,10,,,
11,12,13,14,15
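For comparison, here is a rough Python rendering of what that awk command does. Because it prints exactly fields 1 through 5, it pads short lines but also drops any fields beyond the fifth. The function name `first_five` is hypothetical, chosen only for this sketch.

```python
def first_five(line, width=5):
    """Mimic printing $1..$5 with OFS=',': pad short lines, truncate long ones."""
    fields = line.rstrip("\n").split(",")
    # Append enough empty fields, then keep only the first `width`
    fields = (fields + [""] * width)[:width]
    return ",".join(fields)

for line in ["1,2,3,4,5\n", "6,7,8\n", "9,10\n", "11,12,13,14,15\n"]:
    print(first_five(line))
```

If lines with more than five fields must be preserved in full, this truncating behaviour is something to watch for.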
Answer 2 (score: 1)
def correct_file(fname):
    with open(fname) as f:
        # strip the newline, append the missing commas, put the newline back
        data = [line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n' for line in f]
    with open(fname, 'w') as f:
        f.writelines(data)
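As noted in the comments, this reads the whole file into memory when it doesn't really need to. To do it without reading everything in one go: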
import shutil

def correct_file(fname):
    with open(fname, 'r') as fin, open('temp', 'w') as fout:
        for line in fin:
            new = line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n'
            fout.write(new)
    # both files are closed here, so the move is safe
    shutil.move('temp', fname)
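This will clobber any file named temp in the current directory. Of course, you can always use the tempfile module to get around that...
For a slightly more verbose, but bulletproof(?) version: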
import atexit
import os
import shutil
import tempfile

def try_delete(fname):
    try:
        os.unlink(fname)
    except OSError:
        if os.path.exists(fname):
            print("Couldn't delete existing file", fname)

def correct_file(fname):
    with open(fname, 'r') as fin, tempfile.NamedTemporaryFile('w', delete=False) as fout:
        # bind the name as a default argument so the callback keeps this value
        atexit.register(lambda f=fout.name: try_delete(f))
        for line in fin:
            new = line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n'
            fout.write(new)
    shutil.move(fout.name, fname)  # this should also get rid of the temporary file
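A possible variation, offered only as a sketch and not as the answer's own method: create the temporary file in the same directory as the target with `tempfile.mkstemp`, then swap it in with `os.replace`. The `commas` parameter is an addition for this example. Note that `mkstemp` creates the file with restrictive permissions, so the original file's mode is not preserved here.

```python
import os
import tempfile

def correct_file(fname, commas=4):
    """Pad each line of `fname` to at least `commas` commas, replacing the file at the end."""
    directory = os.path.dirname(os.path.abspath(fname))
    # Create the temporary file next to the target so the final rename stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as fout, open(fname) as fin:
            for line in fin:
                line = line.rstrip("\n")
                fout.write(line + "," * (commas - line.count(",")) + "\n")
        os.replace(tmp_name, fname)  # swap the new file in
    except BaseException:
        os.unlink(tmp_name)  # clean up the temporary file on failure
        raise
```

Calling `correct_file("data.txt")` would then rewrite that file in place, with no leftover temporary file on success.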
Answer 3 (score: 1)
rows = []
with open('somefile.txt') as f:
    for line in f:
        # drop the newline so it does not end up inside the last field
        rows.append(line.rstrip('\n').split(","))
max_cols = len(max(rows, key=len))
for row in rows:
    row.extend([''] * (max_cols - len(row)))
print("\n".join(",".join(r) for r in rows))
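If you are sure it will always be n items (5 in this case) and you always know that before opening the file, it can be made more memory-efficient (something like this):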
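with open("f1", "r") as f1:
    with open("f2", "w") as f2:
        for line in f1:
            # strip the newline first so the commas land before it
            line = line.rstrip("\n")
            f2.write(line + "," * (4 - line.count(",")) + "\n")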
Answer 4 (score: 0)
This might work for you (GNU sed):
sed ':a;s/,/&/4;t;s/$/,/;ta' file
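To unpack the sed script: `:a` defines a label; `s/,/&/4` replaces the fourth comma with itself, which only succeeds if a fourth comma exists; the bare `t` then branches to the end of the script. Otherwise `s/$/,/` appends one comma and `ta` jumps back to the label. The same loop can be sketched in Python as follows (the function name `pad_like_sed` is made up for this illustration):

```python
def pad_like_sed(line, commas=4):
    """Append commas one at a time until the line contains at least `commas` of them."""
    while line.count(",") < commas:  # s/,/&/4 fails while there is no 4th comma
        line += ","                  # s/$/,/ appends one comma, then the loop repeats
    return line

for line in ["1,2,3,4,5", "6,7,8", "9,10"]:
    print(pad_like_sed(line))
```

Like the sed version, this leaves lines that already have four or more commas untouched.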