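Dropping rows that contain a run of blank fields from a CSV file

Time: 2016-01-25 11:06:40

Tags: python

I am trying to write a Python program to clean up survey data from a CSV file. I want to drop rows that contain a run of blank fields, such as the first and third rows in the example below.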

"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"

My code does not work:

import csv

fname = raw_input("Enter input file name: ")
if len(fname) < 1 : fname = "survey.csv"

foutput = raw_input("Enter output file name: ")
if len(foutput) < 1 : foutput = "output_"+fname


input = open(fname, 'rb')
output = open(foutput, 'wb')


searchFor = 5*['']

writer = csv.writer(output)

for row in csv.reader(input):
    if searchFor not in row :
        writer.writerow(row)

input.close()
output.close()

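2 answers:

Answer 0: (score: 1)

Use a Counter to check whether one list is a subset of another, as shown below. If you also want to remove the empty elements from a row, filter them out with None, bool, or len as the filter function -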

import csv
from itertools import repeat
from collections import Counter

# fname and foutput are the file names read in the question's script
input = open(fname, 'rb')
output = open(foutput, 'wb')

writer = csv.writer(output)

# Helper: True if every item of list1 occurs in list2 at least as many times
def counterSubset(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True

for row in csv.reader(input):
    # 5 is the number of blank fields that disqualifies a row; change it as needed
    if not counterSubset(list(repeat('', 5)), row):
        # to also strip empty fields, write filter(None, row), filter(bool, row) or filter(len, row)
        writer.writerow(row)

input.close()
output.close()

Output -

1,a,b,c,,
2,a,b,c,d,e,f,g,h
4,a,,z,u,d,i,f,x,h
5,d,c,c,d,c,f,g,z
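
For comparison, here is a minimal sketch of the same Counter-based check run against the five sample rows from the question. It assumes Python 3 and reads the data from an in-memory string rather than a file; the names SAMPLE, counter_subset and drop_blank_rows are made up for this illustration and are not part of the answer above. Note that this check counts blank fields anywhere in the row, not only adjacent ones.

```python
import csv
import io
from collections import Counter

# The sample rows from the question, held in memory instead of a file (assumption).
SAMPLE = '''"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
'''


def counter_subset(needles, haystack):
    """Return True if haystack holds every item of needles at least as often."""
    need, have = Counter(needles), Counter(haystack)
    return all(have[item] >= count for item, count in need.items())


def drop_blank_rows(text, min_blanks=5):
    """Keep only the rows that have fewer than min_blanks empty fields."""
    kept = []
    for row in csv.reader(io.StringIO(text)):
        if not counter_subset([''] * min_blanks, row):
            kept.append(row)
    return kept


kept_rows = drop_blank_rows(SAMPLE)
print([row[0] for row in kept_rows])
```

Rows 1 and 3 each contain five empty fields, so only the rows with IDs 2, 4 and 5 are kept.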

Answer 1: (score: 0)

How about

# csv.reader yields an empty string for a blank field;
# change this if your data marks blanks differently
blank_item = ""

for row in csv.reader(input):
    # filter out all blank elements
    blanks = [x for x in row if x == blank_item]
    if len(blanks) < 5:
        writer.writerow(row)

This counts the blank fields in each row and lets you drop rows based on that count.
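
Both answers count blanks anywhere in a row. If only an unbroken run of blanks should disqualify a row, as the question's wording suggests, one possible variation is to measure the longest stretch of adjacent empty fields with itertools.groupby. This is a sketch assuming Python 3; longest_blank_run and keep_row are hypothetical helper names, and the threshold of 5 is taken from the question's code.

```python
from itertools import groupby


def longest_blank_run(row):
    """Return the length of the longest stretch of adjacent empty fields."""
    run_lengths = (
        sum(1 for _ in group)
        for is_blank, group in groupby(row, key=lambda field: field == '')
        if is_blank
    )
    return max(run_lengths, default=0)


def keep_row(row, max_run=5):
    """Keep a row unless it has max_run or more blank fields in a row."""
    return longest_blank_run(row) < max_run


# Rows shaped like rows 1 and 5 of the question's sample data.
row_with_run = ['1', 'a', 'b', 'c', '', '', '', '', '']
row_with_gap = ['5', 'd', 'c', 'c', '', 'c', 'f', 'g', 'z']
print(keep_row(row_with_run), keep_row(row_with_gap))
```

In the loop from the answers, the condition would then become `if keep_row(row): writer.writerow(row)`.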