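I'm trying to write a Python program to clean up survey data from a CSV file. I want to drop rows that contain a run of blank fields, such as the first and third rows in the example below.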
"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
My code doesn't work:
import csv

fname = raw_input("Enter input file name: ")
if len(fname) < 1 : fname = "survey.csv"
foutput = raw_input("Enter output file name: ")
if len(foutput) < 1 : foutput = "output_"+fname

input = open(fname, 'rb')
output = open(foutput, 'wb')
searchFor = 5*['']
writer = csv.writer(output)
for row in csv.reader(input):
    if searchFor not in row :
        writer.writerow(row)
input.close()
output.close()
Answer 0 (score: 1)
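Use Counter to check whether one list is a subset of another, as shown below. If you want to remove the empty elements, pass None, bool, or len to filter to strip out the blanks and discard them: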
import csv
from itertools import repeat
from collections import Counter

# fname and foutput are set as in the question's code
input = open(fname, 'rb')
output = open(foutput, 'wb')
writer = csv.writer(output)

# Helper function: True if every element of list1 occurs in list2
# at least as many times as it occurs in list1
def counterSubset(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True

for row in csv.reader(input):
    # I used 5 for five '' fields; you can change it
    if not counterSubset(list(repeat('', 5)), row):
        # use filter(None, row), filter(bool, row) or filter(len, row) to remove empty elements
        writer.writerow(row)

input.close()
output.close()
Output:
1,a,b,c,,
2,a,b,c,d,e,f,g,h
4,a,,z,u,d,i,f,x,h
5,d,c,c,d,c,f,g,z
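The code comment in this answer mentions filter(None, row), filter(bool, row), and filter(len, row) for removing empty elements. A minimal sketch of what those calls do, assuming Python 3 (where filter returns an iterator, hence the list wrapper) and assuming blank fields arrive as empty strings, which is how csv.reader delivers them. The row is the fifth sample row from the question; the variable names are only illustrative.

```python
# Fifth sample row from the question, as csv.reader would return it
row = ["5", "d", "c", "c", "", "c", "f", "g", "z"]

# All three predicates treat the empty string as falsy,
# so each call keeps only the non-empty fields.
without_blanks_none = list(filter(None, row))
without_blanks_bool = list(filter(bool, row))
without_blanks_len = list(filter(len, row))

print(without_blanks_none)
```

Note that this shortens the row, so the remaining values shift left and no longer line up with their original columns.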
Answer 1 (score: 0)
How about:
# csv.reader returns blank fields as empty strings;
# change this if your data marks blanks differently
blank_item = ""
for row in csv.reader(input):
    # collect all blank elements in the row
    blanks = [x for x in row if x == blank_item]
    if len(blanks) < 5:
        writer.writerow(row)
This counts the number of blanks in a row and lets you drop rows based on that count.
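Both answers count the total number of blank fields in a row, not whether the blanks sit next to each other. If "a run of blank fields" is meant to be consecutive blanks, one possible approach is itertools.groupby. The following is a rough sketch under a few assumptions: Python 3, blank fields arriving as empty strings, and a threshold of 5 as in the question. The helper names longest_blank_run and drop_rows_with_blank_run are made up for illustration, and the sample data is read from a string rather than a file.

```python
import csv
import io
from itertools import groupby


def longest_blank_run(row):
    """Return the length of the longest run of consecutive empty fields."""
    return max(
        (
            sum(1 for _ in group)
            for is_blank, group in groupby(row, key=lambda field: field == "")
            if is_blank
        ),
        default=0,  # rows with no blank fields (or no fields) have run length 0
    )


def drop_rows_with_blank_run(rows, min_run=5):
    """Yield only the rows whose longest blank run is shorter than min_run."""
    for row in rows:
        if longest_blank_run(row) < min_run:
            yield row


# Sample data from the question, read from a string instead of a file
sample = '''"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
'''

kept = list(drop_rows_with_blank_run(csv.reader(io.StringIO(sample))))
for row in kept:
    print(row)
```

With the sample above, rows 1 and 3 each end in five consecutive blanks and are dropped, while rows 2, 4, and 5 are kept. To work on real files in Python 3, open them in text mode with newline='' and pass the file objects to csv.reader and csv.writer.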