I have a CSV file that looks like this:
"Company, Inc.",,,,,,,,,,,,10/30/09
A/R Summary Aged Analysis Report,,,,,,,,,,,,10:35:01
All Clients,,,,,,,,,,,,USER
Client Account,Customer Name,15-Jan,16 - 30,31 - 60,61 - 90,91 - 120,120 - Over,Total,Status,Credit Limit
1000001111,CLIENT A,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00"
1000002222,CLIENT B,0,0,0,"3,591.27",0,0,"3,591.27",COD,0
1000003333,CLIENT C,536.78,0,0,0,0,"11,216.60","11,753.38",COD,0
1000004444,CLIENT D,0,514.94,"3,147.45",690,0,0,"4,352.39",COD,0
Grand Total,,"139,203,856.06","84,607,749.30","110,746,640.18","58,474,379.45","52,025,869.06","292,653,734.82","737,712,228.87",,,,
But I only want to process the rows after the "Client Account..." row and before the "Grand Total..." row. Here is the code I'm using now:
inputFile = csv.reader(open(filename), dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]
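Answer 0 (score: 12)
Use generators. You can build up any amount of complexity from simple generator filter functions. Although this is considerably more involved than a one-line filter, it is easier to extend and copes well with very complex spreadsheets.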
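def skip_blank(rdr):
    """Skip rows that are empty or contain only empty columns."""
    for row in rdr:
        if len(row) == 0:
            continue
        if all(len(col) == 0 for col in row):
            continue
        yield row

def after_heading(text, rdr):
    """Yield the rows that follow the first row containing `text`."""
    i = iter(rdr)
    for row in i:
        if any(column == text for column in row):
            break
    for row in i:
        yield row

def before_footing(text, rdr):
    """Yield rows until the first row containing `text`."""
    for row in rdr:
        if any(column == text for column in row):
            break
        yield row

def between(start, end, rdr):
    """Yield the rows strictly between the `start` and `end` marker rows."""
    for row in before_footing(end, after_heading(start, rdr)):
        yield row

# The heading marker comes first, then the footing marker.
for row in between('Client Account', 'Grand Total', skip_blank(inputFile)):
    print(row)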
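The same "after the heading, before the footing" idea can also be sketched with the standard library's `itertools.dropwhile` and `itertools.takewhile`. This is only an alternative illustration, not the answer's own method: it assumes Python 3, and the names `rows_between` and `SAMPLE` (a trimmed-down copy of the report) are made up for the example.

```python
import csv
import io
from itertools import dropwhile, takewhile

# Hypothetical sample data, trimmed from the report in the question.
SAMPLE = '''"Company, Inc.",,,,10/30/09
All Clients,,,,USER
Client Account,Customer Name,Total,Status,Credit Limit
1000001111,CLIENT A,"22,340.96",COD,"20,000.00"
1000002222,CLIENT B,"3,591.27",COD,0
Grand Total,,"737,712,228.87",,
'''

def rows_between(rows, start, end):
    """Return an iterator over the rows strictly between the marker rows."""
    rows = iter(rows)
    # Skip everything before the row that contains the start marker.
    tail = dropwhile(lambda row: start not in row, rows)
    next(tail, None)  # discard the heading row itself (no-op if it is missing)
    # Stop at the first row that contains the end marker.
    return takewhile(lambda row: end not in row, tail)

reader = csv.reader(io.StringIO(SAMPLE))
records = list(rows_between(reader, 'Client Account', 'Grand Total'))
```

If the heading never appears, the result is simply empty, which matches the behaviour of the generator functions above.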
Answer 1 (score: 10)
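You can do this by setting a flag:
import csv
file = "file"
f = 0  # becomes 1 once the heading row has been seen
reader = csv.reader(open(file), delimiter=',')
for row in reader:
    if "Grand Total" in row:
        break
    if "Client Account" in row:
        f = 1
        continue
    if f:
        # guard against empty rows before indexing
        if row and row[0].isdigit():
            print(row)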
Answer 2 (score: 6)
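import csv
import re
import StringIO  # Python 2 module; on Python 3 use io.StringIO instead
# Capture everything between the heading line and the "Grand Total" line.
data = re.search("Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)", open(filename).read(), re.DOTALL).group(1)
datafile = StringIO.StringIO(data)
inputFile = csv.reader(datafile, dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]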
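For readers on Python 3, here is a rough port of the same regex idea. It is a sketch under assumptions: `records_from_text` and `SAMPLE` are hypothetical names, the sample is a shortened version of the report, and a missing match returns an empty list rather than raising.

```python
import csv
import io
import re

# Hypothetical sample data, trimmed from the report in the question.
SAMPLE = '''"Company, Inc.",,,,10/30/09
Client Account,Customer Name,Total,Status,Credit Limit
1000001111,CLIENT A,"22,340.96",COD,"20,000.00"
1000002222,CLIENT B,"3,591.27",COD,0
Grand Total,,"737,712,228.87",,
'''

def records_from_text(text):
    """Return the numeric-account rows between the heading and the total."""
    match = re.search(
        r"Client Account[^\r\n]*[\r\n]+(.*)(?=Grand Total)", text, re.DOTALL
    )
    if match is None:
        return []  # markers not found: nothing to process
    reader = csv.reader(io.StringIO(match.group(1)), dialect='excel')
    return [row for row in reader if row and row[0].isdigit()]

records = records_from_text(SAMPLE)
```

Note that this reads the whole file into memory, so it suits reports of modest size.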
Answer 3 (score: 3)
Use a nice little generator for this sort of thing. It could be generalized a bit if your requirements change:
def lines_between(source, first, second):
    source = iter(source)  # both loops share one pass, even over a list
    # Consume everything up to and including the `first` marker line.
    for line in source:
        if line and line[0] == first:
            break
    for line in source:
        if line and line[0] == second:
            break
        if line:  # only non-empty lines
            yield line

for record in lines_between(inputFile, 'Client Account', 'Grand Total'):
    pass  # process record
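You didn't explicitly ask for the "non-empty lines" filter, but your own approach does that, so I assume you want it. If you don't want to process the lines "lazily" like this and would rather have a list with everything built up front, do this: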
records = list(lines_between(inputFile, 'Client Account', 'Grand Total'))
By the way, on Windows be sure to open the real source file in binary mode, csv.reader(open(filename, 'rb'), dialect='excel'), as the csv docs note.
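The binary-mode advice applies to Python 2's csv module. On Python 3 the csv documentation instead recommends opening the file in text mode with `newline=''`. A minimal sketch, where `read_rows` and the temporary demo file are assumptions made for illustration:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Read every row of a CSV file, opened the way Python 3's csv docs advise."""
    with open(path, newline='') as f:
        return list(csv.reader(f, dialect='excel'))

# Write a small throwaway file with Windows-style line endings, then read it back.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w', newline='') as f:
    f.write('Client Account,Customer Name\r\n1000001111,CLIENT A\r\n')
rows = read_rows(demo_path)
os.remove(demo_path)
```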