Python CSV parser only sorts part of the file

Time: 2012-12-07 00:30:12

Tags: python parsing csv

  

Possible duplicate:
  Filtering a CSV file in python

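I have a CSV parser written in Python that starts failing at line 500. From that point on it no longer matches values against the regular expression ^(\w+)\*(\d\d):(\d\d):(\d\d)$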

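import csv
import re
import sys

# Allele names with three fields (GENE*01:02:03) and four fields (GENE*01:02:03:04).
# The original match for the second case tested five fields while its substitution
# used four; four fields are assumed here.
three_fields = r'^(\w+)\*(\d\d):(\d\d):(\d\d)$'
four_fields = r'^(\w+)\*(\d+):(\d+):(\d+):(\d+)$'

# newline='' is what the csv module expects for files opened in text mode (Python 3)
with open('mhc.csv', newline='') as infile, open('mhc_fixed.csv', 'w', newline='') as outfile:
    csvdictreader = csv.DictReader(infile, delimiter=',')
    csvdictwriter = csv.DictWriter(outfile, fieldnames=csvdictreader.fieldnames, delimiter=',')
    csvdictwriter.writeheader()

    targets = [name for name in csvdictreader.fieldnames if name.startswith('HLA-D')]

    for rowfields in csvdictreader:
        keep = True

        for field in targets:
            value = rowfields[field]

            if re.match(r'^\w+\*\d\d$', value):  # gene resolution too low?
                keep = False
                break  # quit processing target fields

            else:  # reduce gene resolution if too high
                # by keeping only the first two fields if three or four are present
                if re.match(three_fields, value):
                    rowfields[field] = re.sub(three_fields, r'\1*\2:\3', value)

                if re.match(four_fields, value):
                    rowfields[field] = re.sub(four_fields, r'\1*\2:\3', value)

        if keep:
            csvdictwriter.writerow(rowfields)

        # Debug output; the original compared the row dict itself to 1400,
        # so the reader's line counter is assumed to be what was meant.
        if csvdictreader.line_num > 1400:
            print(csvdictreader.line_num, file=sys.stderr)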
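
One way to narrow down why matching stops partway through the file is to run the patterns from the question on individual values, outside the CSV loop. The following is only a minimal sketch: the helper name `reduce_resolution`, the combined high-resolution pattern, and the sample values are assumptions made for illustration and do not come from the original data. It shows that a value with trailing whitespace matches neither pattern, which is one possible (unconfirmed) reason for rows being left unchanged.

```python
import re

# Patterns modelled on the ones in the question (assumed, not the author's exact code).
LOW_RES = re.compile(r'^\w+\*\d\d$')                      # one field, e.g. "DRB1*15"
HIGH_RES = re.compile(r'^(\w+)\*(\d+):(\d+)(?::\d+)+$')   # three or more fields


def reduce_resolution(value):
    """Return (new_value, status) for a single cell.

    status is 'low' (the row would be dropped), 'reduced' (trimmed to two
    fields), or 'unchanged' (neither pattern matched).
    """
    if LOW_RES.match(value):
        return value, 'low'
    match = HIGH_RES.match(value)
    if match:
        gene, first, second = match.group(1), match.group(2), match.group(3)
        return '{}*{}:{}'.format(gene, first, second), 'reduced'
    return value, 'unchanged'


# Made-up sample values, for illustration only.
samples = ['DRB1*15', 'DRB1*15:01', 'DRB1*15:01:01', 'DRB1*15:01:01 ']
for sample in samples:
    print(repr(sample), reduce_resolution(sample))
```

Printing `repr(value)` for every cell reported as 'unchanged' near the line where the failures begin would show whether stray whitespace, suffix letters, or some other difference in format is involved.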

0 answers:

No answers