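What is the most Pythonic way to modify the function of a function?

Time: 2012-02-06 21:16:58

Tags: python

I have a function that I use to read files of a particular format. My function looks like this: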

import csv
from collections import namedtuple

def read_file(f, name, header=True):
    with open(f, mode="r") as infile:
        reader = csv.reader(infile, delimiter="\t")
        if header is True:
            next(reader)
        gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            row = gene_data(*row)
            yield row

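I have some other types of files that I would like to read with this function. However, the other file types need some slight parsing steps before the read_file function can be used. For example, trailing periods need to be stripped from column q, and the characters atr need to be appended to the id column. Obviously, I could create a new function, or add some optional arguments to the existing function, but is there a simple way to modify this function so that it can be used to read the other file types? I was thinking of something along the lines of a decorator?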

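6 Answers:

Answer 0: (score: 4)

IMHO, the most Pythonic way would be to convert the function into a base class, split the file operations into methods, and override those methods in new classes derived from the base class.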

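Answer 1: (score: 3)

Having a monolithic function that takes a filename instead of an open file is in itself not very Pythonic. You are trying to implement a stream processor here (file stream -> line stream -> CSV record stream -> [transformator ->] data stream), so using a generator is actually a good idea. I would refactor this slightly to make it more modular: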

import csv
from collections import namedtuple

def csv_rows(infile, header):
    reader = csv.reader(infile, delimiter="\t")
    if header: next(reader)
    return reader

def data_sets(infile, header):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    for row in csv_rows(infile, header):
        yield gene_data(*row)

def read_file_type1(infile, header=True):
    # for this file type, we only need to pass the caller the raw 
    # data objects
    return data_sets(infile, header)

def read_file_type2(infile, header=True):
    # for this file type, we have to pre-process the data sets
    # before yielding them. A good way to express this is using a
    # generator expression (we could also add a filtering condition here).
    # transform_data_set is a placeholder for your own per-record function.
    return (transform_data_set(x) for x in data_sets(infile, header))

# Usage sample:
with open("...", "r") as f:
    for obj in read_file_type1(f):
        print(obj)

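As you can see, we have to pass the header argument all the way down the function chain. This is a strong hint that an object-oriented approach would be appropriate here. The fact that we are clearly dealing with a hierarchical type structure (basic data file, type1, type2) supports this.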
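read_file_type2 above calls transform_data_set without defining it. A minimal sketch of what such a helper might look like, assuming the two fixes described in the question (append atr to id, strip trailing periods from q). The name GeneData and the sample values are illustrative only and are not part of the original answer:

```python
from collections import namedtuple

# Same field layout as the namedtuple in the question (name chosen here).
GeneData = namedtuple("Data", "id, name, q, start, end, sym")

def transform_data_set(record):
    """Hypothetical helper: return a corrected copy of one record."""
    # namedtuples are immutable, so _replace builds a new record.
    return record._replace(
        id=record.id + "atr",      # append 'atr' to the id column
        q=record.q.rstrip("."),    # strip trailing periods from column q
    )

# Illustrative record; the values are made up.
sample = GeneData("g1", "foo", "1.5..", "10", "20", "FOO")
print(transform_data_set(sample))
```

Because the helper works on one record at a time, it slots directly into the generator expression in read_file_type2.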

Answer 2: (score: 1)

I suggest you create a row iterator that can be used like this:

with MyFile('f') as f:
    for entry in f:
        foo(entry)

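You can achieve this by implementing a class for your own file type that supports both the with statement and iteration over its entries, as the snippet above requires.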

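Alongside it, you can create a function open_my_file(filename) that determines the file type and returns the appropriate file object to work with. This may be a slightly enterprise-style approach, but it is worth implementing if you deal with multiple file types.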
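A rough sketch of how such a class and factory function could look, written for Python 3. Everything here is an assumption made for illustration: the class names, the preprocess hook, and especially the rule that "special" files are recognised by a ".special" suffix, which the answer does not specify.

```python
import csv
from collections import namedtuple

GeneData = namedtuple("GeneData", "id, name, q, start, end, sym")

class MyFile(object):
    """Context manager that iterates over records of a tab-separated file."""

    def __init__(self, filename, has_header=True):
        self._filename = filename
        self._has_header = has_header
        self._handle = None

    def __enter__(self):
        # newline="" is what the csv module recommends for files it reads.
        self._handle = open(self._filename, "r", newline="")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._handle.close()
        return False  # do not suppress exceptions raised in the with block

    def preprocess(self, row):
        # Default: leave the row untouched. Subclasses override this.
        return row

    def __iter__(self):
        reader = csv.reader(self._handle, delimiter="\t")
        if self._has_header:
            next(reader, None)  # skip the header line if there is one
        for row in reader:
            yield GeneData(*self.preprocess(row))

class MySpecialFile(MyFile):
    def preprocess(self, row):
        row[0] += "atr"
        row[2] = row[2].rstrip(".")
        return row

def open_my_file(filename):
    # Assumed convention: the file type is encoded in the file extension.
    if filename.endswith(".special"):
        return MySpecialFile(filename)
    return MyFile(filename)
```

With this in place, `with open_my_file(path) as f:` followed by `for entry in f:` behaves like the snippet above regardless of the file type.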

Answer 3: (score: 1)

The object-oriented way would be this:

import csv
from collections import namedtuple

class GeneDataReader:

    _GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

    def __init__(self, filename, has_header=True):
        self._ignore_1st_row = has_header
        self._filename = filename

    def __iter__(self):
        for row in self._tsv_by_row():
            yield self._GeneData(*self.preprocess_row(row))

    def _tsv_by_row(self):
        with open(self._filename, 'r') as f:
            reader = csv.reader(f, delimiter='\t')
            if self._ignore_1st_row: 
                next(reader)
            for row in reader:
                yield row 

    def preprocess_row(self, row):
        # does nothing.  override in derived classes
        return row

class SpecializedGeneDataReader(GeneDataReader):

    def preprocess_row(self, row):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        return row    

The simplest way would be to modify your currently working code with an extra argument.

def read_file(name, is_special=False, has_header=True):
    with open(name,'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header:
            next(reader)
        Data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            if is_special:
                row[0] += 'atr'
                row[2] = row[2].rstrip('.')
            row = Data(*row)
            yield row

If you are looking for something less nested but still procedural:

GeneData = namedtuple("GeneData", 'id, name, q, start, end, sym')

def tsv_by_row(name, has_header=True):
    with open(name, 'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header: next(reader)
        for row in reader:
            yield row

def gene_data_from_vanilla_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        yield GeneData(*row)

def gene_data_from_special_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        yield GeneData(*row)

Answer 4: (score: 0)

How about passing a callback function to read_file()?
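One way this could look, as a sketch rather than a definitive implementation. It assumes read_file takes an already open file (as suggested in Answer 1) instead of a filename, and the parameter name preprocess and the helper fix_special_row are invented for this example:

```python
import csv
import io
from collections import namedtuple

GeneData = namedtuple("Data", "id, name, q, start, end, sym")

def read_file(infile, header=True, preprocess=None):
    """Yield GeneData records from an open tab-separated file.

    preprocess is an optional callable that takes a row (a list of
    strings) and returns the row to use in its place.
    """
    reader = csv.reader(infile, delimiter="\t")
    if header:
        next(reader, None)  # skip the header line if there is one
    for row in reader:
        if preprocess is not None:
            row = preprocess(row)
        yield GeneData(*row)

def fix_special_row(row):
    # Hypothetical callback implementing the fixes from the question.
    row[0] += "atr"
    row[2] = row[2].rstrip(".")
    return row

# Illustrative in-memory file; the values are made up.
sample = io.StringIO(
    "id\tname\tq\tstart\tend\tsym\n"
    "g1\tfoo\t1.5.\t10\t20\tFOO\n"
)
for record in read_file(sample, preprocess=fix_special_row):
    print(record.id, record.q)
```

Files that need no extra parsing simply omit the preprocess argument, so existing callers keep working unchanged.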

Answer 5: (score: 0)

In the spirit of Niklas B.'s answer:

import csv, functools
from collections import namedtuple

def consumer(func):
    @functools.wraps(func)
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)  # advance the coroutine to its first yield
        return g
    return start

def csv_rows(infile, header, dest):
    reader = csv.reader(infile, delimiter='\t')
    if header: next(reader)
    for line in reader:
        dest.send(line)

@consumer
def data_sets(dest):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    while 1:
        row = (yield)
        dest.send(gene_data(*row))

def read_file_1(fn, header=True):
    results, sink = getsink()
    csv_rows(fn, header, data_sets(sink))
    return results

def getsink():
    r = []
    @consumer
    def _sink():
        while 1:
            x = (yield)
            r.append(x)
    return (r, _sink())

@consumer
def transform_data_sets(dest):
    while True:
        data = (yield)
        dest.send(data[::-1]) # or whatever

def read_file_2(fn, header=True):
    results, sink = getsink()
    csv_rows(fn, header, data_sets(transform_data_sets(sink)))
    return results