I have a function that I use to read files of a particular format. My function looks like this:
import csv
from collections import namedtuple

def read_file(f, name, header=True):
    with open(f, mode="r") as infile:
        reader = csv.reader(infile, delimiter="\t")
        if header is True:
            next(reader)  # skip the header row
        gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            row = gene_data(*row)
            yield row
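I also have other types of files that I would like to read with this function. However, the other file types need a few light parsing steps before the read_file function can be used. For example, trailing periods need to be stripped from column q, and the characters atr need to be appended to the id column. Obviously I could create a new function or add some optional arguments to the existing one, but is there a simple way to modify this function so that it can be used to read the other file types? I was thinking of something along the lines of a decorator?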
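Answer 0: (score: 4)
IMHO, the most Pythonic way would be to convert the function into a base class, split the file operations into methods, and override those methods in new classes derived from the base class.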
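Answer 1: (score: 3)
Having a monolithic function that takes a filename instead of an open file is not very Pythonic in itself. You are trying to implement a stream processor here (file stream -> line stream -> CSV record stream -> [transformer ->] data stream), so using a generator is actually a good idea. I would refactor this slightly to make it more modular: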
import csv
from collections import namedtuple

def csv_rows(infile, header):
    reader = csv.reader(infile, delimiter="\t")
    if header: next(reader)
    return reader

def data_sets(infile, header):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    for row in csv_rows(infile, header):
        yield gene_data(*row)

def read_file_type1(infile, header=True):
    # for this file type, we only need to pass the caller the raw
    # data objects
    return data_sets(infile, header)

def read_file_type2(infile, header=True):
    # for this file type, we have to pre-process the data sets
    # before yielding them. A good way to express this is using a
    # generator expression (we could also add a filtering condition here).
    # transform_data_set is a per-record function that is not defined here.
    return (transform_data_set(x) for x in data_sets(infile, header))

# Usage sample:
with open("...", "r") as f:
    for obj in read_file_type1(f):
        print(obj)
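As you can see, we have to pass the header argument all the way down the function chain. This is a strong hint that an object-oriented approach would be appropriate here. The fact that we are clearly dealing with a hierarchical type structure (basic data file, type1, type2) supports this.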
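The transform_data_set function used in read_file_type2 is left undefined in this answer. A minimal sketch of what it might look like, assuming the two changes described in the question (append atr to id, strip trailing periods from q) and assuming the records are the Data namedtuples produced by data_sets:

```python
from collections import namedtuple

# Same record layout as in data_sets(); redefined here only so that
# this sketch runs on its own.
Data = namedtuple("Data", "id, name, q, start, end, sym")

def transform_data_set(record):
    # namedtuples are immutable, so _replace() returns a modified copy
    return record._replace(id=record.id + "atr", q=record.q.rstrip("."))

record = Data("g1", "geneA", "0.05.", "100", "200", "GA")
print(transform_data_set(record))
```

Because the function works on one record at a time, it can be dropped into the generator expression in read_file_type2 without touching the rest of the chain.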
Answer 2: (score: 1)
I would suggest creating a row iterator that is used like this:
with MyFile('f') as f:
    for entry in f:
        foo(entry)
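You can do this by implementing a class for your own files that supports the with statement and iteration, as the usage above requires.
Alongside it, you could create a function open_my_file(filename) that determines the file type and returns the appropriate file object to use. This may be a somewhat enterprise-style approach, but it is worth implementing if you deal with several file types.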
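A rough sketch of such a class, under several assumptions that the answer does not specify: the input is tab-separated, the second file type is handled by a hypothetical SpecialFile subclass, and open_my_file tells the types apart by a made-up .special extension.

```python
import csv
import os
import tempfile
from collections import namedtuple

Data = namedtuple("Data", "id, name, q, start, end, sym")

class MyFile:
    """Context manager that iterates over the records of a TSV file."""

    def __init__(self, filename, header=True):
        self._filename = filename
        self._header = header
        self._fh = None

    def __enter__(self):
        # newline="" is what the csv module recommends for file objects
        self._fh = open(self._filename, "r", newline="")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._fh.close()
        return False  # do not swallow exceptions

    def __iter__(self):
        reader = csv.reader(self._fh, delimiter="\t")
        if self._header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            yield Data(*self.preprocess(row))

    def preprocess(self, row):
        return row  # plain files need no changes

class SpecialFile(MyFile):  # hypothetical second file type
    def preprocess(self, row):
        row[0] += "atr"
        row[2] = row[2].rstrip(".")
        return row

def open_my_file(filename):
    # Assumption: the file type can be told from the extension.
    cls = SpecialFile if filename.endswith(".special") else MyFile
    return cls(filename)

# Demo on a throwaway file
path = os.path.join(tempfile.mkdtemp(), "demo.special")
with open(path, "w", newline="") as out:
    out.write("id\tname\tq\tstart\tend\tsym\n")
    out.write("g1\tgeneA\t0.05.\t100\t200\tGA\n")

with open_my_file(path) as f:
    entries = list(f)
print(entries)
```

The caller only ever sees the with/for usage shown above; which preprocessing applies is decided once, inside open_my_file.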
Answer 3: (score: 1)
The object-oriented way would be:
import csv
from collections import namedtuple

class GeneDataReader:
    _GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

    def __init__(self, filename, has_header=True):
        self._ignore_1st_row = has_header
        self._filename = filename

    def __iter__(self):
        for row in self._tsv_by_row():
            yield self._GeneData(*self.preprocess_row(row))

    def _tsv_by_row(self):
        with open(self._filename, 'r') as f:
            reader = csv.reader(f, delimiter='\t')
            if self._ignore_1st_row:
                next(reader)
            for row in reader:
                yield row

    def preprocess_row(self, row):
        # does nothing. override in derived classes
        return row

class SpecializedGeneDataReader(GeneDataReader):
    def preprocess_row(self, row):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        return row
The simplest way would be to modify your currently working code with an extra argument:
def read_file(name, is_special=False, has_header=True):
    with open(name, 'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header:
            next(reader)
        Data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            if is_special:
                row[0] += 'atr'
                row[2] = row[2].rstrip('.')
            row = Data(*row)
            yield row
If you are looking for something less nested but still procedural:
GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

def tsv_by_row(name, has_header=True):
    with open(name, 'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header: next(reader)
        for row in reader:
            yield row

def gene_data_from_vanilla_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        yield GeneData(*row)

def gene_data_from_special_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        yield GeneData(*row)
Answer 4: (score: 0)
How about passing a callback function to read_file()?
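One way the callback idea could look, as a sketch only. The preprocess parameter and the fix_special_row helper are names made up for this illustration, and unlike the function in the question this variant takes an already opened file rather than a filename, so that it can be tried on an in-memory sample:

```python
import csv
import io
from collections import namedtuple

Data = namedtuple("Data", "id, name, q, start, end, sym")

def read_file(infile, header=True, preprocess=None):
    """Yield Data records from an open tab-separated file.

    preprocess is an optional callback that takes one row (a list of
    strings) and returns the row to build the record from.
    """
    reader = csv.reader(infile, delimiter="\t")
    if header:
        next(reader, None)  # skip the header row if there is one
    for row in reader:
        if preprocess is not None:
            row = preprocess(row)
        yield Data(*row)

def fix_special_row(row):
    # the two adjustments described in the question
    row[0] += "atr"
    row[2] = row[2].rstrip(".")
    return row

sample = io.StringIO(
    "id\tname\tq\tstart\tend\tsym\n"
    "g1\tgeneA\t0.05.\t100\t200\tGA\n"
)
records = list(read_file(sample, preprocess=fix_special_row))
print(records)
```

Files that need no adjustment are read by leaving preprocess at its default, so existing callers would not have to change.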
Answer 5: (score: 0)
In the spirit of Niklas B.'s answer:
import csv, functools
from collections import namedtuple

def consumer(func):
    # decorator that primes a coroutine so it is ready to receive send()
    @functools.wraps(func)
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)
        return g
    return start

def csv_rows(infile, header, dest):
    reader = csv.reader(infile, delimiter='\t')
    if header: next(reader)
    for line in reader:
        dest.send(line)

@consumer
def data_sets(dest):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    while 1:
        row = (yield)
        dest.send(gene_data(*row))

def read_file_1(infile, header=True):
    # infile must be an open file (or any iterable of lines), not a filename
    results, sink = getsink()
    csv_rows(infile, header, data_sets(sink))
    return results

def getsink():
    r = []
    @consumer
    def _sink():
        while 1:
            x = (yield)
            r.append(x)
    return (r, _sink())

@consumer
def transform_data_sets(dest):
    while True:
        data = (yield)
        dest.send(data[::-1]) # or whatever

def read_file_2(infile, header=True):
    results, sink = getsink()
    csv_rows(infile, header, data_sets(transform_data_sets(sink)))
    return results