How do I read from either a file or a list?

Time: 2019-01-12 03:56:37

Tags: python python-2.7

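Suppose you have a piece of code that accepts either a list of names or a file name, and it must filter whichever one is supplied by applying the same conditions: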

import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required = True)
group.add_argument('-n', '--name', help = 'single name', action = 'append')
group.add_argument('-N', '--names', help = 'text file of names')
args = parser.parse_args()

results = []

if args.name:
    # We are dealing with a list.
    for name in args.name:
        name = name.strip().lower()
        if name not in results and len(name) > 6: results.append(name)

else:
    # We are dealing with a file name.
    with open(args.names) as f:
        for name in f:
            name = name.strip().lower()
            if name not in results and len(name) > 6: results.append(name)

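I would like to remove as much redundancy as possible from the code above. I tried creating the following function for the strip and lower calls, but it does not remove much of the duplicated code: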

def getFilteredName(name):
    return name.strip().lower()

Is there a way to iterate over both a list and a file in the same function? How can I cut the code down as much as possible?

2 answers:

Answer 0 (score: 1)

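You have duplicated code that can be simplified: lists and file objects are both iterables. If you write a function that takes an iterable and returns the correct output, you reduce the code duplication (DRY).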
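A minimal sketch of that idea, assuming a hypothetical helper named filter_names (not part of the original code) that accepts any iterable of strings. It keeps the question's list-based de-duplication and the len(name) > 6 condition, and io.StringIO stands in for an open file:

```python
import io

def filter_names(iterable):
    """Return unique, stripped, lower-cased names longer than 6 characters."""
    results = []
    for name in iterable:
        name = name.strip().lower()
        if name not in results and len(name) > 6:
            results.append(name)
    return results

# A list and a file-like object go through the same code path.
from_list = filter_names(["Alexander", " alexander ", "Bob"])
from_file = filter_names(io.StringIO(u"Alexander\nBob\nCatherine\n"))
```

Because the helper only relies on iteration, the caller decides whether to pass args.name directly or an open file object.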

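Choice of data structure:

You do not want duplicate items, which means a set() or a dict() is better suited to collecting the parsed data. By design they eliminate duplicates, and that is faster than checking whether an item is already in a list:

  • If the order of the names matters, use OrderedDict from collections (or a plain dict on Python 3.7+)
  • If the order of the names does not matter, use set()

Either choice removes duplicates for you.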
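To illustrate the difference between the two containers, here is a small sketch with made-up sample names (the list contents are assumptions chosen for the example):

```python
from collections import OrderedDict

names = ["charlotte", "benjamin", "charlotte", "alexander"]

# OrderedDict keys keep first-seen order and silently drop repeats.
ordered_unique = list(OrderedDict((n, None) for n in names))

# A set also drops repeats, but its iteration order is not guaranteed.
unordered_unique = set(names)
```

Both containers use hashing for membership, so adding a name does not require scanning everything collected so far, as `name not in results` does on a list.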

Test code:

import argparse
from collections import OrderedDict # use a normal dict on 3.7+, it keeps insertion order

def get_names(args):
    """Takes the parsed args and returns a list of all unique lower cased names
    that are longer than 6 characters."""

    seen = OrderedDict() # or dict or set

    def add_names(iterable):
        """Takes care of adding the stuff to your return collection."""
        k = [n.strip().lower() for n in iterable] # do the strip().lower()ing only once
        # using generator comp to update - use .add() for set()
        seen.update( ((n,None) for n in k if len(n)>6))

    if args.name:
        # We are dealing with a list:
        add_names(args.name)

    elif args.names:
        # We are dealing with a file name:
        with open(args.names) as f:
            add_names(f)

    # return as list    
    return list(seen)

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required = True)
group.add_argument('-n', '--name', help = 'single name', action = 'append')
group.add_argument('-N', '--names', help = 'text file of names')
args = parser.parse_args()

results = get_names(args)
print(results)

Output for the arguments -n Joh3333n -n Ji3333m -n joh3333n -n Bo3333b -n bo3333b -n jim:

['joh3333n', 'ji3333m', 'bo3333b']

Input file, created with:

with open("names.txt","w") as names:
    for n in ["a"*k for k in range(1,10)]:
        names.write( f"{n}\n")

Output for the arguments -N names.txt (only the lines longer than 6 characters survive):

['aaaaaaa', 'aaaaaaaa', 'aaaaaaaaa']

Answer 1 (score: 0)

Subclass list and make the subclass a context manager:

class F(list):
    def __enter__(self):
        return self
    def __exit__(self,*args,**kwargs):
        pass

A conditional statement can then decide what to iterate over:

if args.name:
    # We are dealing with a list.
    thing = F(args.name)
else:
    # We are dealing with a file name.
    thing = open(args.names)

and the iteration code can be factored out:

results = []

with thing as f:
    for name in f:
        name = name.strip().lower()
        if name not in results and len(name) > 6: results.append(name)
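As an alternative sketch, assuming Python 3.7 or later (which goes beyond the question's python-2.7 tag), the standard library's contextlib.nullcontext can wrap the list so that no subclass is needed. The function name open_names and its parameters are hypothetical:

```python
from contextlib import nullcontext

def open_names(name_list=None, path=None):
    """Return a context manager that yields an iterable of names."""
    if name_list:
        # nullcontext(x) hands back x on __enter__ and does nothing on __exit__.
        return nullcontext(name_list)
    # A file object is already a context manager that closes itself on exit.
    return open(path)

results = []
with open_names(name_list=["Alexander", "alexander", "Bob"]) as f:
    for name in f:
        name = name.strip().lower()
        if name not in results and len(name) > 6:
            results.append(name)
```

The with block is identical for both sources; only the object returned by open_names differs.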

Here is a similar solution that builds an io.StringIO object from either the file or the list and then processes it with a single set of instructions.

import io

if args.name:
    # We are dealing with a list.
    f = io.StringIO('\n'.join(args.name))
else:
    # We are dealing with a file name.
    with open(args.names) as fileobj:
        f = io.StringIO(fileobj.read())

results = []

for name in f:
    name = name.strip().lower()
    if name not in results and len(name) > 6: results.append(name)

This has the drawback of reading the entire file into memory, which is a problem if the file is large and memory is scarce.
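One way to avoid that cost, sketched here under the assumption that the names can be processed one at a time, is a generator that yields names lazily from whichever source was given. The names iter_names and filter_names and their parameters are hypothetical, not part of the answer above:

```python
import os
import tempfile

def iter_names(name_list=None, path=None):
    """Yield raw names one at a time from a list or from a file."""
    if name_list:
        for name in name_list:
            yield name
    else:
        # Only one line is held in memory at a time; the file is closed
        # once the generator has been fully consumed.
        with open(path) as f:
            for line in f:
                yield line

def filter_names(names):
    """Apply the question's filter to any iterable of names."""
    results = []
    for name in names:
        name = name.strip().lower()
        if name not in results and len(name) > 6:
            results.append(name)
    return results

# Demo with a temporary file standing in for the real names file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Alexander\nBob\nCatherine\nalexander\n")
from_file = filter_names(iter_names(path=tmp.name))
os.remove(tmp.name)

from_list = filter_names(iter_names(name_list=["Benjamin", "Al"]))
```

The results list still grows with the number of unique names kept, so this only removes the cost of holding the raw file contents in memory.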