Optimized way to count the number of lines with conditions

Time: 2015-05-27 13:41:59

Tags: python

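I have seen that a fast way to count the lines in a file is this:

n_lines=sum(1 for line in open(myfile))

I would like to know whether it is possible to put some conditions inside the sum function, for example to stop counting at the first line that is just '\n' and to skip lines that start with '#'.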

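Thanks in advance.

6 answers:

Answer 0: (score: 5)

You can, but with certain restrictions. You are passing a generator expression as the argument to sum, and a generator expression can take a single expression with an if clause. You can combine your conditions like this: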

n_lines=sum(1 for line in open(PATHDIFF)
                if line != '\n' and not line.startswith('#'))

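However, this does not short-circuit the iteration over the file when it hits a blank line; it keeps reading through to the end of the file. To avoid that, you can use itertools.takewhile, which pulls lines from the file iterator only until it reads the first blank line ('\n').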

from itertools import takewhile
n_lines = sum(1 for line in takewhile(lambda x: x != '\n',
                                      open(PATHDIFF))
                   if not line.startswith('#'))
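
To see the short-circuit in action, here is a minimal Python 3 sketch that uses an in-memory io.StringIO in place of the real file. The sample text and the function name count_header_lines are made up for illustration.

```python
import io
from itertools import takewhile

SAMPLE = "a\n# comment\nb\n\nc\nd\n"  # hypothetical file contents


def count_header_lines(f):
    """Count non-comment lines up to (not including) the first blank line."""
    return sum(1 for line in takewhile(lambda x: x != '\n', f)
               if not line.startswith('#'))


f = io.StringIO(SAMPLE)
n_lines = count_header_lines(f)
rest = f.read()  # whatever takewhile never pulled from the iterator
print(n_lines, repr(rest))
```

Note that takewhile has to read the blank line in order to test it, so that line is consumed; everything after it is left unread.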

You can also use itertools.ifilterfalse to fill the same role as the conditional clause of the generator expression.

from itertools import takewhile, ifilterfalse
n_lines = sum(1 for line in ifilterfalse(lambda x: x.startswith('#'),
                                         takewhile(lambda x: x != '\n',
                                                   open(PATHDIFF))))
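
The import above uses the Python 2 name. As a sketch of the same pipeline on Python 3, where the function is called itertools.filterfalse, something like the following should work; count_lines and the sample data are assumptions for illustration only.

```python
import io
from itertools import filterfalse, takewhile


def count_lines(f):
    # Inner iterator: stop at the first blank line.
    # Outer iterator: drop the lines that start with '#'.
    kept = filterfalse(lambda x: x.startswith('#'),
                       takewhile(lambda x: x != '\n', f))
    return sum(1 for _ in kept)


n_lines = count_lines(io.StringIO("# header\nx\ny\n\nz\n"))
print(n_lines)
```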

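Of course, now your code starts to look like it was written in Scheme or Lisp. The generator expression is easier to read, but the itertools module is useful for building modified iterators that can be passed around as distinct objects.

On another topic, you should always make sure you close any file you open, which means not using an anonymous file handle in the iterator. The simplest way is to use a with statement: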

with open(PATHDIFF) as f:
    n_lines = sum(1 for line in f if line != '\n' and not line.startswith('#'))

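The other examples can be modified in the same way; just replace open(PATHDIFF) with f wherever it occurs.

Answer 1: (score: 2)

There is actually a really fast way (borrowed from Funcy) to count the items an iterator yields without building a list of them. Note that the iterator is still consumed in the process:

Example: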

from collections import deque
from itertools import count, izip


def ilen(seq):
    counter = count()
    deque(izip(seq, counter), maxlen=0)  # (consume at C speed)
    return next(counter)


def lines(filename):
    with open(filename, 'r') as f:
        return ilen(
            None for line in f
            if line != "\n" and not line.startswith("#")
        )


nlines = lines("file.txt")
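
itertools.izip only exists on Python 2. A hedged Python 3 sketch of the same idea, using the built-in zip (which is already lazy there), could look like this; the sample list stands in for a real file.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count the items of an iterable without storing them."""
    counter = count()
    # zip pulls from `iterable` first, so `counter` advances once per item;
    # a deque with maxlen=0 discards every pair as soon as it is produced.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


lines = ["a\n", "# c\n", "\n", "b\n"]  # hypothetical file contents
gen = (None for line in lines if line != "\n" and not line.startswith("#"))
n = ilen(gen)
leftover = list(gen)  # empty: counting exhausted the generator
print(n, leftover)
```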

Answer 2: (score: 2)

You cannot use break or continue in a list comprehension or a generator expression, so the "corrected" syntax for your example is:

nlines = 0
with open(PATHDIFF) as f:
    for line in f:
        if line == '\n':
            # not sure that's _really_ what you want
            # => this will exit the loop at the first 'empty' line
            break
        if line.startswith('#'):
            continue
        nlines += 1
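
If you would rather keep the sum(...) form, one option is to move the break and continue into a generator function, where they are allowed. This is only a sketch for Python 3; wanted_lines is a made-up name and the io.StringIO object stands in for the open file.

```python
import io


def wanted_lines(f):
    """Yield non-comment lines until the first blank line."""
    for line in f:
        if line == '\n':
            break       # stop at the first blank line
        if line.startswith('#'):
            continue    # skip comment lines
        yield line


nlines = sum(1 for _ in wanted_lines(io.StringIO("a\n#b\nc\n\nd\n")))
print(nlines)
```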

Now, if you really do want to exit at the first 'empty' line and want to make it a one-liner, you can also use itertools.takewhile():

from itertools import takewhile
with open(XXX) as f:
    nlines = sum(1 for line in takewhile(lambda l: l != '\n', f)
                 if not line.startswith("#"))

Answer 3: (score: 2)

from itertools import ifilter,takewhile
with open("test.txt") as f:
     fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda line: not line.startswith("#"), f)))
     print(fil)

Or indexing may be faster than the startswith call:

 fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda x: x[0] != "#", f)))

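Using str.strip as the predicate catches any blank line: it returns an empty string, which is falsy, for a line that contains only whitespace, so takewhile stops there.

Indexing is indeed a bit faster: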
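
The difference between the str.strip predicate and a plain comparison with '\n' shows up on lines that contain only spaces. A small sketch with made-up data:

```python
from itertools import takewhile

lines = ["a\n", "b\n", "   \n", "c\n", "\n", "d\n"]  # hypothetical input

# Stops only at a line that is exactly "\n".
exact = sum(1 for _ in takewhile(lambda l: l != "\n", lines))

# str.strip returns "" (falsy) for whitespace-only lines, so this stops
# earlier, at the line that holds only spaces.
stripped = sum(1 for _ in takewhile(str.strip, lines))

print(exact, stripped)
```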

In [11]: from itertools import ifilter,takewhile

In [12]: %%timeit
   ....: with open("test.txt") as f:
   ....:      fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda x: x[0] != "#", f)))
   ....: 

1000 loops, best of 3: 400 µs per loop

In [13]: %%timeit
   ....: with open("test.txt") as f:
   ....:      fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda line: not line.startswith("#"), f)))
   ....: 

1000 loops, best of 3: 531 µs per loop
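
To repeat the comparison outside IPython, a rough sketch with the timeit module is shown below. It assumes Python 3, where the built-in filter replaces ifilter, and it times an in-memory list of made-up lines rather than test.txt, so the numbers will not match the ones above.

```python
import timeit
from itertools import takewhile

# Hypothetical sample: 1000 lines, every tenth one a comment.
LINES = ["# comment\n" if i % 10 == 0 else "value %d\n" % i
         for i in range(1000)]


def with_startswith():
    return sum(1 for _ in takewhile(
        str.strip, filter(lambda l: not l.startswith("#"), LINES)))


def with_index():
    return sum(1 for _ in takewhile(
        str.strip, filter(lambda l: l[0] != "#", LINES)))


for fn in (with_startswith, with_index):
    # Best of 3 runs of 200 calls each, in seconds.
    best = min(timeit.repeat(fn, number=200, repeat=3))
    print(fn.__name__, best)
```

Indexing with l[0] is safe here because every line yielded by a file or by this list has at least one character; it would raise IndexError on an empty string.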

Answer 4: (score: 1)

If you want speed and do not mind using bash:

grep -v '^#' yourfile | wc -l

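This counts all the lines that do not start with #, and it will most likely be faster than Python.

Answer 5: (score: 0)

Do you want the number of comment lines, or of non-comment lines? If so, this should work: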

comment_lines = sum([1 for line in open(PATHDIFF) if line.startswith('#')])
non_comment_lines = sum([1 for line in open(PATHDIFF) if not line.startswith('#')])
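
The two lines above each read the whole file and never close it. As a sketch, both numbers can be had from a single pass; count_comment_lines and the sample text are assumptions for illustration, and with a real file you would pass in the f from a with open(PATHDIFF) as f: block.

```python
import io


def count_comment_lines(f):
    """Return (comment_lines, non_comment_lines) from one pass over f."""
    total = comments = 0
    for line in f:
        total += 1
        comments += line.startswith('#')  # True counts as 1, False as 0
    return comments, total - comments


comment_lines, non_comment_lines = count_comment_lines(
    io.StringIO("# a\nx\n# b\ny\nz\n"))
print(comment_lines, non_comment_lines)
```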