Question

I have seen that a fast way to count the number of lines in a file is to do this:

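n_lines=sum(1 for line in open(myfile))

I would like to know whether it is possible to put some conditions inside the sum function. For example, I want to stop counting at the first empty line (the way break would in a loop) and to skip lines that start with '#' (the way continue would).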

Thanks in advance.

Answer 1

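You can, with some limitations. You pass a generator expression as the argument to sum, and a generator expression can take a single expression with an if clause. You can combine conditions in that clause: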

n_lines=sum(1 for line in open(PATHDIFF)
                if line != '\n' and not line.startswith('#'))

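However, this does not stop at the first empty line. It skips that line and keeps reading the file to the end. To short-circuit instead, you can use itertools.takewhile. It pulls lines from the file only until it reaches a line that consists of just a newline.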

from itertools import takewhile
n_lines = sum(1 for line in takewhile(lambda x: x != '\n',
                                      open(PATHDIFF))
                   if not line.startswith('#'))
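You can check the short-circuiting claim by counting how many lines each version actually pulls from its input. The sketch below wraps an io.StringIO in a small counting iterator. CountingIter and the sample text are hypothetical and exist only for illustration. The code assumes Python 3.

```python
import io
from itertools import takewhile


class CountingIter:
    """Hypothetical helper: wraps an iterator and records how many items were pulled."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.pulled = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # raises StopIteration before counting when exhausted
        self.pulled += 1
        return item


TEXT = "a\n# comment\nb\n\nc\nd\n"

# Generator expression with an if clause: skips the blank line but reads everything.
src1 = CountingIter(io.StringIO(TEXT))
n1 = sum(1 for line in src1 if line != '\n' and not line.startswith('#'))

# takewhile: stops at the first line that is exactly '\n' (that line is pulled, then dropped).
src2 = CountingIter(io.StringIO(TEXT))
n2 = sum(1 for line in takewhile(lambda x: x != '\n', src2)
         if not line.startswith('#'))

print(n1, src1.pulled)  # 4 6
print(n2, src2.pulled)  # 2 4
```

The two versions also count different things. The first counts non-blank, non-comment lines anywhere in the file. The second counts only the non-comment lines before the first empty line.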

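You can also use itertools.ifilterfalse (renamed itertools.filterfalse in Python 3) to play the same role as the generator expression's conditional clause.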

from itertools import takewhile, filterfalse  # ifilterfalse in Python 2
n_lines = sum(1 for line in filterfalse(lambda x: x.startswith('#'),
                                        takewhile(lambda x: x != '\n',
                                                  open(PATHDIFF))))

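Of course, your code now starts to look as if it were written in Scheme or Lisp. Generator expressions are easier to read, but the itertools module is useful for building modified iterators that you can pass around as separate objects.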
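Each itertools stage is an ordinary iterator, so a function can build the stages and hand them to other code as a single object. The sketch below uses the Python 3 name filterfalse. significant_lines is a hypothetical name chosen for illustration.

```python
import io
from itertools import filterfalse, takewhile


def significant_lines(lines):
    """Hypothetical helper: lines up to the first blank line, minus '#' comments."""
    before_blank = takewhile(lambda x: x != '\n', lines)
    return filterfalse(lambda x: x.startswith('#'), before_blank)


pipeline = significant_lines(io.StringIO("x = 1\n# note\ny = 2\n\nz = 3\n"))
# The pipeline is lazy: nothing has been read yet, and any consumer can drive it.
n_lines = sum(1 for _ in pipeline)
print(n_lines)  # 2
```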

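On a different note, you should always make sure that any file you open gets closed. That means not passing anonymous file handles to your iterators. The easiest way to do this is with a with statement: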

with open(PATHDIFF) as f:
    n_lines = sum(1 for line in f if line != '\n' and not line.startswith('#'))

The other examples can be adapted the same way: just replace each occurrence of open(PATHDIFF) with f.
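Applying that advice to the takewhile variant gives something like the sketch below. count_significant_lines is a hypothetical name. The temporary file exists only so that the sketch runs on its own.

```python
import os
import tempfile
from itertools import takewhile


def count_significant_lines(path):
    """Count non-comment lines before the first empty line; the file is closed on exit."""
    with open(path) as f:
        return sum(1 for line in takewhile(lambda x: x != '\n', f)
                   if not line.startswith('#'))


# Build a small throwaway file to exercise the function.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("# header\none\ntwo\n\nthree\n")
    path = tmp.name

try:
    result = count_significant_lines(path)
    print(result)  # 2
finally:
    os.remove(path)
```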

Answer 2

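There is actually a fast way (borrowed from Funcy) to count the length of an iterator without building a list. The iterator is still consumed, but its items are discarded at C speed by a zero-length deque instead of being stored: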

Example:

from collections import deque
from itertools import count

def ilen(seq):
    counter = count()
    deque(zip(seq, counter), maxlen=0)  # (consume at C speed); use itertools.izip in Python 2
    return next(counter)

def lines(filename):
    with open(filename, 'r') as f:
        return ilen(None for line in f
                    if line != "\n" and not line.startswith("#"))

nlines = lines("file.txt")
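ilen pulls every item out of its argument, so the iterator is exhausted afterwards. What the trick avoids is holding the items in memory. The sketch below checks this under Python 3, where the built-in zip is lazy and evaluates its arguments left to right.

```python
from collections import deque
from itertools import count


def ilen(seq):
    counter = count()
    # zip pulls from seq first, so counter advances exactly once per item of seq.
    deque(zip(seq, counter), maxlen=0)
    return next(counter)


it = iter(["a\n", "# c\n", "b\n"])
length = ilen(it)
leftover = list(it)
print(length, leftover)  # 3 []
```

The argument order matters. With zip(counter, seq), the counter would advance one extra time before zip noticed that seq was exhausted.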

Answer 3

You cannot use break or continue inside a list comprehension or generator expression, so the "correct" way to write your example is a plain loop:

nlines = 0
with  open(PATHDIFF) as f:
    for line in f:
        if line=='\n':
            # not sure that's _really_ what you want
            # => this will exit the loop at the first 'empty' line
            break 
        if line.startswith('#'):
            continue
        nlines += 1

Now, if you really do want to stop at the first 'empty' line and want a one-liner, you can also use itertools.takewhile():

from itertools import takewhile
with open(XXX) as f: 
    nlines = sum(1 for line in takewhile(lambda l: l != '\n', f) 
                 if not line.startswith("#"))
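To check that the explicit loop and the takewhile one-liner agree, you can wrap both in functions that accept any iterable of lines and run them on the same input. The function names and the sample text are hypothetical.

```python
import io
from itertools import takewhile


def count_loop(lines):
    nlines = 0
    for line in lines:
        if line == '\n':
            break          # stop at the first empty line
        if line.startswith('#'):
            continue       # skip comment lines
        nlines += 1
    return nlines


def count_takewhile(lines):
    return sum(1 for line in takewhile(lambda l: l != '\n', lines)
               if not line.startswith('#'))


TEXT = "# title\nalpha\nbeta\n# skip\ngamma\n\ndelta\n"
a = count_loop(io.StringIO(TEXT))
b = count_takewhile(io.StringIO(TEXT))
print(a, b)  # 3 3
```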

Answer 4

from itertools import takewhile  # built-in filter replaces itertools.ifilter (Python 3)
with open("test.txt") as f:
    fil = sum(1 for _ in takewhile(str.strip, filter(lambda line: not line.startswith("#"), f)))
    print(fil)

Alternatively, indexing may be faster than calling startswith:

fil = sum(1 for _ in takewhile(str.strip, filter(lambda x: x[0] != "#", f)))

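Using str.strip as the takewhile predicate also treats lines that contain only whitespace as empty, so counting stops at the first blank line of any kind.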
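Here is what str.strip changes compared with the x != '\n' predicate used in the other answers. A line of only spaces is empty to str.strip but not to the comparison. The sample text is made up for illustration.

```python
import io
from itertools import takewhile

TEXT = "one\ntwo\n   \nthree\n"

# Predicate from the other answers: only a bare '\n' counts as empty.
by_newline = sum(1 for _ in takewhile(lambda x: x != '\n', io.StringIO(TEXT)))

# str.strip: a line that is empty after stripping whitespace is falsy.
by_strip = sum(1 for _ in takewhile(str.strip, io.StringIO(TEXT)))

print(by_newline, by_strip)  # 4 2
```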

Indexing is indeed a bit faster:

In [11]: from itertools import ifilter,takewhile

In [12]: %%timeit
   ....: with open("test.txt") as f:
   ....:      fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda x: x[0] != "#", f)))
   ....: 

1000 loops, best of 3: 400 µs per loop

In [13]: %%timeit
   ....: with open("test.txt") as f:
   ....:      fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda line: not line.startswith("#"), f)))
   ....: 

1000 loops, best of 3: 531 µs per loop
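The session above is IPython under Python 2 on the author's own test.txt, so the numbers are specific to that setup. The sketch below is a rough way to repeat the comparison under Python 3 with the standard timeit module. The synthetic input is an assumption. No expected timings are given because they depend on the machine and the data.

```python
import io
import timeit
from itertools import takewhile

# Synthetic input (assumption): alternating comment and code lines, no blank line.
TEXT = "".join("# comment %d\ncode %d\n" % (i, i) for i in range(5000))


def by_index():
    f = io.StringIO(TEXT)
    return sum(1 for _ in takewhile(str.strip, filter(lambda x: x[0] != "#", f)))


def by_startswith():
    f = io.StringIO(TEXT)
    return sum(1 for _ in takewhile(str.strip,
                                    filter(lambda line: not line.startswith("#"), f)))


# Both variants must agree before their speed is worth comparing.
assert by_index() == by_startswith() == 5000

for fn in (by_index, by_startswith):
    best = min(timeit.repeat(fn, number=20, repeat=3))
    print(fn.__name__, "%.4f s per 20 runs" % best)
```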

Answer 5

If you want speed and don't mind using bash:

grep -v '^#' yourfile | wc -l

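This counts every line that does not start with #, and it is likely to be faster than Python. Note that, unlike the takewhile versions, it also counts blank lines and does not stop at the first empty line.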
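For comparison, here is the Python generator expression that applies the same rule as this pipeline: every line not starting with #, with no early stop. It is a sketch on an in-memory sample, and count_like_grep_v is a hypothetical name.

```python
import io

TEXT = "# c\na\n\nb\n#d\n"


def count_like_grep_v(lines):
    # Same rule as grep -v '^#': keep every line that does not start with '#',
    # blank lines included, and never stop early.
    return sum(1 for line in lines if not line.startswith('#'))


n = count_like_grep_v(io.StringIO(TEXT))
print(n)  # 3
```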

Answer 6

Do you want to count comment lines and non-comment lines? If so, this should work:

with open(PATHDIFF) as f:
    comment_lines = sum(1 for line in f if line.startswith('#'))
with open(PATHDIFF) as f:
    non_comment_lines = sum(1 for line in f if not line.startswith('#'))
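Each of the two expressions above reads the whole file. A single pass can produce both counts. The sketch below uses collections.Counter with the same definition of a comment line. count_comment_lines is a hypothetical helper, and the sample text is made up.

```python
import io
from collections import Counter


def count_comment_lines(lines):
    """Return (comment_lines, non_comment_lines) in one pass over `lines`."""
    counts = Counter(line.startswith('#') for line in lines)
    return counts[True], counts[False]


comment_lines, non_comment_lines = count_comment_lines(io.StringIO("# a\nx\n# b\ny\nz\n"))
print(comment_lines, non_comment_lines)  # 2 3
```

With a real file, you would call count_comment_lines(f) inside a with open(PATHDIFF) as f: block.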

Optimized way to count lines with conditions
