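I've seen that a quick way to count the number of lines in a file is this:
    n_lines = sum(1 for line in open(myfile))
I'm wondering whether conditions can be added inside the sum call, for example to stop counting at the first blank line and to skip lines that start with '#'.
Thanks in advance.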
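Answer 0 (score: 5)
You can, to a limited extent. You are passing a generator expression as the argument to sum, and a generator expression can take a single expression with an if clause. You can combine your conditions like this: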
    n_lines = sum(1 for line in open(PATHDIFF)
                  if line != '\n' and not line.startswith('#'))
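However, this does not short-circuit the iteration over the file when it reaches a blank line (a line that is just '\n'); it keeps reading the file to the end. To avoid that, you can use itertools.takewhile, which reads from the underlying iterator only until the predicate fails, here at the first blank line.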
    from itertools import takewhile
    n_lines = sum(1 for line in takewhile(lambda x: x != '\n',
                                          open(PATHDIFF))
                  if not line.startswith('#'))
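To make the short-circuiting concrete, here is a minimal Python 3 sketch. The helper name count_until_blank and the in-memory sample data are assumptions made for illustration; io.StringIO stands in for a real file.

```python
import io
from itertools import takewhile


def count_until_blank(lines):
    """Count non-comment lines, stopping at the first blank line."""
    # takewhile stops pulling from `lines` as soon as the predicate fails.
    head = takewhile(lambda line: line != '\n', lines)
    return sum(1 for line in head if not line.startswith('#'))


# Hypothetical sample data standing in for a real file.
sample = io.StringIO("a\n# comment\nb\n\nc\nd\n")
n_lines = count_until_blank(sample)

# Whatever takewhile did not consume is still available for reading.
remaining = sample.read()
```

Here the count covers only the lines before the blank line, and the lines after it are still unread in the stream, which is the difference from the plain if clause.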
You can also use itertools.ifilterfalse (renamed filterfalse in Python 3) to fill the same role as the conditional clause of the generator expression.
    from itertools import takewhile, ifilterfalse
    n_lines = sum(1 for line in ifilterfalse(lambda x: x.startswith('#'),
                                             takewhile(lambda x: x != '\n',
                                                       open(PATHDIFF))))
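For readers on Python 3, where ifilterfalse no longer exists under that name, the same pipeline might look like the sketch below. The function name count_lines and the choice to accept any iterable of lines (rather than a path) are assumptions made so the example is easy to try.

```python
from itertools import filterfalse, takewhile


def count_lines(lines):
    """Count non-comment lines up to the first blank line."""
    # Stop at the first line that is only a newline.
    body = takewhile(lambda x: x != '\n', lines)
    # Drop the lines for which the predicate is true (the comments).
    non_comments = filterfalse(lambda x: x.startswith('#'), body)
    return sum(1 for _ in non_comments)
```

Naming the intermediate iterators is optional, but it keeps the nesting from growing the way it does in the one-expression version.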
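Of course, now your code is starting to look like it was written in Scheme or Lisp. The generator expression is easier to read, but the itertools module is useful for building modified iterators that can be passed around as distinct objects.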
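On another topic, you should always make sure that you close any file you open, which means not using an anonymous file handle in an iterator. The simplest way is to use a with statement:
    with open(PATHDIFF) as f:
        n_lines = sum(1 for line in f if line != '\n' and not line.startswith('#'))
The other examples can be modified similarly; just replace open(PATHDIFF) with f wherever it occurs.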
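As one possible illustration of that substitution, here is the takewhile variant rewritten with a with statement in Python 3. The function name count_lines and the throwaway temporary file are assumptions for the sake of a runnable example.

```python
import os
import tempfile
from itertools import takewhile


def count_lines(path):
    """Count non-comment lines up to the first blank line, closing the file."""
    with open(path) as f:
        head = takewhile(lambda line: line != '\n', f)
        # The sum is computed inside the with block, while f is still open.
        return sum(1 for line in head if not line.startswith('#'))


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("a\n# comment\nb\n\nc\n")
n_lines = count_lines(tmp.name)
os.remove(tmp.name)
```

The important detail is that the generator is fully consumed before the with block exits; returning the lazy iterator itself would leave it pointing at a closed file.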
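Answer 1 (score: 2)
There is actually a fast way (borrowed from Funcy) to compute the length of an iterator by consuming it at C speed, without building an intermediate list:
Example: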
    from collections import deque
    from itertools import count, izip

    def ilen(seq):
        counter = count()
        deque(izip(seq, counter), maxlen=0)  # (consume at C speed)
        return next(counter)

    def lines(filename):
        with open(filename, 'r') as f:
            return ilen(
                None for line in f
                if line != "\n" and not line.startswith("#")
            )

    nlines = lines("file.txt")
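The code above targets Python 2 (izip). A Python 3 sketch of the same idea, assuming the built-in zip as the lazy replacement for izip, could be:

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Return the number of items in an iterable, consuming it."""
    counter = count()
    # zip pulls from `iterable` first, so the counter advances once per item;
    # a deque with maxlen=0 discards every pair without storing anything.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


def count_lines(lines):
    """Count lines that are neither blank nor comments."""
    return ilen(
        None for line in lines
        if line != "\n" and not line.startswith("#")
    )
```

The argument order in zip matters: with the counter first, it would be advanced one extra time before zip notices that the iterable is exhausted, and the result would be off by one.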
Answer 2 (score: 2)
You cannot use break or continue in a list comprehension or generator expression, so the "correct" syntax for your example is:
    nlines = 0
    with open(PATHDIFF) as f:
        for line in f:
            if line == '\n':
                # not sure that's _really_ what you want
                # => this will exit the loop at the first 'empty' line
                break
            if line.startswith('#'):
                continue
            nlines += 1
Now, if you really do want to exit at the first 'empty' line and want to make it a one-liner, you can also use itertools.takewhile():
    from itertools import takewhile
    with open(XXX) as f:
        nlines = sum(1 for line in takewhile(lambda l: l != '\n', f)
                     if not line.startswith("#"))
Answer 3 (score: 2)
    from itertools import ifilter, takewhile

    with open("test.txt") as f:
        fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda line: not line.startswith("#"), f)))
    print(fil)
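Or, since indexing may be faster than the startswith call:
        fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda x: x[0] != "#", f)))
Using str.strip as the predicate catches any empty line: stripping a blank or whitespace-only line returns an empty string, which is falsy, so takewhile stops there.
Indexing is indeed a bit faster: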
    In [11]: from itertools import ifilter,takewhile

    In [12]: %%timeit
       ....: with open("test.txt") as f:
       ....:     fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda x: x[0] != "#", f)))
       ....:
    1000 loops, best of 3: 400 µs per loop

    In [13]: %%timeit
       ....: with open("test.txt") as f:
       ....:     fil = sum(1 for _ in takewhile(str.strip, ifilter(lambda line: not line.startswith("#"), f)))
       ....:
    1000 loops, best of 3: 531 µs per loop
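The snippets in this answer use Python 2's ifilter. As a rough Python 3 sketch, assuming the built-in filter (which is lazy in Python 3) as its replacement and a hypothetical helper name count_lines, the str.strip predicate behaves like this:

```python
from itertools import takewhile


def count_lines(lines):
    """Count non-comment lines up to the first blank or whitespace-only line."""
    # Built-in filter returns a lazy iterator in Python 3.
    non_comments = filter(lambda line: not line.startswith("#"), lines)
    # str.strip returns '' (falsy) for blank lines, which ends takewhile.
    return sum(1 for _ in takewhile(str.strip, non_comments))
```

Note that this stops on lines containing only spaces or tabs as well, which is slightly broader than comparing against '\n'.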
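Answer 4 (score: 1)
If you want speed and don't mind using bash:
    grep -v '^#' yourfile | wc -l
This counts every line that does not start with # (blank lines included), and it will be faster than Python.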
Answer 5 (score: 0)
Do you want the number of comment lines or the number of non-comment lines? Either way, this should work:
    comment_lines = sum([1 for line in open(PATHDIFF) if line.startswith('#')])
    non_comment_lines = sum([1 for line in open(PATHDIFF) if not line.startswith('#')])
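The two lines above each open and read the file separately. If both numbers are needed, a single pass is one alternative; the sketch below is only an illustration, and the helper name count_comment_lines and the use of collections.Counter are my own choices, not part of the answer.

```python
from collections import Counter


def count_comment_lines(lines):
    """Return (comment_lines, non_comment_lines) in a single pass."""
    # Counter tallies the True/False results of startswith('#');
    # a key that never occurs simply counts as 0.
    counts = Counter(line.startswith('#') for line in lines)
    return counts[True], counts[False]
```

With a real file this would be called as count_comment_lines(f) inside a with open(PATHDIFF) as f: block, so the file is also closed afterwards.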