Question

I have some Python code that evaluates a file line by line, like this:

def evaluate_file():
  firstline = True
  for line in lines:
    if firstline:
      # do something with the first line
      firstline = False
    else:
      # do something else
      pass

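99% of the time, the line being looked at is not the first line. Is there any efficiency gain from writing the 99% case first (i.e. making the first if statement `if not firstline`)?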

Answer 1

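Unless there is more to the use case (which would require updating the question with a [mcve]), why use a conditional to identify the "first line" at all, when you can simply slice lines, do what you need with the first one, and then handle the rest?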

def evaluate_file():
  # evaluate the first line:
  # do something with ``lines[0]``

  for line in lines[1:]:
    # do something else
    pass

Answer 2

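If your 1% condition is merely "the first line", the difference will not be noticeable. What does save time is handling the 1% outside the loop and then looping over the subsequent items unconditionally. That removes the 99% of useless first-line tests and, more importantly, improves the readability of the code (if only by removing a level of indentation).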

Using an iterator is an efficient way to achieve this separation. For example:

iLines = iter(lines)
for line in iLines:
    # do something with the first line
    break
for line in iLines:
    # do something else with other lines
    pass

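This works with lists as well as with sources that cannot be indexed or sliced. It also allows for more complex "first part" conditions that may need to skip more than one initial item.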
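A variation on the same idea, offered as a minimal sketch rather than as this answer's own code: the built-in next() with a default pulls the first item without a for/break pair and also copes with an empty source. The name split_first and the sample generator are invented for illustration.

```python
def split_first(lines):
    """Return (first_item, iterator_over_the_rest) for any iterable."""
    it = iter(lines)
    first = next(it, None)  # None if the source is empty
    return first, it


def numbered():
    # A generator: it cannot be indexed or sliced, only iterated.
    yield "header"
    yield "row 1"
    yield "row 2"


first, rest = split_first(numbered())
remaining = list(rest)
print(first, remaining)
```

Because the iterator is shared, the second loop resumes exactly where the first item left off, whatever the source type.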

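If the code executed for the first line is "in addition" to common code that runs for all lines, it may be easier (although slightly slower) to use enumerate instead of a flag variable: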

for i, line in enumerate(lines):
    if i == 0:
        # do something special for the first line
        pass
    # common code for all lines

You could also use the iterator approach and place the common code in a function that is called in both cases.
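That last suggestion could look roughly like the sketch below. The helper common, the wrapper process, and the "upper-case the header" step are hypothetical stand-ins for whatever the real first-line and shared work would be.

```python
def common(line, out):
    # Hypothetical stand-in for the code shared by all lines.
    out.append(line.strip())


def process(lines):
    out = []
    it = iter(lines)
    for line in it:
        # Special handling for the first line only (here: upper-case it),
        # followed by the shared work.
        common(line.upper(), out)
        break
    for line in it:
        # Remaining lines: shared work only, no per-line test.
        common(line, out)
    return out


result = process(["name,age\n", "ann,31\n", "bob,27\n"])
print(result)
```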

Answer 3

I wrote a quick little test. It may not be perfect, but it works.

Results, based on 1,109,890 lines of data and 100 runs:

ifTest: 0.0472489972114563 seconds
arrayTest: 0.06530603981018067 seconds
flipIfTest: 0.04617302393913269 seconds

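The performance gain from making the suggested change is minimal (about 1 ms per pass over the file, roughly 2% in this test), but worth it.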

Although arrayTest is the slowest, I still prefer it because it feels more intuitive.
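One likely reason arrayTest comes out slowest is that lines[1:] builds a second list holding every element but the first before the loop even starts. As a sketch only (it was not part of the measurements above), itertools.islice keeps the same "skip the first line" shape without making that copy. The name islice_test is invented here.

```python
from itertools import islice


def islice_test(lines):
    count = 0
    firstline = lines[0]  # handle the first line on its own
    # islice(lines, 1, None) yields items from index 1 onward lazily,
    # so no second list is created.
    for line in islice(lines, 1, None):
        count = count + 1
    return count


sample = ["header", "a", "b", "c"]
print(islice_test(sample))
```

Whether this actually beats the flag-based versions would need to be measured on the real data.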

Code:

import time

def ifTest(lines): 
    count = 0
    firstline = True
    for line in lines:
        if firstline:
            firstline = False
        else:
            count = count + 1
    return count

def arrayTest(lines):
    count = 0
    firstline = lines[0]
    for line in lines[1:]:
        count = count + 1
    return count

def flipIfTest(lines):
    count = 0
    firstline = True
    for line in lines:
        if not firstline:
            count = count + 1
        else:
            firstline = False
    return count

with open("data.txt", "r") as f:
    lines = f.read().splitlines()

runs = 100

avg = 0
for i in range(0,runs):
    start = time.time()
    res = ifTest(lines)
    end = time.time()
    print("Lines Read: {}, Time: {}".format(res, end - start))
    avg = avg + (end - start)
avg = avg / runs
print("ifTest: {}".format(avg))

avg = 0
for i in range(0,runs):
    start = time.time()
    res = arrayTest(lines)
    end = time.time()
    print("Lines Read: {}, Time: {}".format(res, end - start))
    avg = avg + (end - start)
avg = avg / runs
print("arrayTest: {}".format(avg))

avg = 0
for i in range(0,runs):
    start = time.time()
    res = flipIfTest(lines)
    end = time.time()
    print("Lines Read: {}, Time: {}".format(res, end - start))
    avg = avg + (end - start)
avg = avg / runs
print("flipIfTest: {}".format(avg))
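The hand-rolled timing loops above could also be written with the standard timeit module, which defaults to a high-resolution clock and disables garbage collection while timing. The following is a sketch under assumptions: the data is synthetic rather than data.txt, and the function names if_first and if_not_first are chosen here, so the numbers will not match those reported above.

```python
import timeit


def if_first(lines):
    count = 0
    firstline = True
    for line in lines:
        if firstline:
            firstline = False
        else:
            count += 1
    return count


def if_not_first(lines):
    count = 0
    firstline = True
    for line in lines:
        if not firstline:
            count += 1
        else:
            firstline = False
    return count


# Synthetic stand-in for data.txt; the size is arbitrary.
lines = ["x"] * 20_000
calls = 10

timings = {}
for func in (if_first, if_not_first):
    # repeat() returns one total per repetition; the minimum is usually
    # the least noisy figure.
    totals = timeit.repeat(lambda: func(lines), number=calls, repeat=5)
    timings[func.__name__] = min(totals) / calls

for name, seconds in timings.items():
    print("{}: {:.6f} s per call".format(name, seconds))
```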

Does writing the 99% case first in an if statement improve efficiency?
