Question

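I want to grab a large chunk of data from a file. I know the start line and the end line. I wrote some code, but it is incomplete and I don't know how to take it further.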

file = open(filename, 'r')
end_line = '### Leave a comment!'
star_line = 'Kill the master'
for line in file:
    if star_line in line:
        ??

Answer 1

startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []

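with open("somefile") as f:
    for line in f:
        # Toggle the flag when a line begins with one of the markers.
        if line.startswith(startmarker):
            marking = True
        elif line.startswith(endmarker):
            marking = False

        # Collect every line seen while the flag is set.
        if marking:
            result.append(line)

# result[0] is the start-marker line itself, so skip it when printing.
if len(result) > 1:
    print("".join(result[1:]))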

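Explanation: a with block is a good way to work with files; it guarantees the file gets closed, so you cannot forget to call close() later. The for loop goes over each line and:

starts outputting when it sees a line that starts with 'ohai' (that line included)
stops outputting when it sees a line that starts with 'meheer?' (that line is not output).

After the loop, result contains the wanted part of the file plus the initial marker line. Rather than complicating the loop to skip the marker, I drop it with a slice: result[1:] returns every element of result from index 1 onward; in other words, it excludes the first element (index 0).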

Update to reflect the added requirement of partial-line matching:

startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []

with open("somefile") as f:
    for line in f:
        if not marking:
            # Look for the start marker anywhere in the line.
            index = line.find(startmarker)
            if index != -1:
                marking = True
                result.append(line[index:])
        else:
            # Look for the last occurrence of the end marker.
            index = line.rfind(endmarker)
            if index != -1:
                marking = False
                result.append(line[:index + len(endmarker)])
            else:
                result.append(line)

print("".join(result))

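More explanation: marking still tells us whether we should be outputting whole lines, but I changed the if statements for the start and end markers as follows:

If we are not marking yet and we see startmarker, output the current line starting from the marker. The find method returns the position of the first occurrence of startmarker in this case. The line[index:] notation means "the content of line from position index onward".

While marking, simply output the current line unless it contains endmarker. Here we use rfind to find the rightmost occurrence of endmarker, and the line[...] notation means "the content of line up to position index (the start of the match) plus the marker itself". Also: stop marking now :)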
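To make the two slices concrete, here is a small sketch that applies find and rfind to two made-up lines. The sample strings are assumptions chosen for illustration; they do not come from the question's file.

```python
startmarker = "ohai"
endmarker = "meheer?"

# Hypothetical sample lines with text on both sides of each marker.
first = "junk before ohai wanted text\n"
last = "more wanted meheer? junk after\n"

# find returns the index of the first occurrence, or -1 if absent.
start = first.find(startmarker)
head = first[start:]                  # from the marker to the end of the line

# rfind returns the index of the last occurrence, or -1 if absent.
end = last.rfind(endmarker)
tail = last[:end + len(endmarker)]    # from the line start through the marker

print(repr(head))
print(repr(tail))
```

Note that tail no longer ends with a newline, because everything after the end marker, including the line break, is cut off.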

Answer 2

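If reading the whole file into memory is not a problem, I would use file.readlines() to read all the lines into a list of strings.

You can then use list_of_lines.index(value) to find the indexes of the first and last lines, and select all the lines between those two indexes.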
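A minimal sketch of this idea follows. The helper name lines_between and the sample file contents are assumptions made for the demonstration. Because list.index only finds exact matches, the sketch strips the trailing newline from every line first, and it treats both marker lines as part of the result, which is one possible reading of "between".

```python
import os
import tempfile

def lines_between(path, first_line, last_line):
    """Return the lines from first_line through last_line, inclusive.

    list.index needs an exact match, so each marker must equal a whole
    line once the trailing newline has been stripped.
    """
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f.readlines()]
    start = lines.index(first_line)            # raises ValueError if absent
    end = lines.index(last_line, start + 1)    # search only after the start
    return lines[start:end + 1]

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("intro\nKill the master\nbody 1\nbody 2\n"
              "### Leave a comment!\noutro\n")
    sample_path = tmp.name

chunk = lines_between(sample_path, "Kill the master", "### Leave a comment!")
os.remove(sample_path)
print(chunk)
```

If the markers are only substrings of their lines, as in the question, index will not find them and a loop with an in test is needed instead.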

Answer 3

First, a test file (assuming a Bash shell):

for i in {0..100}; do  echo "line $i"; done > test_file.txt

This creates a 101-line file containing line 0\nline 1\n ... line 100\n

This Python script captures the lines from mark1 up to, but not including, mark2:

#!/usr/bin/env python

mark1 = "line 22"
mark2 = "line 26"
record = False
error = False
buf = []

with open("test_file.txt") as f:
    for line in f:
        # Start recording at the first mark, unless an error was already seen.
        if mark1 == line.rstrip():
            if not error and not record:
                record = True

        # The second mark ends recording; seeing it first is an error.
        if mark2 == line.rstrip():
            if not record:
                error = True
            else:
                record = False

        if record and not error:
            buf.append(line)

if len(buf) > 1 and not error:
    print("".join(buf))
else:
    print("There was an error in there...")

For the test file above, it prints:

line 22
line 23
line 24
line 25

If the two marks are not found in the correct order, the script prints an error message instead.

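If the amount of data between the marks is very large, you may need some additional logic. You can also match each line against a regular expression instead of testing for an exact match, if that fits your use case.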
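One way both suggestions could look is sketched below for Python 3. It is only an illustration: the generator name iter_between and the two patterns are assumptions, and unlike the script above it does not report marks that appear in the wrong order. Yielding lines one at a time avoids holding the whole chunk in memory, and re.search replaces the exact comparison.

```python
import io
import re

def iter_between(lines, start_pattern, end_pattern):
    """Yield lines from the first start match up to, not including, the end match."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    recording = False
    for line in lines:
        if not recording:
            if start_re.search(line):
                recording = True
                yield line
        elif end_re.search(line):
            return              # stop reading as soon as the end mark appears
        else:
            yield line

# In-memory stand-in for test_file.txt: "line 0" through "line 100".
sample = io.StringIO("".join("line %d\n" % i for i in range(101)))
captured = list(iter_between(sample, r"^line 22$", r"^line 26$"))
print("".join(captured), end="")
```

With a real file, pass the open file object as lines and write each yielded line to its destination instead of collecting them in a list.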

How do I get a large chunk of data from a file?
