Question

I have a file with contents like this (I don't want to change the file's contents in any way):

.
.
lines I don't need.
.
.
abc      # I know where it starts and the data can be anything, not just abc
efg      # I know where it ends.
.
.
lines I don't need.
.
.

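I know the line numbers (indexes) where the useful data starts and ends. The useful lines can contain arbitrary, unpredictable data. I want to build a list from this data, like so: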

[['a','b','c'],['e','f','g']]

Note that there are no spaces between a, b, etc. in the input file, so I guess the split() function won't work. What is the best way to achieve this in Python?

Answer 1

Use seek to get a specific part of the file:

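# filename, start_index and end_index are placeholders you supply;
# the indexes are offsets into the file, not line numbers
with open(filename) as file:
    file.seek(start_index)
    data = file.read(end_index - start_index)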

This gives you the part of the file between the given indexes.
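Keep in mind that seek works with offsets into the file rather than line numbers. Here is a minimal sketch of the idea, assuming the byte offsets of the useful block are already known and the file is UTF-8 text. The helper name read_byte_range and the demo file are made up for illustration:

```python
import os
import tempfile

def read_byte_range(path, start_byte, end_byte):
    """Return the text stored in bytes [start_byte, end_byte) of the file."""
    # Binary mode makes seek() and read() count plain bytes.
    with open(path, "rb") as f:
        f.seek(start_byte)
        return f.read(end_byte - start_byte).decode("utf-8")

# Demo file: "skip\n" occupies bytes 0-4, so "abc\nefg\n" is bytes 5-12.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"skip\nabc\nefg\nskip\n")
    demo_path = tmp.name

chunk = read_byte_range(demo_path, 5, 13)
rows = [list(line) for line in chunk.splitlines()]
os.remove(demo_path)
print(rows)  # [['a', 'b', 'c'], ['e', 'f', 'g']]
```

Binary mode is used here so that the offsets are unambiguous; in text mode, offsets other than those returned by tell() are not guaranteed to be meaningful.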

Answer 2

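You can iterate over the file and skip the lines you don't want, then split each line you keep into characters. Note that line.split("") raises ValueError (empty separator), so use list() instead.

# IsLineThatYouWant and DoMoreThingsWithChars are placeholders for your own functions
for line in file:
    if IsLineThatYouWant(line):
        # list() turns a string into a list of its characters
        characters = list(line.strip())
        DoMoreThingsWithChars(characters)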
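Since the question says the start and end line numbers are known, the "is this a line I want" check can simply compare line numbers. A rough sketch, assuming 1-based inclusive line numbers; the function name chars_for_lines and the demo file are hypothetical:

```python
import os
import tempfile

def chars_for_lines(path, first_line, last_line):
    """Return character lists for lines first_line..last_line (1-based, inclusive)."""
    rows = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if line_no > last_line:
                break  # past the useful block, no need to read further
            if line_no >= first_line:
                rows.append(list(line.rstrip("\n")))
    return rows

# Demo file with the useful data on lines 3 and 4.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("skip\nskip\nabc\nefg\nskip\n")
    demo_path = tmp.name

result = chars_for_lines(demo_path, 3, 4)
os.remove(demo_path)
print(result)  # [['a', 'b', 'c'], ['e', 'f', 'g']]
```

This reads the file once, never holds more than one line in memory, and leaves the file untouched.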

Answer 3

You can read all the lines and then narrow them down:

with open('myfile.txt') as f:
    lines = [line.strip() for line in f]

Now select only the lines you need, assuming the block always starts with exactly "abc" and ends with exactly "efg":

lines = lines[lines.index('abc'):lines.index('efg')+1]

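If you need a more flexible way to narrow down the lines, you'll need to be more specific in your question. In any case, this solution is fine if you are sure the file fits in memory. For larger files you have to be a bit more sophisticated and pick out the lines on the fly: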

lines_to_keep = []
started = False
with open('myfile.txt') as f:
    for line in f:
        line = line.strip()
        if 'abc' in line:
            started = True
        if started:
            lines_to_keep.append(line)
        if 'efg' in line:
            break

Once that is done, you can split the lines however you like:

# use lines_to_keep instead of lines if you took the streaming approach
lines = [list(line) for line in lines]

Answer 4

After combining all the pieces from the different answers and comments, this is what I did to solve the problem:

mylist = []
infile.seek(start_byte)
# number of lines to read, both ends inclusive
for i in range(end_line_no - start_line_no + 1):
    mylist.append(list(infile.readline().strip()))

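However, start_byte has to be computed by counting all the characters and spaces, and adding 1 for each '\n'. If there is a better way, please let me know.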
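One possible way to avoid counting by hand is to let Python record where each line starts while scanning the file once in binary mode. The sketch below assumes UTF-8 text and 1-based line numbers; line_start_offsets and read_lines_by_seek are hypothetical helper names, not part of any library:

```python
import os
import tempfile

def line_start_offsets(path):
    """Return the byte offset at which each line of the file begins."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for raw_line in f:
            offsets.append(position)
            position += len(raw_line)  # includes the line ending bytes
    return offsets

def read_lines_by_seek(path, start_line_no, end_line_no, offsets):
    """Seek to start_line_no (1-based) and read through end_line_no inclusive."""
    rows = []
    with open(path, "rb") as f:
        f.seek(offsets[start_line_no - 1])
        for _ in range(end_line_no - start_line_no + 1):
            rows.append(list(f.readline().decode("utf-8").strip()))
    return rows

# Demo file with the useful data on lines 3 and 4.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"skip\nskip\nabc\nefg\nskip\n")
    demo_path = tmp.name

offsets = line_start_offsets(demo_path)
mylist = read_lines_by_seek(demo_path, 3, 4, offsets)
os.remove(demo_path)
print(offsets)  # [0, 5, 10, 14, 18]
print(mylist)   # [['a', 'b', 'c'], ['e', 'f', 'g']]
```

Building the offset table still requires one full pass over the file, so it mainly pays off when the same file is read many times. For a single read, skipping lines by number while iterating is simpler.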

Take only specific lines of a file as input in Python
