Read a file by line position

Question

I have a data file with 3 columns. What I want to do is create a function that takes a start and an end line number as input. Something like this:

from datetime import datetime

def read_lines(start_line_number, end_line_number):
    # The two arguments are not used yet; restricting the loop to them is the question
    with open("data.txt", 'r') as f:
        for line in f:
            splitted_line = line.strip().split(",")
            date1 = datetime.strptime(splitted_line[0], '%Y%m%d:%H:%M:%S.%f')
            price = float(splitted_line[1])
            volume = int(splitted_line[2])
            my_tuple = (date1, price, volume)

Answer 1

from datetime import datetime

def func(start, end):
    with open("data.txt", 'r') as f:
        for idx, line in enumerate(f):
            if idx == end:  # idx counts from 0; the end line itself is not read
                break
            if idx < start:  # skip lines before the start line
                continue

            splitted_line = line.strip().split(",")
            date1 = datetime.strptime(splitted_line[0], '%Y%m%d:%H:%M:%S.%f')
            price = float(splitted_line[1])
            volume = int(splitted_line[2])
            my_tuple = (date1, price, volume)

Answer 2

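If I am reading this correctly, the function should read only the lines numbered in the range [start_line, end_line] (I assume this is an inclusive range, i.e. you want to read both the start line and the end line as well). Why not write your for loop with enumeration and simply skip the lines that fall outside the passed range?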

def read_line_range_inclusive(start_line, end_line):
    filename = "data.txt"
    with open(filename) as f:
        for i, line in enumerate(f):
            if i < start_line: # will read the start line itself
                continue # keep going...
            if i > end_line: # will read the end line itself
                break # we're done

            # ... perform operations on lines ...
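
As a minimal sketch of how the loop above could be combined with the parsing from the question, the generator below yields one (date, price, volume) tuple per line in the inclusive range. The name parse_line_range, the sample rows, and the choice to accept any iterable of lines instead of opening data.txt directly are assumptions made for illustration. Line numbers are 0-based here because enumerate starts at 0.

```python
from datetime import datetime

DATE_FORMAT = "%Y%m%d:%H:%M:%S.%f"  # format string taken from the question


def parse_line_range(lines, start_line, end_line):
    """Yield (date, price, volume) for lines start_line..end_line inclusive (0-based)."""
    for i, line in enumerate(lines):
        if i < start_line:
            continue  # not in range yet
        if i > end_line:
            break  # past the range, stop reading
        date_text, price_text, volume_text = line.strip().split(",")
        yield (
            datetime.strptime(date_text, DATE_FORMAT),
            float(price_text),
            int(volume_text),
        )


# Hypothetical sample rows in the layout the question implies
sample = [
    "20130101:09:30:00.000,10.5,100\n",
    "20130101:09:30:01.500,10.6,200\n",
    "20130101:09:30:02.250,10.7,300\n",
]
rows = list(parse_line_range(sample, 1, 2))
```

With a real file, the same generator could be consumed inside a with open("data.txt") as f: block by passing f as lines, since a file object iterates over its lines.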

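Also, be careful when splitting on commas. That works for simple lines like 1,2,3, but what about 1,2,"a,b,c",3, where "a,b,c" should not be split into separate columns? I suggest using the built-in csv module, which handles these edge cases automatically: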

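import csv

def read_line_range_inclusive(start_line, end_line):
    filename = "data.txt"
    # newline="" is what the csv documentation recommends for files passed to csv.reader
    with open(filename, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            if i < start_line:
                continue
            if i > end_line:
                break
            # row is already split into a list of column strings
            # note: i counts parsed records, which can span several physical lines
            # ... proceed as before ...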

Note that you can use the with statement only on the file object itself, not on the object returned by csv.reader, so this does not work: with csv.reader(open(filename)) as f:.
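
To make the comma-splitting concern concrete, here is a small sketch comparing str.split with csv.reader on the quoted example line from this answer. The variable names are only illustrative.

```python
import csv

line = '1,2,"a,b,c",3'

# Naive splitting breaks the quoted field into pieces
naive = line.split(",")

# csv.reader accepts any iterable of strings, so a one-element list works
parsed = next(csv.reader([line]))

print(naive)   # ['1', '2', '"a', 'b', 'c"', '3']
print(parsed)  # ['1', '2', 'a,b,c', '3']
```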

Answer 3

If you use the CSV reader, you have access to the line number:

csvreader.line_num

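The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.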
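
The difference between line_num and the record count shows up as soon as a quoted field contains a newline. The sketch below uses hypothetical in-memory data via io.StringIO to record both counters for each row.

```python
import csv
import io

# Hypothetical data: the second record has a quoted field containing a newline
data = io.StringIO('1,plain\n2,"spans\ntwo lines"\n3,plain\n')

reader = csv.reader(data)
seen = []
for record_number, row in enumerate(reader, start=1):
    # reader.line_num is the number of physical lines consumed so far
    seen.append((record_number, reader.line_num, row))

for entry in seen:
    print(entry)
# (1, 1, ['1', 'plain'])
# (2, 3, ['2', 'spans\ntwo lines'])
# (3, 4, ['3', 'plain'])
```

So when the goal is to select by physical line position, line_num is the counter to compare against, while enumerate counts parsed records.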

Answer 4

We can combine the linecache module with csv to get the job done:

import csv
import linecache


def get_lines(filename, start_line_number, end_line_number):
    """
    Given a file name, start line and end line numbers,
    return those lines in the file
    """
    for line_number in range(start_line_number, end_line_number + 1):
        yield linecache.getline(filename, line_number)


if __name__ == '__main__':
    # Get lines 4-6 inclusive from the file
    lines = get_lines('data.txt', 4, 6)
    reader = csv.reader(lines)

    for row in reader:
        print(row)

Consider the data file data.txt:

# this is line 1
# line 2

501,john
502,karen
503,alice

# skip this line
# and this, too

The code above produces the following output:

['501', 'john']
['502', 'karen']
['503', 'alice']

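Discussion

linecache is a little-known library that lets you quickly retrieve lines from a text file
csv is a library for handling comma-separated values
By combining them, we can get the job done with little effort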
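
Two details of linecache matter when it is used this way: line numbers start at 1, and getline returns an empty string instead of raising when the requested line does not exist. It also reads and caches the whole file on first access, so it is convenient but not necessarily memory-friendly for very large files. The sketch below illustrates the first two points with a throwaway temporary file; the file contents are made up for the example.

```python
import linecache
import os
import tempfile

# Write a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

try:
    second = linecache.getline(path, 2)    # line numbers are 1-based
    missing = linecache.getline(path, 99)  # out of range: empty string, no exception
finally:
    linecache.clearcache()  # drop the cached file contents
    os.remove(path)

print(repr(second))   # 'second\n'
print(repr(missing))  # ''
```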
