Reading data from a .txt file, excluding the header and footer

Date: 2013-11-07 23:55:28

Tags: python python-2.7 file-io

I have a .txt file that looks like this:

abcd this is the header
more header, nothing here I need
***********
column1    column2
=========  =========
  12.4       A
  34.6       mm
  1.3        um
=====================
footer, nothing that I need here
***** more text ******

I am trying to read the data in the columns, with each column in its own list: col1 = [12.4, 34.6, 1.3] and col2 = ['A', 'mm', 'um'].

This is what I have so far, but the only thing returned when I run the code is None:

def readfile():
    y = sys.argv[1]

    z = open(y)
    for line in z:

        data = False
        if data == True:
            toks = line.split()
            print toks

        if line.startswith('=========  ========='):
            data = True
            continue

        if line.startswith('====================='):
            data = False
            break
print readfile()

Any suggestions?

3 Answers:

Answer 0: (score: 1)

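There are many ways of doing this.

One way involves:

  1. Reading the file into lines.
  2. From the lines read, finding the indices of the lines that contain the column-header delimiter (the same pattern also matches the footer delimiter).
  3. Taking the data between those two lines.
  4. Parsing those lines by splitting them on whitespace and storing the values in their respective columns.

Like so: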

    with open('data.dat', 'r') as f:
        lines = f.readlines()
    
        # Get the indices of the lines that contain the header/footer delimiters.
        # The column-header delimiter pattern doubles as the footer delimiter:
        # `=====================` also contains it.
        # Note: `limits` is expected to have exactly 2 entries. If other lines contain this delimiter, you'll get problems.
        limits = [idx for idx, data in enumerate(lines) if '=========' in data]
    
        #`data` now contains all the lines between these limits
        data = lines[limits[0]+1:limits[1]] 
    
        #Now, you can parse the lines into rows by splitting the line on whitespace
        rows = [line.split() for line in data]
    
        #Column 1 has float data, so we convert the string data to float
        col1 = [float(row[0]) for row in rows]
    
        #Column 2 is String data, so there is nothing further to do
        col2 = [row[1] for row in rows]
    
        print col1, col2
    

    This outputs (from your example):

    [12.4, 34.6, 1.3] #Column 1
    ['A', 'mm', 'um'] #Column 2
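
The steps above can also be wrapped in a small function that works on any list of lines. This is only a sketch: the name split_columns and its marker parameter are made up for illustration, the sample text is copied from the question, and the code is written so that it runs on Python 3 as well.

```python
def split_columns(lines, marker='========='):
    """Return (col1, col2) from the rows between the first two marker lines."""
    # Indices of every line that contains the delimiter marker
    limits = [idx for idx, text in enumerate(lines) if marker in text]
    if len(limits) < 2:
        raise ValueError('expected at least two delimiter lines')
    # The data rows sit strictly between the first two delimiter lines;
    # blank lines are skipped so they do not break the indexing below
    body = lines[limits[0] + 1:limits[1]]
    rows = [line.split() for line in body if line.strip()]
    col1 = [float(row[0]) for row in rows]  # numeric column
    col2 = [row[1] for row in rows]         # string column
    return col1, col2


sample = """abcd this is the header
more header, nothing here I need
***********
column1    column2
=========  =========
  12.4       A
  34.6       mm
  1.3        um
=====================
footer, nothing that I need here
***** more text ******
"""

col1, col2 = split_columns(sample.splitlines())
print(col1)  # [12.4, 34.6, 1.3]
print(col2)  # ['A', 'mm', 'um']
```

Raising an error when fewer than two delimiter lines are found makes a malformed file fail loudly instead of silently returning the wrong slice.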
    

Answer 1: (score: 0)

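Efficiency aside, the approach you have taken has a small bug, and that is why the data extraction goes wrong.

You need to set data to True right after a line matches line.startswith('=========  ========='); until then it should stay False. In your code, data = False sits inside the loop, so it is reset on every line and the print branch never runs. The function also has no return statement, so print readfile() prints None.

Your data is then extracted until a line matches line.startswith('=====================').

Hope this helps.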

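import sys

def readfile():
    y = sys.argv[1]  # path to the data file, taken from the command line
    toks = []
    with open(y) as z:
        data = False  # set once, before the loop, so it is not reset on every line

        for line in z:

            # Column-header delimiter: the data rows start on the next line
            if line.startswith('=========  ========='):
                data = True
                continue

            # Footer delimiter: the data rows are over
            if line.startswith('====================='):
                data = False
                break

            if data:
                toks.append(line.split())
                print toks
    # Transpose the rows into columns; the values are still strings at this point
    col1, col2 = zip(*toks) # Or just simply, return zip(*toks)
    return col1, col2

print readfile()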

The with statement is more Pythonic and preferable to z = open(file), because it closes the file automatically when the block ends.
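
For comparison, here is a sketch of the same flag-based idea that takes any iterable of lines (so it can be tried without a file on disk) and converts the first column to float, as the question asks. The names read_columns and in_data are hypothetical, and io.StringIO merely stands in for an open file; the code is written for Python 3.

```python
import io


def read_columns(lines):
    """Collect the two columns found between the delimiter lines."""
    col1, col2 = [], []
    in_data = False  # initialised once, before the loop
    for line in lines:
        if line.startswith('=========  ========='):
            in_data = True   # data rows start on the next line
            continue
        if line.startswith('====================='):
            break            # footer delimiter: stop reading
        if in_data and line.strip():
            value, unit = line.split()
            col1.append(float(value))
            col2.append(unit)
    return col1, col2


text = u"""abcd this is the header
more header, nothing here I need
***********
column1    column2
=========  =========
  12.4       A
  34.6       mm
  1.3        um
=====================
footer, nothing that I need here
***** more text ******
"""

result = read_columns(io.StringIO(text))
print(result)  # ([12.4, 34.6, 1.3], ['A', 'mm', 'um'])
```

With a real file, the call would be made inside a with open(...) block and the file object passed in place of the StringIO.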

Answer 2: (score: 0)

If you know how many header and footer lines the file has, you can use this approach.

path = r'path\to\file.csv'
header = 2
footer = 2
buffer = []

with open(path, 'r') as f:
    for _ in range(header):
        f.readline()

    for _ in range(footer):
        buffer.append(f.readline())

    for line in f:
        buffer.append(line)
        line = buffer.pop(0)

        # do stuff to line
        print(line)

Skipping the header lines is trivial. Skipping the footer lines was harder, because:

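  • I did not want to change the file by hand in any way
  • I did not want to count the number of lines in the file
  • I did not want to store the whole file in a list (i.e. readlines()) ^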

^ Note: if you don't mind storing the whole file in memory, you can use:

path = r'path\to\file.csv'
header = 2
footer = 2

with open(path, 'r') as f:
    for line in f.readlines()[header:-footer if footer else None]:
        # do stuff to line
        print(line)
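
The rolling-buffer idea from the first snippet can also be written as a generator built on collections.deque, which avoids the linear-time list.pop(0). This is a sketch under the same assumption (the header and footer line counts are known in advance); the name body_lines is made up for illustration.

```python
from collections import deque
from itertools import islice


def body_lines(lines, header=0, footer=0):
    """Yield lines, skipping the first `header` and the last `footer` ones."""
    it = islice(iter(lines), header, None)  # drop the header lines
    if footer == 0:
        for line in it:
            yield line
        return
    # Hold back `footer` lines: a line is released only once `footer`
    # newer lines have been read, so the last `footer` lines never leave
    pending = deque(islice(it, footer))
    for line in it:
        pending.append(line)
        yield pending.popleft()


demo = ['h1', 'h2', 'a', 'b', 'c', 'f1', 'f2']
kept = list(body_lines(demo, header=2, footer=2))
print(kept)  # ['a', 'b', 'c']
```

Only footer lines are held in memory at any time, so the generator can be fed an open file object directly, for example body_lines(f, header=2, footer=2) inside a with open(...) block.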