Opening a text file as an array in Python

Date: 2014-01-22 10:52:20

Tags: python

I am trying to open a .txt file as an array in Python so that I can operate on its elements. The .txt file (abc.txt) looks like this:

AL192012,               TONY,     20,
20121021, 1800,  , LO, 20.1N,  50.8W,  25, 1011,
20121022, 0000,  , LO, 20.4N,  51.2W,  25, 1011,
20121022, 0600,  , LO, 20.8N,  51.5W,  25, 1010,
20121022, 1200,  , LO, 21.3N,  51.7W,  30, 1009,
AL182012,              SANDY,     45,
20121021, 1800,  , LO, 14.3N,  77.4W,  25, 1006,
20121022, 0000,  , LO, 13.9N,  77.8W,  25, 1005,
20121022, 0600,  , LO, 13.5N,  78.2W,  25, 1003,
20121022, 1200,  , TD, 13.1N,  78.6W,  30, 1002,

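I tried pd.read_csv('abc.txt'), loadtxt("abc.txt"), and genfromtxt("abc.txt"), but each of them produced an array with only three columns, probably because the first line has only three columns. I would like the result to have the same eight columns as the .txt file. Is this possible? Thanks!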

3 answers:

Answer 0: (score: 2)

Try something like this:

data = []
with open("filename") as f:
  for line in f:
    data.append(line.split(","))

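This gives you a list of lists (a ragged 2D array) of the data that you can manipulate. Note that each field keeps its surrounding whitespace, and because every line ends with a comma, the last element of each row is just the newline.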

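If you want to transpose it, you cannot use plain zip, because zip stops at the shortest row. You need itertools.zip_longest (called itertools.izip_longest in Python 2), which pads the short rows with None.

So you transpose it like this:

import itertools
data = list(itertools.zip_longest(*data))  # itertools.izip_longest in Python 2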
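
A minimal sketch of the transposition on two rows from the question, written for Python 3, where the function is itertools.zip_longest. The rows are given inline instead of being read from a file, and fillvalue="" is a choice made here for illustration so that short rows are padded with empty strings rather than None.

```python
from itertools import zip_longest

# Two rows of different length, as they look once the fields are stripped.
rows = [
    ["AL192012", "TONY", "20"],
    ["20121021", "1800", "", "LO", "20.1N", "50.8W", "25", "1011"],
]

# zip_longest pads the short row, so no column of the long row is dropped.
columns = list(zip_longest(*rows, fillvalue=""))

# Transposing back gives rows that all have eight entries.
padded = [list(r) for r in zip(*columns)]
```

With plain zip(*rows) only the first three columns would survive, which is the same symptom described in the question.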

Answer 1: (score: 1)

>>> with open(filename) as f:
        data = [[cell.strip() for cell in row.strip().rstrip(',').split(',')] for row in f]

>>> for row in data:
        print(row)

['AL192012', 'TONY', '20']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', 'SANDY', '45']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']

If you want to fix up the indices of the short rows, you can do so explicitly:

>>> data = [row if len(row) == 8 else row[0:1] + [''] * 3 + row[1:3] + [''] * 2 for row in data]
>>> for row in data:
        print(row)

['AL192012', '', '', '', 'TONY', '20', '', '']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', '', '', '', 'SANDY', '45', '', '']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']
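
An alternative to padding the header rows is to copy the storm ID and name onto every record that follows. This is only a sketch under the assumption that every three-field row is a storm header and every other row belongs to the most recent header; the function name attach_headers is hypothetical and not part of any library.

```python
def attach_headers(rows):
    """Prefix each data record with the ID and name from its storm header."""
    records = []
    storm_id = storm_name = None
    for row in rows:
        if len(row) == 3:
            # Header row (assumed): remember the storm for the records below it.
            storm_id, storm_name = row[0], row[1]
        else:
            records.append([storm_id, storm_name] + row)
    return records


sample = [
    ["AL192012", "TONY", "20"],
    ["20121021", "1800", "", "LO", "20.1N", "50.8W", "25", "1011"],
    ["AL182012", "SANDY", "45"],
    ["20121022", "1200", "", "TD", "13.1N", "78.6W", "30", "1002"],
]
records = attach_headers(sample)
```

Every resulting row then has the same length (ten fields here), which makes the data rectangular without inventing empty cells for the headers.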

Answer 2: (score: 0)

Here is a snippet:

#!/usr/bin/python

import sys

with open(sys.argv[1], 'r') as f:
    content = f.readlines()

for w in content:
    print(w)

    # split and loop again -> w.split(',')

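f.readlines() returns a list with one string per line.
Each w is a single line (a string); w.split(',') turns it into a list of fields.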
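
The "split and loop again" step from the comment could look roughly like the following. It is a sketch rather than the answerer's code: io.StringIO stands in for the opened file so the example is self-contained, and the name table is introduced here for illustration.

```python
import io

# Stand-in for the opened file; io.StringIO behaves like a text file object.
f = io.StringIO(
    "AL192012,               TONY,     20,\n"
    "20121021, 1800,  , LO, 20.1N,  50.8W,  25, 1011,\n"
)
content = f.readlines()

table = []
for w in content:
    # Drop the newline and the trailing comma, then split into clean fields.
    fields = [cell.strip() for cell in w.strip().rstrip(",").split(",")]
    table.append(fields)
```

Here table[0] has three fields and table[1] has eight, so the rows still need padding or regrouping before they form a rectangular array.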