I have a .txt file that looks like this:
abcd this is the header
more header, nothing here I need
***********
column1 column2
========= =========
12.4 A
34.6 mm
1.3 um
=====================
footer, nothing that I need here
***** more text ******
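I'm trying to read the data in the columns, with each column in its own list: col1 = [12.4, 34.6, 1.3] and col2 = ['A', 'mm', 'um'].
Here is what I have so far, but when I run the code the only thing returned is "None":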
def readfile():
    y = sys.argv[1]
    z = open(y)
    for line in z:
        data = False
        if data == True:
            toks = line.split()
            print toks
        if line.startswith('========= ========='):
            data = True
            continue
        if line.startswith('====================='):
            data = False
            break
print readfile()
Any suggestions?
Answer 0: (score: 1)
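There are many ways to do this.
One way is to read all the lines, find the two delimiter lines, and parse the rows between them, like this: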
with open('data.dat', 'r') as f:
    lines = f.readlines()
# This gets the indices of the lines that contain the header / footer delimiters.
# The column-header delimiter doubles as the footer delimiter:
# `=====================` also contains `=========`, so it matches too.
# Note: `limits` is expected to have exactly 2 entries. If any other lines
# contain this delimiter, you'll get problems.
limits = [idx for idx, data in enumerate(lines) if '=========' in data]
# `data` now contains all the lines between these limits
data = lines[limits[0]+1:limits[1]]
# Now you can parse the lines into rows by splitting each line on whitespace
rows = [line.split() for line in data]
# Column 1 has float data, so we convert the string data to float
col1 = [float(row[0]) for row in rows]
# Column 2 is string data, so there is nothing further to do
col2 = [row[1] for row in rows]
print(col1)
print(col2)
This outputs (from your example):
[12.4, 34.6, 1.3] #Column 1
['A', 'mm', 'um'] #Column 2
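As a rough sketch of the same slicing idea, the steps can be wrapped in a function that checks the "exactly two delimiters" assumption explicitly. The names `parse_between_delimiters` and `SAMPLE` are made up for this illustration, and the sample text is copied from the question so the snippet runs without a file on disk.

```python
SAMPLE = """\
abcd this is the header
more header, nothing here I need
***********
column1 column2
========= =========
12.4 A
34.6 mm
1.3 um
=====================
footer, nothing that I need here
***** more text ******
"""


def parse_between_delimiters(lines, marker="========="):
    """Return (col1, col2) parsed from the rows between the two marker lines."""
    limits = [idx for idx, text in enumerate(lines) if marker in text]
    if len(limits) != 2:
        # Fail loudly instead of silently slicing the wrong region.
        raise ValueError(
            "expected exactly 2 delimiter lines, found %d" % len(limits))
    rows = [line.split() for line in lines[limits[0] + 1:limits[1]]]
    col1 = [float(row[0]) for row in rows]  # numeric column
    col2 = [row[1] for row in rows]         # text column, kept as strings
    return col1, col2


col1, col2 = parse_between_delimiters(SAMPLE.splitlines())
print(col1)
print(col2)
```

Raising an error when the delimiter count is off is a design choice for this sketch; it makes the problem mentioned in the comments above visible rather than returning the wrong rows.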
Answer 1: (score: 0)
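The approach you've taken may not be the most efficient, and it is also slightly buggy, which is why your data extraction fails.
You need to set data to True only right after line.startswith('========= =========') matches. Until then it should stay False.
Your data is then extracted until line.startswith('=====================') matches.
Hope this helps.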
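import sys

def readfile():
    y = sys.argv[1]
    toks = []
    with open(y) as z:
        data = False  # set once, before the loop, so it persists across lines
        for line in z:
            if line.startswith('========= ========='):
                data = True  # the data rows start on the next line
                continue
            if line.startswith('====================='):
                data = False
                break
            if data:
                toks.append(line.split())
    print(toks)
    col1, col2 = zip(*toks)  # Or just simply, return zip(*toks)
    # Note: the values in col1 are still strings; apply float() if numbers are needed
    return col1, col2

print(readfile())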
The with statement is more Pythonic and preferable to z = open(file).
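A minimal Python 3 sketch of the same flag-based loop is given here, assuming the two delimiter strings from the question. The function name `read_columns` and the in-memory `sample` list are hypothetical, used so the example runs without a command-line argument. It also shows why the original code printed None: a function that reaches its end without a return statement returns None.

```python
def read_columns(lines,
                 start="========= =========",
                 stop="====================="):
    """Collect the rows found between the start and stop delimiter lines."""
    col1, col2 = [], []
    in_data = False  # initialised once, before the loop
    for line in lines:
        if line.startswith(start):
            in_data = True  # data rows begin on the next line
            continue
        if line.startswith(stop):
            break  # footer delimiter reached, stop reading
        if in_data and line.strip():
            value, unit = line.split()
            col1.append(float(value))
            col2.append(unit)
    # Without this return, print(read_columns(...)) would show None.
    return col1, col2


sample = [
    "abcd this is the header\n",
    "more header, nothing here I need\n",
    "***********\n",
    "column1 column2\n",
    "========= =========\n",
    "12.4 A\n",
    "34.6 mm\n",
    "1.3 um\n",
    "=====================\n",
    "footer, nothing that I need here\n",
    "***** more text ******\n",
]
print(read_columns(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.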
Answer 2: (score: 0)
If you know how many header/footer lines the file has, you can use this method.
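path = r'path\to\file.csv'
header = 2
footer = 2
buffer = []
with open(path, 'r') as f:
    # skip the header lines
    for _ in range(header):
        f.readline()
    # pre-fill the buffer with as many lines as the footer has
    for _ in range(footer):
        buffer.append(f.readline())
    # the buffer always holds the last `footer` lines read,
    # so the footer itself is never processed
    for line in f:
        buffer.append(line)
        line = buffer.pop(0)
        # do stuff to line
        print(line)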
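Skipping the header lines is trivial; skipping the footer lines was the part I had trouble with, which is why the loop keeps the last `footer` lines in a buffer.
Note: if you don't mind holding the whole file in memory, you can use this instead: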
path = r'path\to\file.csv'
header = 2
footer = 2
with open(path, 'r') as f:
    for line in f.readlines()[header:-footer if footer else None]:
        # do stuff to line
        print(line)
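The buffering idea from the first snippet can also be sketched as a generator built on `collections.deque`, which pops from the left in constant time. The name `body_lines` and the demo list are hypothetical; the header and footer counts are assumed to be known in advance, as in the answer.

```python
from collections import deque


def body_lines(lines, header=2, footer=2):
    """Yield lines, skipping `header` leading and `footer` trailing lines."""
    iterator = iter(lines)
    # Discard the header lines (stops quietly if the input is too short).
    for _ in range(header):
        next(iterator, None)
    pending = deque()
    for line in iterator:
        pending.append(line)
        # Once more than `footer` lines are pending, the oldest one
        # cannot belong to the footer, so it is safe to hand out.
        if len(pending) > footer:
            yield pending.popleft()
    # Whatever is still pending here is the footer and is dropped.


demo = ["h1", "h2", "row a", "row b", "f1", "f2"]
print(list(body_lines(demo)))
```

Only `footer` lines are held in memory at any time, so this works on an open file object for large inputs as well.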