I want to read a text file formatted like this:
1 100
---stuff----
2 100
---stuff---
3 200
---stuff--
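Here 1 is the case ID and 100 is the number of lines that "stuff" occupies. Is there a way in Python to read the blocks for 1 100 and 2 100 separately?
Answer 0 (score: 0)
File structure (note that the numeric header lines are tab-separated):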
1 3
abc
def
ghi
2 2
jkl
mno
3 4
pqr
stu
vwx
yz
Now try:
with open(filename) as f:               # filename: path to the input file
    all_lines = f.readlines()           # read all lines

content = []                            # one list of lines per segment
i = 0
while i < len(all_lines):
    if len(all_lines[i].split('\t')) == 2:   # header line: exactly two tab-separated fields
        i += 1
        c = []                          # lines of the current segment
        # collect lines until the next header (or the end of the file) is reached
        while i < len(all_lines) and len(all_lines[i].split('\t')) != 2:
            c.append(all_lines[i].strip())
            i += 1
        content.append(c)               # add the finished segment to the overall content
    else:
        i += 1                          # skip anything before the first header

for c in content:
    print(c)
Output:
['abc', 'def', 'ghi']
['jkl', 'mno']
['pqr', 'stu', 'vwx', 'yz']
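The answer above finds segment boundaries by detecting header lines and never uses the line count that each header carries. As a minimal sketch of the alternative, assuming every header is "case ID, then line count" separated by whitespace and that the count is accurate, the count itself can drive the reading. The function name read_blocks and the in-memory sample list are made up for illustration; an open file object can be passed in place of the list.

```python
from itertools import islice

def read_blocks(lines):
    """Yield (case_id, body_lines), taking the body length from each header."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between blocks
        case_id, n_lines = header.split()[:2]  # split on any whitespace (tab or space)
        # consume exactly n_lines lines from the same iterator as this block's body
        body = [line.rstrip("\n") for line in islice(it, int(n_lines))]
        yield int(case_id), body

# Hypothetical sample mirroring the file structure shown in this answer
sample = ["1\t3\n", "abc\n", "def\n", "ghi\n",
          "2\t2\n", "jkl\n", "mno\n",
          "3\t4\n", "pqr\n", "stu\n", "vwx\n", "yz\n"]
blocks = list(read_blocks(sample))
for case_id, body in blocks:
    print(case_id, body)
```

Because the body is consumed by count, a body line that happens to contain a tab is not mistaken for a header. The trade-off is that a wrong count in any header shifts every block after it.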
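Answer 1 (score: 0)
You can simply try a generator approach:
def generator_approach(f):
    sub_ = []                           # lines collected since the last separator
    for line in f:
        if 'stuff' in line:             # separator line, as in the question's sample
            yield sub_
            sub_ = []
        else:
            sub_.append(line.strip())
    if sub_:                            # trailing lines after the last separator
        yield sub_

with open('file', 'r') as f:
    closure_ = generator_approach(f)
    print(list(closure_))               # consume the generator while the file is still open
Output: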
[['1 100'], ['2 100'], ['3 200']]
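The generator returns each header as a raw string. If the case ID and line count are needed as numbers, a small follow-up step can convert them. This is only a sketch: the names groups and headers are illustrative, and it assumes each group holds a single "ID count" line as in the output above.

```python
# Output of the generator approach, copied from above
groups = [['1 100'], ['2 100'], ['3 200']]

# Turn each "ID count" string into a (case_id, n_lines) tuple of ints
headers = [tuple(int(x) for x in g[0].split()) for g in groups if g]
print(headers)
```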