Question

I want to read a text file formatted like this:

1      100
---stuff----
2      100
---stuff---
3      200
---stuff--

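Here 1 is the case ID and 100 is the number of lines that the "stuff" section occupies. Is there a way in Python to read the 1 100 case and the 2 100 case separately?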
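Because each header line already states how many lines follow it, one option is to trust that count and read exactly that many lines per case. This is a minimal sketch, assuming every header is two whitespace-separated integers (case ID, line count) and that each case really has that many body lines; `read_cases` is a hypothetical helper name, not part of either answer below.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple


def read_cases(f: TextIO) -> Iterator[Tuple[int, List[str]]]:
    """Yield (case_id, lines) pairs, using the count in each header line.

    Assumes every header is two whitespace-separated integers and is
    followed by exactly that many body lines (a short final case simply
    yields fewer lines, because islice stops at end of file).
    """
    for header in f:
        if not header.strip():
            continue  # skip blank lines between cases
        case_id, count = (int(x) for x in header.split())
        # islice pulls the next `count` lines from the same file iterator,
        # so the outer loop resumes at the following header line
        body = [line.rstrip('\n') for line in islice(f, count)]
        yield case_id, body


sample = io.StringIO("1\t3\nabc\ndef\nghi\n2\t2\njkl\nmno\n")
for case_id, body in read_cases(sample):
    print(case_id, body)
# 1 ['abc', 'def', 'ghi']
# 2 ['jkl', 'mno']
```

With a real file, pass the object returned by `open(...)` instead of the `io.StringIO` sample.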

Answer 1

File structure (note that the number lines are tab-separated):

1   3
abc
def
ghi
2   2
jkl
mno
3   4
pqr
stu
vwx
yz

Then try the following:

filename = 'data.txt'  # path to the input file (placeholder name)

with open(filename) as f:
    all_lines = f.readlines()  # read all lines; the file is closed afterwards

content = []  # one list of lines per segment

i = 0
while i < len(all_lines):
    # A header line has exactly two tab-separated fields (case ID, line count)
    if len(all_lines[i].split('\t')) == 2:
        i += 1
        c = []  # lines of the current segment
        # Collect lines until the next header line or the end of the file
        while i < len(all_lines) and len(all_lines[i].split('\t')) != 2:
            c.append(all_lines[i].strip())
            i += 1
        content.append(c)  # store the finished segment
    else:
        i += 1  # skip lines that appear before the first header

for c in content:
    print(c)

Output:

['abc', 'def', 'ghi']
['jkl', 'mno']
['pqr', 'stu', 'vwx', 'yz']
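The output above keeps the body lines but drops the case IDs. A small variant, sketched here under the same assumption as Answer 1 (a header is any line with exactly two tab-separated fields), stores each segment under its case ID in a dict; `segments_by_id` is a hypothetical name, and duplicate IDs would be merged into one entry.

```python
import io
from typing import Dict, List, Optional, TextIO


def segments_by_id(f: TextIO) -> Dict[str, List[str]]:
    """Group body lines under the case ID from the preceding header line.

    As in Answer 1, a header is a line with exactly two tab-separated
    fields; its first field is used as the key. Lines before the first
    header are ignored, and repeated IDs share one list.
    """
    content: Dict[str, List[str]] = {}
    current: Optional[List[str]] = None  # body list of the case being filled
    for line in f:
        fields = line.rstrip('\n').split('\t')
        if len(fields) == 2:
            current = content.setdefault(fields[0].strip(), [])
        elif current is not None:
            current.append(line.strip())
    return content


sample = io.StringIO("1\t3\nabc\ndef\nghi\n2\t2\njkl\nmno\n")
print(segments_by_id(sample))
# {'1': ['abc', 'def', 'ghi'], '2': ['jkl', 'mno']}
```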

Answer 2

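You can try a generator that starts a new group each time it reaches a separator line (here, any line containing "stuff"):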

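def generator_approach(f):
    """Yield groups of stripped lines, splitting wherever a line contains 'stuff'."""
    sub_ = []
    for line in f:
        if 'stuff' in line:
            yield sub_  # separator reached: emit the current group
            sub_ = []
        else:
            sub_.append(line.strip())
    if sub_:  # emit a trailing group that has no separator after it
        yield sub_

with open('file', 'r') as f:
    print(list(generator_approach(f)))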

Output:

[['1      100'], ['2      100'], ['3      200']]
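The generator above splits on the literal word "stuff", which only matches the placeholder lines in the sample. A sketch of an alternative that instead starts a new case at each header line and checks the body length against the header's count is shown below. It assumes a header is exactly two integers separated by whitespace (the `HEADER` pattern is an assumption), so a body line that happens to look like "12 34" would be misread as a header; `iter_cases` and `_checked` are hypothetical names.

```python
import io
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed header format: two integers separated by whitespace, e.g. "1      100"
HEADER = re.compile(r'^\s*(\d+)\s+(\d+)\s*$')


def _checked(header: Tuple[int, int], body: List[str]) -> Tuple[int, List[str]]:
    """Return (case_id, body), raising ValueError if the line count is off."""
    case_id, count = header
    if len(body) != count:
        raise ValueError(f"case {case_id}: expected {count} lines, got {len(body)}")
    return case_id, body


def iter_cases(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (case_id, body) for each header line and the lines that follow it."""
    current: Optional[Tuple[int, int]] = None  # (case_id, count) being collected
    body: List[str] = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            if current is not None:
                yield _checked(current, body)  # finish the previous case
            current = (int(m.group(1)), int(m.group(2)))
            body = []
        elif current is not None:
            body.append(line.rstrip('\n'))  # lines before the first header are skipped
    if current is not None:
        yield _checked(current, body)  # finish the last case


sample = io.StringIO("1      2\n---stuff----\nmore\n2      1\n---stuff---\n")
print(list(iter_cases(sample)))
# [(1, ['---stuff----', 'more']), (2, ['---stuff---'])]
```

Raising on a count mismatch is one design choice; if the counts in real files are unreliable, the check in `_checked` could be dropped and the header counts ignored.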

Is there a way to read a partitioned text file in Python?
