I'm trying to extract specific blocks of text from a text file that looks like this:
----
data1
data1
data1
extractme
----
data2
data2
data2
----
data3
data3
extractme
----
and then dump them into a text file so that it looks like this:
----
data1
data1
data1
extractme
---
data3
data3
extractme
---
Thanks for your help.
Answer 0: (score: 5)
This works well for me. Your sample data goes in a file named "data.txt", and the output goes to "result.txt":
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("----"):
        # ---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        # now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("extractme"):
        keepCurrentSet = True
inFile.close()
outFile.close()
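Note that the loop above writes a data set only when it reaches the `----` line that closes it, so a final data set with no trailing separator would be dropped. Here is a minimal sketch of a variant that also flushes the leftover buffer. The function name `extract_sets` and its parameters are hypothetical, and it works on any iterable of lines rather than on the two files used above:

```python
def extract_sets(lines, sep="----", marker="extractme"):
    """Return the lines of every data set that contains a marker line."""
    kept = []
    buffer = []
    # Starts as True so the leading separator line is kept, as in the loop
    # above (this also keeps any lines that precede the first separator).
    keep_current_set = True
    for line in lines:
        buffer.append(line)
        if line.startswith(sep):
            # a separator line closes the current data set
            if keep_current_set:
                kept.extend(buffer)
            keep_current_set = False
            buffer = []
        elif line.startswith(marker):
            keep_current_set = True
    # flush a final data set that has no closing separator
    if buffer and keep_current_set:
        kept.extend(buffer)
    return kept
```

With the files opened as above, `outFile.writelines(extract_sets(inFile))` would write the result.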
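Answer 1: (score: 5)
I imagine the variation in the number of dashes (4 in the input, sometimes 4 and sometimes 3 in the output) is a mistake and not actually desired, since nothing even hints at an algorithm that would explain how many dashes to output on different occasions.
I would structure the task in terms of reading and yielding one block of lines at a time: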
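def readbyblock(f):
    while True:
        block = []
        for line in f:
            if line == '----\n':
                break
            block.append(line)
        else:
            # for/else: reached only when f is exhausted without a separator
            if block:
                yield block
            return
        # skip empty blocks (e.g. a file that starts with a separator)
        if block:
            yield block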
This way the (selective) output can be kept cleanly separate from the input:
with open('infile.txt') as fin:
    with open('oufile.txt', 'w') as fou:
        for block in readbyblock(fin):
            if 'extractme\n' in block:
                fou.writelines(block)
                fou.write('----\n')
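If the blocks are large, this is not optimal performance-wise, because the `in` test in the `if` clause implies a separate loop over all the lines in the block. So a good refactoring might be: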
def selectivereadbyblock(f, marker='extractme\n'):
    while True:
        block = []
        extract = False
        for line in f:
            if line == '----\n':
                break
            block.append(line)
            if line == marker:
                extract = True
        else:
            # for/else: reached only when f is exhausted without a separator
            if extract:
                yield block
            return
        if extract:
            yield block

with open('infile.txt') as fin:
    with open('oufile.txt', 'w') as fou:
        for block in selectivereadbyblock(fin):
            fou.writelines(block)
            fou.write('----\n')
Parameterizing the separator (currently hard-coded as '----\n' for both input and output) is another reasonable coding tweak.
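As a rough sketch of that tweak, the separator can be passed in as an argument. The names `select_blocks` and `copy_selected` and the parameter `sep` are made up for this illustration and are not part of the code above. The sketch also assumes that every line, including the last one, ends with a newline:

```python
def select_blocks(f, sep='----\n', marker='extractme\n'):
    """Yield each block of lines (delimited by sep) that contains marker."""
    block = []
    extract = False
    for line in f:
        if line == sep:
            # a separator closes the current block
            if extract:
                yield block
            block = []
            extract = False
        else:
            block.append(line)
            if line == marker:
                extract = True
    # flush a trailing block that has no closing separator
    if extract:
        yield block


def copy_selected(fin, fou, sep='----\n', marker='extractme\n'):
    """Write every selected block to fou, each followed by sep."""
    for block in select_blocks(fin, sep, marker):
        fou.writelines(block)
        fou.write(sep)
```

Calling `copy_selected(fin, fou, sep='---\n')` on two open files would then read and write three-dash separators instead.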
Answer 2: (score: 2)
For Python 2:
#!/usr/bin/env python
with open("infile.txt") as infile:
    with open("outfile.txt","w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("----"):
                collector = []
            collector.append(line)
            if line.startswith("extractme"):
                for outline in collector:
                    outfile.write(outline)
For Python 3:
#!/usr/bin/env python3
with open("infile.txt") as infile, open("outfile.txt","w") as outfile:
    collector = []
    for line in infile:
        if line.startswith("----"):
            collector = []
        collector.append(line)
        if line.startswith("extractme"):
            for outline in collector:
                outfile.write(outline)
Answer 3: (score: 1)
# Split the file into blocks on the separator; keep those containing "extractme".
with open("file") as f:
    data = f.read().split("----")
print('----'.join([ i for i in data if "extractme" in i ]))