Extract a string between two characters from a txt file in Python

Time: 2013-07-13 17:04:16

Tags: python character extract

I have a txt file that I want Python to read, and I want Python to extract the string that lies between two specific characters. Here is an example:

line a

line b

line c

& TESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTEST!

line d

line e

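What I want is for Python to read the lines and, when it encounters "&", start printing the lines (including the line with the "&") until it encounters "!".

Any suggestions?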

3 answers:

Answer 0 (score: 4)

This works:

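data = []
flag = False  # True while we are inside the & ... ! block
with open('/tmp/test.txt', 'r') as f:
    for line in f:
        # A line starting with '&' opens the block.
        if line.startswith('&'):
            flag = True
        if flag:
            data.append(line)
        # A line ending with '!' closes the block (the line itself is kept).
        if line.strip().endswith('!'):
            flag = False

print(''.join(data))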
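
The same flag logic can also be wrapped in a function that accepts any iterable of lines, which makes it easy to try without a file on disk. This is only a sketch: the name `extract_block` and its parameters are made up for illustration, and unlike the loop above it stops at the first closing line rather than collecting every block.

```python
def extract_block(lines, start="&", end="!"):
    """Collect lines from one starting with `start` through one ending with `end`."""
    collected = []
    inside = False
    for line in lines:
        if line.startswith(start):
            inside = True
        if inside:
            collected.append(line)
            if line.strip().endswith(end):
                # Stop at the first closing line instead of scanning the rest.
                break
    return "".join(collected)


# A list of lines stands in for an open file here (hypothetical sample data).
sample = ["line a\n", "line b\n", "& TEST\n", "TEST!\n", "line d\n"]
print(extract_block(sample))
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.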

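If your file is small enough that reading it all into memory is not a problem, and there is no ambiguity about & and ! marking the start and end of the string you want, this is easier: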

with open('/tmp/test.txt', 'r') as f:
    data = f.read()

# Look for '!' only after the '&', so an earlier '!' cannot give an empty slice.
start = data.index('&')
print(data[start:data.index('!', start) + 1])
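
One thing to keep in mind with the slicing approach is that `str.index` raises `ValueError` when a delimiter is missing. A possible variant, assuming you would rather get `None` back in that case, uses `str.find` instead. The function name `between` is hypothetical.

```python
def between(text, start="&", end="!"):
    """Return text from the first `start` through the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    # Search for the end marker only after the start marker.
    j = text.find(end, i + len(start))
    if j == -1:
        return None
    return text[i:j + len(end)]


print(between("line a\n& TEST\nTEST!\nline d\n"))
```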

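Or, if you want to read the whole file but only treat & and ! as delimiters when they sit at the beginning and end of a line, respectively, you can use a regular expression: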

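import re

with open('/tmp/test.txt', 'r') as f:
    data = f.read()

# re.M anchors ^ and $ at line boundaries; re.S lets . span newlines.
# The lazy .*? stops at the first '!' that ends a line, and $ also
# matches at the end of the file when there is no trailing newline.
m = re.search(r'^(&.*?!)\s*$', data, re.S | re.M)
if m:
    print(m.group(1))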
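
If the file may contain more than one such block, `re.findall` can return all of them. The following is a rough sketch under that assumption; the pattern is a variation on the one above (lazy match plus a lookahead for the end of the line), and `find_blocks` and the sample text are made up for illustration.

```python
import re

# Lazy match from a line-initial "&" to the first "!" that ends a line.
BLOCK = re.compile(r'^&.*?!(?=[ \t]*$)', re.S | re.M)


def find_blocks(text):
    """Return every '&...!' block whose delimiters sit at line boundaries."""
    return BLOCK.findall(text)


sample = "line a\n& one!\nline b\n& two\nmore!\nline c\n"
print(find_blocks(sample))
```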

Answer 1 (score: 0)

Here is a (very simple!) example.

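def Printer():
    # Pr is True while we are between a "&" line and a "!" line.
    Pr = False
    with open("yourfile.txt") as f:
        for line in f:
            if "&" in line:
                Pr = True
            if Pr:
                # Strip the newline so print does not double-space the output.
                print(line.rstrip("\n"))
            if "!" in line:
                Pr = False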

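Answer 2 (score: 0)

A simple solution is shown below. The code is heavily commented so that you can follow each line. The nice part is that it uses the with statement, which closes resources such as files even if an exception is raised.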

# Path to the input file (relative to the current working directory).
file_path = "input.txt"

# Open the file in read mode. The with statement closes the file
# even if an exception is raised while reading.
with open(file_path, "r") as f:
    # Read the whole file into memory. For a very large file,
    # prefer iterating over the file object instead.
    content = f.read()

size = len(content)
start = 0
while start < size:
    # Find the next "&" at or after the position following the last "!".
    start = content.find("&", start)
    # No further "&": nothing left to print.
    if start == -1:
        break
    # Find the first "!" after that "&".
    end = content.find("!", start)
    # If no "!" is found, print through the end of the file.
    end = end if end != -1 else size
    # Print the text between "&" and "!" (excluding both characters).
    print(content[start + 1:end])
    # Move the cursor past the "!" character.
    start = end + 1
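
A note on the with statement used in this answer: it does not catch exceptions, it guarantees that the file is closed whether or not one occurs. The sketch below compares it with the roughly equivalent try/finally form. `io.StringIO` is used here as a stand-in for a real file so the example runs without touching disk; that substitution is an assumption made for illustration.

```python
import io

# io.StringIO stands in for a real file so the sketch runs without touching disk.
f = io.StringIO("& TEST!\n")
try:
    content = f.read()
finally:
    # Runs whether or not read() raised, which is what `with` guarantees.
    f.close()

# The same thing written with the with statement.
with io.StringIO("& TEST!\n") as g:
    same_content = g.read()

print(f.closed, g.closed, content == same_content)
```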