Search text file for multiple strings and print out results to a new text file

Time: 2015-08-19 13:36:22

Tags: python string file search text

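I'm quite new to Python programming and am trying hard to learn file I/O.

I'm currently making a simple program that reads from a text document and prints out the results. So far I've been able to create this program with the help of many resources and questions on this site.

However, I'm curious how I can read multiple separate strings from a text document and save the resulting strings to a text document.

The program below is the one I've written. It lets me search a text document for keywords and print whatever lies between those keywords to another text file. However, I can only handle one set of start and end keywords per search: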

import re
from Tkinter import *
import tkSimpleDialog
import tkMessageBox
from tkFileDialog import askopenfilename

root = Tk()
w = Label(root, text ="Configuration Inspector")
w.pack()
tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector Text")
filename = askopenfilename() # Data Search Text File
outputfilename = askopenfilename() #Output Text File 

with open(filename, "rb") as f_input:
    start_token = tkSimpleDialog.askstring("Serial Number", "What is the device serial number?")
    end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")
    reText = re.search("%s(.*?)%s" % (re.escape(start_token + ",SHOWALL"), re.escape(end_token)), f_input.read(), re.S)
    if reText:
        output = reText.group(1)
        fo = open(outputfilename, "wb")
        fo.write(output)
        fo.close()
        print output
    else:
        tkMessageBox.showinfo("Output", "Sorry that input was not found in the file")
        print "not found"

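So what this program does is let the user select a text document, search that document for a start keyword and an end keyword, and then print everything between those two keywords to a new text document.

What I want to achieve is to let the user select a text document, search it for multiple sets of keywords, and print the results to the same output text file.

In other words, say I have the following text document: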

something something something something
something something something something STARTkeyword1 something
data1
data2
data3
data4
data5
ENDkeyword1
something something something something
something something something something STARTkeyword2 something
data1
data2
data3
data4
data5
Data6
ENDkeyword2
something something something something
something something something something STARTkeyword3 something
data1
data2
data3
data4
data5
data6
data7
data8
ENDkeyword3

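I want to be able to search this text document with 3 different start keywords and 3 different end keywords, and then print whatever is in between to the same output text file.

For example, my output text document would look something like: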

something
data1
data2
data3
data4
data5
ENDkeyword1

something
data1
data2
data3
data4
data5
Data6
ENDkeyword2

something
data1
data2
data3
data4
data5
data6
data7
data8
ENDkeyword3

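One brute-force approach I tried was a loop that has the user enter a new keyword one at a time, but whenever I try to write to the same output file it overwrites the previous entry, even when using append. Is there any way to let the user search a text document for multiple strings and print out the multiple results, with or without a loop?

----------------- EDIT:

Thank you very much to all of you; with your tips I'm getting closer and closer to a good final version. This is my code at the moment: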

def process(infile, outfile, keywords):

    keys = [ [k[0], k[1], 0] for k in keywords ]
    endk = None
    with open(infile, "rb") as fdin:
        with open(outfile, "wb") as fdout:
            for line in fdin:
                if endk is not None:
                    fdout.write(line)
                    if line.find(endk) >= 0:
                        fdout.write("\n")
                        endk = None
                else:
                    for k in keys:
                        index = line.find(k[0])
                        if index >= 0:
                            fdout.write(line[index + len(k[0]):].lstrip())
                            endk = k[1]
                            k[2] += 1
    if endk is not None:
        raise Exception(endk + " not found before end of file")
    return keys



from Tkinter import *
import tkSimpleDialog
import tkMessageBox
from tkFileDialog import askopenfilename

root = Tk()
w = Label(root, text ="Configuration Inspector")
w.pack()
tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector ")
infile = askopenfilename() #
outfile = askopenfilename() #

start_token = tkSimpleDialog.askstring("Serial Number", "What is the device serial number?")
end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")

process(infile,outfile,((start_token + ",SHOWALL",end_token),))

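So far it works, but now comes the part where I'm getting lost: a multi-string input separated by a delimiter. So if I entered

STARTKeyword1,STARTKeyword2,STARTKeyword3,STARTKeyword4

at the program prompt, I want to be able to separate those keywords and pass them to the

process(infile, outfile, keywords)

function, so that the user is prompted only once and multiple strings can be searched for in the file. I was thinking of using a loop, or splitting the input into an array.

If this is too far from the original question I asked, I'll close this one and open another, so I can give credit where credit is due.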

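7 answers:

Answer 0 (score: 2)

I would use a separate function that takes:

  • the path of the input file
  • the path of the output file
  • an iterable containing (startkeyword, endkeyword) pairs

I would then process the file line by line, copying the data if it lies between a start and an end keyword, and counting how many times each pair is found. That way the caller knows which pairs were found, and how many times each.

Here is a possible implementation: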

def process(infile, outfile, keywords):
    '''Search infile for whatever lies between each startkeyword (excluded)
and its endkeyword (included). Each chunk is copied to outfile and followed
by an empty line.
infile and outfile are strings representing file paths
keywords is an iterable containing pairs (startkeyword, endkeyword)

Raises an exception if an endkeyword is not found before end of file

Returns a list of lists [startkeyword, endkeyword, number of occurrences]'''
    keys = [ [k[0], k[1], 0] for k in keywords ]
    endk = None
    with open(infile, "r") as fdin:
        with open(outfile, "w") as fdout:
            for line in fdin:
                if endk is not None:
                    fdout.write(line)
                    if line.find(endk) >= 0:
                        fdout.write("\n")
                        endk = None
                else:
                    for k in keys:
                        index = line.find(k[0])
                        if index >= 0:
                            fdout.write(line[index + len(k[0]):].lstrip())
                            endk = k[1]
                            k[2] += 1
    if endk is not None:
        raise Exception(endk + " not found before end of file")
    return keys
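
To feed a single delimiter-separated prompt into a function with this signature, the input could be split and paired up before the call. The following is only a sketch written for Python 3: the helper name `parse_keyword_pairs` is hypothetical, and it assumes the user supplies one string of start keywords and one string of end keywords in matching order.

```python
def parse_keyword_pairs(raw_starts, raw_ends, sep=","):
    """Split two delimiter-separated strings and pair them up by position."""
    # Drop surrounding whitespace and ignore empty entries such as "a,,b".
    starts = [s.strip() for s in raw_starts.split(sep) if s.strip()]
    ends = [e.strip() for e in raw_ends.split(sep) if e.strip()]
    if len(starts) != len(ends):
        raise ValueError("expected as many end keywords as start keywords")
    return list(zip(starts, ends))


pairs = parse_keyword_pairs(
    "STARTkeyword1, STARTkeyword2,STARTkeyword3",
    "ENDkeyword1,ENDkeyword2, ENDkeyword3",
)
print(pairs)
# The resulting list could then be passed on, e.g. process(infile, outfile, pairs)
```

Pairing by position keeps the prompt simple, at the cost of requiring the user to type the end keywords in the same order as the start keywords.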

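Answer 1 (score: 0)

"whenever I try to write to the same output file in the Text document it will overwrite the previous entry"

Have you tried using append instead of write?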

f = open('filename', 'a')
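
The difference between the two modes can be seen in a small experiment. This is a sketch for Python 3; the helper names `write_chunks` and `read_text` and the temporary file names are made up for the illustration.

```python
import os
import tempfile


def write_chunks(path, chunks, mode):
    """Reopen `path` once per chunk with the given mode and write the chunk."""
    for chunk in chunks:
        with open(path, mode) as f:
            f.write(chunk)


def read_text(path):
    with open(path) as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
w_path = os.path.join(tmpdir, "w.txt")
a_path = os.path.join(tmpdir, "a.txt")

# "w" truncates the file on every open, so only the last chunk survives.
write_chunks(w_path, ["first\n", "second\n"], "w")
# "a" keeps what is already there and adds each chunk at the end.
write_chunks(a_path, ["first\n", "second\n"], "a")

print(repr(read_text(w_path)))
print(repr(read_text(a_path)))
```

Opening the output file once, before the loop, and writing every result through that one handle avoids the problem as well.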

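Answer 2 (score: 0)

I'm not 100% sure I understand the question. But if I understand it correctly, you can just have a list of lists, each containing one start/end keyword pair. For each word in the document, check whether it equals the first element of one of the lists (the start keyword). If it does, pop the keyword from its list (making the end keyword the first element of that list) and start saving all subsequent words in a string (a different string for each start/end keyword pair). Once you hit the end keyword, pop it from its list as well, and remove the now-empty list that held the start/end keywords from the surrounding list. In the end you should have 3 strings containing all the words between the different start/end keywords. Now just print all 3 to the file.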
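
A loose interpretation of that idea, as a Python 3 sketch: it uses a dict of pending pairs instead of popping from nested lists, and the function name `collect_between` is hypothetical. Because it works word by word, the original line breaks are not preserved.

```python
def collect_between(words, pairs):
    """Return one string per matched (start, end) pair with the words between them."""
    pending = dict(pairs)   # start keyword -> end keyword, not yet seen
    results = []
    current = None          # words collected for the pair in progress
    end = None
    for word in words:
        if current is None:
            if word in pending:
                # Start keyword found: remember its end keyword, begin collecting.
                end = pending.pop(word)
                current = []
        elif word == end:
            # End keyword found: store the collected words as one string.
            results.append(" ".join(current))
            current = None
        else:
            current.append(word)
    return results


text = "x STARTkeyword1 a b ENDkeyword1 y STARTkeyword2 c ENDkeyword2"
chunks = collect_between(
    text.split(),
    [("STARTkeyword1", "ENDkeyword1"), ("STARTkeyword2", "ENDkeyword2")],
)
print(chunks)
```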

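Edit:

If the main problem is that you can't append to the file, and are in fact rewriting the file every time, try opening the file this way instead of the way you do today: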

fo = open(outputfilename, "ab")

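From the Python docs:

"The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it's omitted."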

Answer 3 (score: 0)

This could be a start.

import os
directory = os.getcwd()
path = directory + "/" + "textdoc.txt"

newFileName = directory + '/' + "data" + '.txt'
newFob = open(newFileName, 'w')

keywords = [["STARTkeyword1","ENDkeyword1"],["STARTkeyword2","ENDkeyword2"]]


fob = open(path, 'r')
objectLines = fob.readlines()

startfound=False

for keyword in keywords:
    for line in objectLines:

        if startfound and keyword[1] in line:
            startfound=False

        if startfound:
            newFob.write(line)

        if keyword[0] in line:
            startfound=True

newFob.close()

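If textdoc.txt, containing the data you supplied, is in the current directory, the script creates a file named data.txt in the current directory with the following output: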

data1
data2
data3
data4
data5
data1
data2
data3
data4
data5
Data6

Ideally you would feed in more keywords, or let the user enter some and populate the keywords list with them.

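Answer 4 (score: 0)

For each start/end pair, create an instance of a class, say Pair, with a feed(line) method, its start and end keys, and a buffer or list to store the data of interest.

Put all the Pair instances in a list. Scan the file and feed every line to each Pair instance. If an instance is not between its start and end keywords, it is inactive and discards the data; otherwise feed adds the line to its own StringBuffer or list of lines, or even writes it straight to your output file.

Write your file at the end.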
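
One way this design could look, as a minimal Python 3 sketch. Only the names `Pair` and `feed` come from the description above; the attribute names, the `scan` helper, and the choice to leave both keyword lines out of the buffer are assumptions made for the illustration.

```python
class Pair(object):
    """Collects the lines found between a start keyword and an end keyword."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.active = False   # True while between the start and end keywords
        self.lines = []       # buffer for the data of interest

    def feed(self, line):
        if self.active:
            if self.end in line:
                self.active = False      # end keyword reached, stop collecting
            else:
                self.lines.append(line)
        elif self.start in line:
            self.active = True           # start keyword seen, collect from next line
        # Otherwise the instance is inactive and the line is ignored.


def scan(lines, pairs):
    """Feed every line to every Pair instance."""
    for line in lines:
        for pair in pairs:
            pair.feed(line)
    return pairs


sample = [
    "something something\n",
    "something STARTkeyword1 something\n",
    "data1\n",
    "data2\n",
    "ENDkeyword1\n",
    "something something\n",
    "STARTkeyword2\n",
    "data3\n",
    "ENDkeyword2\n",
]
pairs = scan(sample, [Pair("STARTkeyword1", "ENDkeyword1"),
                      Pair("STARTkeyword2", "ENDkeyword2")])
for pair in pairs:
    print(pair.start, pair.lines)
```

Since each instance keeps its own buffer, the results for the different pairs can be written to the output file in any order once the scan is finished.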

Answer 5 (score: 0)

I tested the code with the following sample text file.

something something something something
something something something something
something something something something
something something something something
something something something something
something something something something STARTkeyword3 something
data3
data3
data3
data3
data3
ENDkeyword3
something something something something
something something something something
something something something something
something something something something STARTkeyword1 something
data1
data1
data1
ENDkeyword1
something something something something
something something something something
something something something something
something something something something
something something something something STARTkeyword2 something
data2
data2
data2
Data2
ENDkeyword2
something something something something
something something something something
something something something something
something something something something
something something something something
something something something something STARTkeyword3 something
data3
data3
data3
data3
data3
ENDkeyword3

See https://www.youtube.com/watch?v=D6LYa6X2Otg&feature=youtu.be

The code is as follows:

## ---------------------------------------------------------------------------
## Created by: James P. Lopez
## Created date: 9/21/2020
## Modified date: 9/29/2020, 10/20/2020
## https://stackoverflow.com/questions/32097118/search-text-file-for-multiple-strings-and-print-out-results-to-a-new-text-file
## Search text file for multiple strings and print out results to a new text file
## ---------------------------------------------------------------------------
import time, datetime, string

############################################################################################
############ Adding date to end of output file name
start_c = datetime.datetime.now();##print start_c
c1 = (("%(start_c)s" % vars()).partition(' ')[0]);##print c1
new_str = string.replace(c1, '-', '_');new_str = "_"+new_str;##print(new_str)

#### Path
path = "S:\\PythonProjects\\TextFileReadWrite"
Bslash = "\\"

#### Text files
####    Read file
##filename1 = "openfilename1" ## This is the read file
#### Below sample with exceptions
filename1 = "openfilenameexception1" ## This is the read file
####    Write file
outputfilename1 = "writefilename2"  ## This is the write file

#### Text file extension
extT = ".txt"

#### Full Text file name
filename = ("%(path)s%(Bslash)s%(filename1)s%(extT)s" % vars());print filename
outputfilename = ("%(path)s%(Bslash)s%(outputfilename1)s%(new_str)s%(extT)s" % vars())
print outputfilename

#### Sum rows in text file
with open(filename) as filename1:
   SumReadFile = sum(1 for _ in filename1)
filename1.close()
##print SumReadFile

############################################################################################
############ Create text file if it does not exist
############ or truncate if it does exist
outputfilename2=open('%(outputfilename)s' % vars(),'w')
outputfilename2.close()

#### Counters
foundstart = 0
CountAllStartKeys = 0
CountAllEndKeys = 0
CountAllBetweenData = 0
CountAllRecordsProcessed = 0
    
#### Set of keys
s1 = 'STARTkeyword1';s2 = 'STARTkeyword2';s3 = 'STARTkeyword3'
e1 = 'ENDkeyword1';e2 = 'ENDkeyword2';e3 = 'ENDkeyword3'

#### Opening output file append
outputFiles=open('%(outputfilename)s' % vars(),'a')

SetOfKeys = [(s1,e1),(s2,e2),(s3,e3)]
for keys in SetOfKeys:
    print(keys)
    search1 = keys[0]; ##print search1 ## This is the start key
    search2 = keys[1]; ##print search2 ## This is the end key
    with open("%(filename)s"% vars(), "r") as readFile1:
        for line in readFile1:
            print line
            CountAllRecordsProcessed += 1
            if foundstart == 0:
                if search1 in line:
                    #### It will write the start key
                    print ("Yes found start key %(search1)s within line = %(line)s" % vars())
                    outputFiles.write(line)
                    foundstart += 1;CountAllStartKeys += 1
##                    time.sleep(2)
                    continue
            if foundstart >= 1:
                if search2 in line:
                    #### It will write the end key
                    print ("Yes found end key %(search2)s within line = %(line)s\n" % vars())
                    outputFiles.write(line)
                    foundstart = 0;CountAllEndKeys += 1
##                    time.sleep(2)
                elif search1 in line:
                    #### It will append to output text file and write no end key found
                    print ("\nATTENTION!      No matching end key within line = %(line)s\n" % vars())
                    print ("\nHowever, found start key %(search1)s within line = %(line)s\n" % vars())
                    outputFiles.write("\nNo matching end key within line = %(line)s\n" % vars())
                    outputFiles.write("\nHowever, found start key %(search1)s within line = %(line)s\n" % vars())
                    outputFiles.write(line)
                    CountAllStartKeys += 1
##                    time.sleep(5)
                    continue
                else:
                    #### It will write the rows between start and end key
                    print ("Yes, found between data = %(line)s" % vars())
                    outputFiles.write(line)
                    CountAllBetweenData += 1
##                    time.sleep(2)
    readFile1.close()
outputFiles.close()

print "\nFinished Text File Read and Write"
print "\nTotal Number of Start Key Words Processed = " + str(CountAllStartKeys)
print "Total Number of End Key Words Processed   = " + str(CountAllEndKeys)
print "\nTotal Number of Between Data Processed    = " + str(CountAllBetweenData)
print "\nTotal Sum of Lines in Read File   = " + str(SumReadFile)
NumberOfSetOfKeys = CountAllRecordsProcessed / SumReadFile; ##print NumberOfSetOfKeys
print "Total Number of Set of Keys       = " + str(NumberOfSetOfKeys)
print "Total Number of Records Processed = " + str(CountAllRecordsProcessed)
print ("\n%(SumReadFile)s multiplied by %(NumberOfSetOfKeys)s = %(CountAllRecordsProcessed)s" % vars())

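Answer 6 (score: 0)

This program processes the read file only once, unlike my previous post, which handled one start/end key pair at a time for a total of three complete passes. It was tested with the sample data from my previous post. See: https://youtu.be/PJjBftGhSNc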

## ---------------------------------------------------------------------------
## Created by: James P. Lopez
## Created date: 9/21/2020
## Modified date: 10/1/2020, 10/20/2020
## https://stackoverflow.com/questions/32097118/search-text-file-for-multiple-strings-and-print-out-results-to-a-new-text-file
## Search text file for multiple strings and print out results to a new text file
## ---------------------------------------------------------------------------
import time, datetime, string

############################################################################################
############ Adding date to end of file name
start_c = datetime.datetime.now(); ##print start_c
c1 = (("%(start_c)s" % vars()).partition(' ')[0]); ##print c1
new_str = string.replace(c1, '-', '_');new_str = "_"+new_str;##print(new_str)

#### Path
path = "S:\\PythonProjects\\TextFileReadWrite"
Bslash = "\\"

#### Text files
####    Read file
##filename1 = "openfilename1" ## This is the read file
#### Below sample with exceptions
filename1 = "openfilenameexception1" ## This is the read file
####    Write file
outputfilename1 = "writefilename2"  ## This is the write file

#### Full Text file name
filename = ("%(path)s%(Bslash)s%(filename1)s.txt" % vars());print filename
outputfilename = ("%(path)s%(Bslash)s%(outputfilename1)s%(new_str)s.txt" % vars())
print outputfilename

#### Counters
foundstart = 0
CountAllStartKeys = 0
CountAllEndKeys = 0
CountAllBetweenData = 0
CountAllRecordsProcessed = 0
#### Start Key Not Found
SKNF = 0

#### Sum number of rows in text file
with open(filename) as filename1:
   SumReadFile = sum(1 for _ in filename1)
filename1.close()
print SumReadFile

#### Total set of keys
start1 = 'STARTkeyword1';start2 = 'STARTkeyword2';start3 = 'STARTkeyword3'
end1 = 'ENDkeyword1';end2 = 'ENDkeyword2';end3 = 'ENDkeyword3'

#### Count number of unique start and end keys
SK1 = 0;SK2 = 0;SK3 = 0;EK1 = 0;EK2 = 0;EK3 = 0
Keys = [start1,start2,start3,end1,end2,end3]
##print Keys
with open(filename) as filename1:
    for line in filename1:
##        print line
        if any(word in line for word in Keys):
            if start1 in line:
                SK1+=1
            elif start2 in line:
                SK2+=1
            elif start3 in line:
                SK3+=1
            elif end1 in line:
                EK1+=1
            elif end2 in line:
                EK2+=1
            elif end3 in line:
                EK3+=1 
filename1.close()

############################################################################################
############ Create if it does not exist or truncate if it does exist
outputfilename2=open('%(outputfilename)s' % vars(),'w')
outputfilename2.close()

### Opening output file to append data
outputFiles=open('%(outputfilename)s' % vars(),'a')

#### We are only checking for the start keys
StartKeys = [start1,start2,start3]
#### Opening and reading the first line in read text file
with open("%(filename)s"% vars(), "r") as readFile1:
    for line in readFile1:
        CountAllRecordsProcessed += 1
        #### We are checking if one of the StartKeys is in line
        if any(word in line for word in StartKeys):
            #### Setting the variables (s1 and e1) for the start and end keys
            if start1 in line:
                s1 = start1; e1 = end1; SKNF = 1; ##print ('%(s1)s , %(e1)s' % vars())
            elif start2 in line:
                s1 = start2; e1 = end2; SKNF = 1; ##print ('%(s1)s , %(e1)s' % vars())
            elif start3 in line:
                s1 = start3; e1 = end3; SKNF = 1; ##print ('%(s1)s , %(e1)s' % vars())
##                time.sleep(2)
        if foundstart == 0 and SKNF <> 0:
            if s1 in line:
                #### It will append to output text file and write the start key
                print ("Yes found start key %(s1)s within line = %(line)s" % vars())
                outputFiles.write(line)
                foundstart += 1; CountAllStartKeys += 1
##                    time.sleep(2)
                continue
        if foundstart >= 1 and SKNF <> 0:
            if e1 in line:
                #### It will append to output text file and write the end key
                print ("Yes found end key %(e1)s within line = %(line)s" % vars())
                outputFiles.write(line)
                foundstart = 0; SKNF = 0; CountAllEndKeys += 1
##                    time.sleep(2)
            elif s1 in line:
                #### It will append to output text file and write no end key found
                print ("\nATTENTION!      No matching end key within line = %(line)s\n" % vars())
                print ("\nHowever, found start key %(s1)s within line = %(line)s\n" % vars())
                outputFiles.write("\nNo matching end key within line = %(line)s\n" % vars())
                outputFiles.write("\nHowever, found start key %(s1)s within line = %(line)s\n" % vars())
                outputFiles.write(line)
                CountAllStartKeys += 1
##                    time.sleep(2)
                continue
            else:
                #### It will append to output text file and write the rows between start and end key
                print ("Yes found between data = %(line)s" % vars())
                outputFiles.write(line)
                CountAllBetweenData += 1
##                    time.sleep(2)

#### Closing read and write text files
readFile1.close()
outputFiles.close()

print "\nFinished Text File Read and Write"

print '\nTotal Number of Unique Start Keys'
print ("%(start1)s = " % vars())+str(SK1)
print ("%(start2)s = " % vars())+str(SK2)
print ("%(start3)s = " % vars())+str(SK3)
print "Total Number of Start Key Words Processed = " + str(CountAllStartKeys)
print '\nTotal Number of Unique End Keys'
print ("%(end1)s = " % vars())+str(EK1)
print ("%(end2)s = " % vars())+str(EK2)
print ("%(end3)s = " % vars())+str(EK3)
print "Total Number of End Key Words Processed   = " + str(CountAllEndKeys)
print "\nTotal Number of Between Data Processed    = " + str(CountAllBetweenData)
print "\nTotal Sum of Lines in Read File   = " + str(SumReadFile)
print "Total Number of Records Processed = " + str(CountAllRecordsProcessed)