How do I search for strings in part of a line of text?

Date: 2019-06-19 14:27:17

Tags: python

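I am trying to search multiple text files for the strings "1-2", "2-3" and "3-H", which appear in the last field of lines that start with "play".

Below is a sample of one of the text files: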

id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
play,1,0,lemad001,22,CFBBX,HR/78/F
play,1,0,arenn001,20,BBX,S7/L+
play,1,0,stort001,12,SBCFC,K
play,1,0,gonzc001,02,SS>S,K
play,1,1,perad001,32,BTBBCX,S9/G
play,1,1,polla001,02,CSX,S7/L+.1-2
play,1,1,goldp001,32,SBFBBB,W.2-3;1-2
play,1,1,lambj001,00,X,D9/F+.3-H;2-H;1-3
play,1,1,avila001,31,BC*BBX,31/G.3-H;2-3
play,2,0,grayj003,12,CC*BS,K
play,2,1,dysoj001,31,BBCBX,43/G
play,2,1,corbp001,31,CBBBX,43/G
play,4,1,avila001,02,SC1>X,S8/L.1-2

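For the text file above, I would like the output to be '4', because "1-2", "2-3" and "3-H" occur 4 times in total.

My code so far is below, but I am not sure where to start in writing the lines of code that would do this.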

import os

input_folder = 'files'  # path of folder containing the multiple text files

# create a list with file names 
data_files = [os.path.join(input_folder, file) for file in os.listdir(input_folder)]

# open csv file for writing
csv = open('myoutput.csv', 'w')  
def write_to_csv(line):
    print(line)
    csv.write(line)


j=0 # initialise as 0
count_of_plate_appearances=0 # initialise as 0


for file in data_files:
    with open(file, 'r') as f:  # use context manager to open files
        lines = f.readlines()  # read the file once; iterating over f first would drop its first line
    i = 0
    while i < len(lines):
        temp_array = lines[i].rstrip().split(",")
        if temp_array[0] == "id":
            count_of_plate_appearances = 0
            game_id = temp_array[1]
            awayteam = lines[i+2].rstrip().split(",")[2]
            hometeam = lines[i+3].rstrip().split(",")[2]
            date = lines[i+5].rstrip().split(",")[2]

            # only check for plate appearances when temp_array[0] == "id"
            for j in range(i+46, i+120, 1):
                temp_array2 = lines[j].rstrip().split(",")  # new array to check for plate appearances
                if temp_array2[0] == "play" and temp_array2[2] == "1":  # a plate appearance occurs when these are true
                    count_of_plate_appearances = count_of_plate_appearances + 1
            output_for_csv2 = (game_id, date, hometeam, awayteam, str(count_of_plate_appearances))
            print(output_for_csv2)
            csv.write(','.join(output_for_csv2) + '\n')
        i = i + 1


csv.close() 

Any suggestions on how to do this? Thanks in advance!

1 Answer:

Answer 0: (score: 0)

You can use a regex. I put your text in a file named file.txt.

import re
a = ['1-2', '2-3', '3-H'] # What you want to count
find_this = re.compile('|'.join(a)) # Make search string
count = 0
with open('file.txt', 'r') as f:
    for line in f.readlines():
        count += len(find_this.findall(line)) # Each findall returns the list of things found
print(count) # 7

Or a shorter solution (thanks to wjandrea for suggesting a generator):

import re
a = ['1-2', '2-3', '3-H'] # What you want to count
find_this = re.compile('|'.join(a)) # Make search string
with open('file.txt', 'r') as f:
    count = sum(len(find_this.findall(line)) for line in f)
print(count) # 7