How to ignore case when searching a text file for words and copying the matching lines to another file

Time: 2014-01-23 15:32:09

Tags: python file-io argparse

I am trying to write a Python program that searches a txt file for words specified by the user and copies the lines containing those words to another file.

The user also has the option of excluding any words.

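(For example, if the user searches for the word "exception" and wants to exclude the word "abc", the code should copy only the lines that contain "exception" but not "abc".)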

Everything is done from the command prompt.

The input would be:

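file.py test.txt(input file) test_mod.txt(output file) -e abc(exclude word, denoted by -e) -s exception(search word, denoted by -s)

The user can also enter multiple exclude words and multiple search words.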

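I have written the program using the argparse module and it runs. My problem is that it only matches the search and exclude words exactly as typed. That is, if I enter "exception" as the search word, it does not find "Exception" or "EXCEPTION". How do I solve this? I want to ignore case for both the search words and the exclude words. Here is my current code: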

import sys
import os
import argparse
import tempfile
import re

def main(): #main method

 try:

  parser = argparse.ArgumentParser(description='Copies selected lines from files') #Defining the parser
  parser.add_argument('input_file')  #Adds the command line arguments to be given 
  parser.add_argument('output_file')
  parser.add_argument('-e',action="append")
  parser.add_argument('-s',action="append")
  args = parser.parse_args() #Parses the Arguments
  user_input1 = (args.e)    #takes the word which is to be excluded.
  user_input2 = (args.s)    #takes the word which is to be included.

  def include_exclude(input_file, output_file, exclusion_list=[], inclusion_list=[]):  #Function which actually does the file writing and also handles exceptions
      if input_file == output_file: 
          sys.exit("ERROR! Two file names cannot be the same.")
      else:
          try: 
              found_s = False  #These 3 boolean variables will be used later to handle different exceptions.
              found_e = False
              found_e1 = True
              with open(output_file, 'w') as fo:  #opens the output file
                  with open(input_file, 'r') as fi: #opens the input file
                       for line in fi:     #reads all the line in the input file
                           if user_input2 != None:


                               inclusion_words_in_line = map(lambda x: x in line, inclusion_list)#Mapping the inclusion and the exclusion list in a new list in the namespace  
                               if user_input1 != None and user_input2 != None:                   #This list is defined as a single variable as condition operators cannot be applied to lists
                                  exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                                  if any(inclusion_words_in_line) and not any(exclusion_words_in_line): #Main argument which includes the search word and excludes the exclusion words

                                      fo.write(line)  #writes in the output file
                                      found_s = True

                               elif user_input1 == None and user_input2 != None: #This portion executes if no exclude word is given,only the search word    
                                   if any(inclusion_words_in_line):
                                       fo.write(line)
                                       found_e = True
                                       found_s = True
                                       found_e1 = False

                       if user_input2 == None and user_input1 != None:       #No search word entered   

                           print("No search word entered.")

                       if not found_s and found_e:             #If the search word is not found                        
                           print("The search word couldn't be found.")
                           fo.close()
                           os.remove(output_file)

                       elif not found_e and not found_s:      #If both are not found                        
                           print("\nNOTE: \nCopy error.")
                           fo.close()
                           os.remove(output_file)

                       elif not found_e1:               #If only the search word is entered                              
                           print("\nNOTE: \nThe exclusion word was not entered! \nWriting only the lines containing search words")

          except IOError:
              print("IO error or wrong file name.")
              fo.close()
              os.remove(output_file)
  if user_input1 != user_input2 :  #this part prevents the output file creation if someone inputs 2 same words creating an anomaly.
         include_exclude(args.input_file, args.output_file, user_input1, user_input2);


  if user_input1 == user_input2 :  #This part prevents the program from running further if both of the words are same
         sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.') 


 except SystemExit as e:                       #Exception handles sys.exit()
       sys.exit(e)



if __name__ == '__main__':
  main()

2 Answers:

Answer 0 (score: 3)

The typical way to do this is to pick one case and do all comparisons in it:

if word.lower() == "exception":

For your case, that might look like:

inclusion_words_in_line = map(lambda x: x in line.lower(), 
                              inclusion_list)
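
Note that lowercasing only the line still misses a match when the search word itself is typed with capitals (for example `-s Exception`), so both sides of the comparison need the same normalization. Below is a minimal sketch of that idea. The helper name `line_matches` is hypothetical and not part of the original program, and `str.casefold()` assumes Python 3; on Python 2, `str.lower()` would play the same role.

```python
def line_matches(line, inclusion_list, exclusion_list=None):
    """Return True if `line` contains any inclusion word and no exclusion word,
    ignoring case on both the line and the words."""
    folded_line = line.casefold()
    # Normalize the user-supplied words too, not just the line.
    includes = [word.casefold() for word in inclusion_list]
    excludes = [word.casefold() for word in (exclusion_list or [])]
    has_included = any(word in folded_line for word in includes)
    has_excluded = any(word in folded_line for word in excludes)
    return has_included and not has_excluded
```

Inside the `for line in fi:` loop, a call such as `if line_matches(line, inclusion_list, exclusion_list): fo.write(line)` could then stand in for the two `map(...)` calls. The original line is still written unchanged; only the comparison is case-folded.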

Answer 1 (score: 0)

This looks like an attempt to build a search engine; you could use a library such as pylucene to achieve this.

You would then be able to run queries such as:

+include -exclude

Of course, there is a lot more to it, but it may be worth the learning curve.
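
To illustrate what a `+include -exclude` query means, here is a toy sketch in plain Python. It does not use pylucene or its API: the function names `parse_query` and `matches` are hypothetical, and the matching is simple case-insensitive substring search rather than the tokenized, indexed search a real engine performs. It also assumes that every `+` term must be present, whereas the program in the question accepts a line if any one search word is present.

```python
def parse_query(query):
    """Split a '+word -word' query into (required, excluded) lowercase term lists."""
    required, excluded = [], []
    for token in query.split():
        if token.startswith('-'):
            term = token[1:].lower()
            if term:
                excluded.append(term)
        else:
            # A leading '+' is optional here; bare words are treated as required.
            term = token.lstrip('+').lower()
            if term:
                required.append(term)
    return required, excluded


def matches(line, query):
    """Return True if `line` has every required term and none of the excluded ones."""
    required, excluded = parse_query(query)
    text = line.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)
```

For instance, `matches(line, "+exception -abc")` would select the same lines as `-s exception -e abc` in the question, regardless of case.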