Question

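I'm not sure why this regex pattern ('\s+') isn't ignoring whitespace. The code below searches all .txt and .log files in a directory and reports matches for a string the user enters. It takes the string, converts it to hex and to ASCII codes, and then searches every .txt and .log file for the string, hex, and ASCII values at once. I put the converted values into three different .txt files: the plain string in one, the hex in another, and the ASCII in the third. Initially, all three files matched. I then added regex.search(re.sub(r'\s+', '', line)) to the second main if statement, opened the .txt file containing the ASCII conversion, and inserted a space into the string. When I searched again for the same string, only two matches were found: string and hex. The "ignore whitespace" search did not match the altered ASCII string. Am I overlooking something or doing something wrong?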

Input string: Rozelle07 (matched). Hex conversion: 526f7a656c6c653037 (matched). ASCII conversion: 821111221011081081014855 (matched).

Altered ASCII string: 8211112210110810810148 55 (the regex does not match when I try it).
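For reference, the two conversions described above can be reproduced with a short sketch. It is written for Python 3, whereas the question's code is Python 2 (where `userstring.encode('hex')` is available); the helper names `to_hex` and `to_decimal_codes` are made up for illustration and do not appear in the original code.

```python
def to_hex(text):
    # Python 3 counterpart of the Python 2 call text.encode('hex');
    # assumes the input is plain ASCII, as in the question's example.
    return text.encode("ascii").hex()


def to_decimal_codes(text):
    # Concatenate the decimal character code of every character,
    # mirroring ''.join(str(ord(char)) for char in userstring).
    return "".join(str(ord(ch)) for ch in text)


print(to_hex("Rozelle07"))            # 526f7a656c6c653037
print(to_decimal_codes("Rozelle07"))  # 821111221011081081014855
```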

    print "  Directory to be searched: c:\Python27 "
    directory = os.path.join("c:\\", "SQA_log")
    userstring = raw_input("Enter a string name to search: ")
    userStrHEX = userstring.encode('hex')
    userStrASCII = ''.join(str(ord(char)) for char in userstring)
    regex = re.compile(r"(%s|%s|%s)" % (re.escape(userstring), re.escape(userStrHEX), re.escape(userStrASCII)))
    choice = raw_input("Type 1: search with respect to whitespace. Type 2: search ignoring whitespace: ")
    if choice == '1':
        for root, dirname, files in os.walk(directory):
            for file in files:
                if file.endswith(".log") or file.endswith(".txt"):
                    f = open(os.path.join(root, file))
                    for i, line in enumerate(f.readlines()):
                        result = regex.search(line)
                        if regex.search(line):
                            print " "
                            print "Line: " + str(i)
                            print "File: " + os.path.join(root, file)
                            print "String Type: " + result.group()
                            print " "
                    f.close()
    re.purge()
    if choice == '2':
        for root, dirname, files in os.walk(directory):
            for file in files:
                if file.endswith(".log") or file.endswith(".txt"):
                    f = open(os.path.join(root, file))
                    for i, line in enumerate(f.readlines()):
                        result = regex.search(re.sub(r'\s+', '', line))
                        if regex.search(line):
                            print " "
                            print "Line: " + str(i)
                            print "File: " + os.path.join(root, file)
                            print "String Type: " + result.group()
                            print " "
                    f.close()

Answer 1

I haven't tested this myself, but I think `if regex.search(line):` should be `if result:`.

Answer 2

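For option one you wrote:

    result = regex.search(line)
    if regex.search(line):

and for option two you wrote:

    result = regex.search(re.sub(r'\s+', '', line))
    if regex.search(line):

If you already have the result variable, use it in the if statement. Sorry to say it, but I think this is a copy-paste error: in option two the if statement searches the original line again, so the whitespace-stripped search stored in result is never checked.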

For clarity:

    if choice == '1':
        for root, dirname, files in os.walk(directory):
            for file in files:
                if file.endswith(".log") or file.endswith(".txt"):
                    f = open(os.path.join(root, file))
                    for i, line in enumerate(f.readlines()):
                        result = regex.search(line)
                        if result:  # FIX 1: test the stored result
                            print " "
                            print "Line: " + str(i)
                            print "File: " + os.path.join(root, file)
                            print "String Type: " + result.group()
                            print " "
                    f.close()
    re.purge()
    if choice == '2':
        for root, dirname, files in os.walk(directory):
            for file in files:
                if file.endswith(".log") or file.endswith(".txt"):
                    f = open(os.path.join(root, file))
                    for i, line in enumerate(f.readlines()):
                        # Search the line with all whitespace removed
                        result = regex.search(re.sub(r'\s+', '', line))
                        if result:  # FIX 2: test the whitespace-stripped search
                            print " "
                            print "Line: " + str(i)
                            print "File: " + os.path.join(root, file)
                            print "String Type: " + result.group()
                            print " "
                    f.close()
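Since the two branches differ only in whether whitespace is stripped before searching, they could also be folded into one helper. The following is a minimal Python 3 sketch of that idea, not the original poster's code: the function name `find_matches` and its `ignore_whitespace` parameter are hypothetical, and line numbers are zero-based to match the `enumerate` call in the question.

```python
import os
import re


def find_matches(directory, regex, ignore_whitespace=False):
    """Yield (path, line_number, matched_text) for each matching line."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # str.endswith accepts a tuple of suffixes
            if not name.endswith((".log", ".txt")):
                continue
            path = os.path.join(root, name)
            # The with block closes the file even if an error occurs
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle):
                    # Optionally strip all whitespace before searching
                    text = re.sub(r"\s+", "", line) if ignore_whitespace else line
                    result = regex.search(text)
                    if result:  # test the same search that produced the match
                        yield path, number, result.group()
```

A caller would compile the pattern once, as in the question, and then loop over `find_matches(directory, regex, ignore_whitespace=(choice == '2'))` to print each hit.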

Answer 3

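When you do this:

    result = regex.search(re.sub(r'\s+', '', line))
    if regex.search(line):
        ...

...you are stripping the whitespace from the line and passing the stripped string to regex.search(). The outcome of that search is stored in result. You then ignore result and run regex.search() on the original, unmodified string. re.sub() does not modify the original line; it returns a new string with the substitutions applied.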
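The behaviour can be checked in isolation with the altered ASCII string from the question. This is a small Python 3 sketch; the variable names `pattern`, `line`, and `stripped` are chosen for illustration only.

```python
import re

pattern = re.compile(re.escape("821111221011081081014855"))
line = "8211112210110810810148 55\n"

# re.sub returns a new string; `line` itself is left untouched
stripped = re.sub(r"\s+", "", line)

print(repr(line))                        # still contains the space and newline
print(pattern.search(line))              # None: the space breaks the match
print(pattern.search(stripped).group())  # 821111221011081081014855
```

Testing `result` in the if statement, rather than calling regex.search(line) a second time, is what makes the whitespace-insensitive branch report this match.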

