Regular expression returns unexpected matches

Time: 2019-06-07 10:41:30

Tags: regex python-3.x

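I designed the following regular expression pattern to filter for filenames that:

Start with...

(4 digits, '_', 2 digits, '_', 2 digits, '_', 1 (non-digit) character, '_', 1 or more digits, '_', 1 or more digits, '_')

Followed by... (1 or more (non-digit) characters [including '_'])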

REGEX = "^(\d{4}\_\d{2}\_\d{2}\_\D\_\d+\_\d+\_)(\D+)"
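As a quick sanity check, the pattern can be tested on its own against the control filenames used in the code further down, with no list manipulation involved. This is only a sketch: `PATTERN`, `names` and `results` are names introduced here for illustration, and the raw-string form without the escaped underscores is assumed to be equivalent to the pattern above.

```python
import re

# Raw-string form of the pattern described above.
# PATTERN is a name used only in this sketch.
PATTERN = re.compile(r"^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)")

# Same strings as control_list_of_filenames in the question.
names = [
    "1234_23_23_SKDJFHS",
    "1234_56_78_a_91011_1213_text_text_text_text",
    "9876_54_32_a_1011_1213_text_text_text_text",
    "regex_filter_tester.py",
    "kasjdhfksdhf_skjdhfksh_skjdhf",
    "regex_filter_tester_2.py",
]

# Test each name independently; nothing is removed from the list here.
results = {name: bool(PATTERN.match(name)) for name in names}
for name, matched in results.items():
    print(matched, name)
```

If only the second and third names report `True`, the pattern itself is behaving as described, which would suggest looking at the filtering loop rather than at the regex.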

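The REGEX does not seem to work properly. It immediately filters out things like hidden files that start with '.', but it seems to let any filename starting with [a-zA-Z] through, and does not filter according to my specific REGEX requirements.

After consulting online tutorials, I thought I had the sequence right, but evidently I do not. Any help would be appreciated.

Thanks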

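I tried replacing "\d" etc. with the corresponding character classes, i.e. using "[0-9]" instead of "\d", "[a-zA-Z]" instead of "\D", and so on.

I even tried the regular expression "^\d", which should mean "starts with a digit", but filenames starting with a character in the [a-zA-Z] range still get through.

Both versions of the filtering (below) still allow filenames (as strings) that do not match the regex to pass (that is, they are not removed).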
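To separate the regex from the rest of the code, the "starts with a digit" claim can be checked in isolation. A minimal sketch, assuming Python's standard `re` module; the variable names are made up for this example, and the two strings are taken from the control list in the question.

```python
import re

# Names taken from the control list in the question.
letter_first = "regex_filter_tester.py"
digit_first = "9876_54_32_a_1011_1213_text_text_text_text"

# "^\d" as a raw string: the name must begin with a digit.
starts_with_digit = re.compile(r"^\d")

# match() returns None when the pattern does not match at the start.
print(starts_with_digit.match(letter_first) is None)
print(starts_with_digit.match(digit_first) is not None)
```

If both lines print `True`, then "^\d" does reject names that start with a letter, and the names that survive the filtering are probably getting through for a reason other than the pattern.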

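Version 1: I tried using the following code to remove any string "filename" from the list "filenames" that does not match the REGEX above. (Note: not all of the code is shown, to keep this question concise; i.e. the enumerate/index is actually needed and used in the full code.)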

# IMPORT NECESSARY FUNCTIONS:
# IMPORT LIBRARY FOR FILE MANIPULATION:
import os
# IMPORT LIBRARY FOR REGULAR EXPRESSIONS:
import re
from re import *
# IMPORT LIBRARY FOR FILE PATH STUFF:
from pathlib import *

# CREATE LIST OF STRINGS (THAT ARE FILE NAMES IN CURRENT DIR):
filenames = os.listdir()

control_list_of_filenames = ["1234_23_23_SKDJFHS",
                             "1234_56_78_a_91011_1213_text_text_text_text",
                             "9876_54_32_a_1011_1213_text_text_text_text",
                             "regex_filter_tester.py",
                             "kasjdhfksdhf_skjdhfksh_skjdhf",
                             "regex_filter_tester_2.py"]

regex_file_format_filter = r"^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)"

for index, filename in enumerate(filenames):
    if not match(regex_file_format_filter, filename):
        filenames.remove(filename)

# Prints the list of filenames actively read from the local folder.
#
# Filenames that remain after filtering are:
#       control_list_of_filenames[1], [2], [4] & [5]
#
# Should only be producing:
#       control_list_of_filenames[1] & [2]

print("\n\nSearch results of filenames read from local folder (that should comply with regex) are...\n")
for filename in filenames:
    print(filename)


for index, filename in enumerate(control_list_of_filenames):
    if not match(regex_file_format_filter, filename):
        control_list_of_filenames.remove(filename)

# Prints the list of filenames read from the above control list.
#
# NOTE: reading & filtering strings from the above control list,
# does not behave the same as actually running the code with
# actual files / folders in the same directory.
#
# Filenames that remain after filtering are:
#       control_list_of_filenames[1], [2] & [4]
#
# Should only be producing:
#       control_list_of_filenames[1] & [2]
print("\n\nResults of control filenames read from created list (that should comply with regex) are...\n")
for filename in control_list_of_filenames:
    print(filename)

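Ideally, I would like the code above to remove every string "filename" in the list "filenames" that does not exactly match the REGEX pattern above.

As it stands, it lets everything through except hidden files that start with '.'.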
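One possible cause, offered tentatively: both loops call `remove()` on the list they are iterating over, and in Python that makes the iteration skip the element that slides into the freed slot, so some non-matching names are never tested. A minimal sketch of an alternative, assuming that is what is happening, builds a new list instead of mutating the original. `keep_matching`, `PATTERN`, `control` and `filtered` are hypothetical names for this example, not part of the original code.

```python
import re

PATTERN = re.compile(r"^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)")


def keep_matching(names):
    """Return a new list holding only the names that match PATTERN."""
    # The input list is never modified, so no element can be skipped.
    return [name for name in names if PATTERN.match(name)]


control = [
    "1234_23_23_SKDJFHS",
    "1234_56_78_a_91011_1213_text_text_text_text",
    "9876_54_32_a_1011_1213_text_text_text_text",
    "regex_filter_tester.py",
    "kasjdhfksdhf_skjdhfksh_skjdhf",
    "regex_filter_tester_2.py",
]

filtered = keep_matching(control)
for name in filtered:
    print(name)
```

With the control list, this should leave only the two names that begin with the date-like prefix. The same helper could be applied to the result of `os.listdir()`; if the index from `enumerate` is still needed, iterating over a copy such as `list(filenames)` while removing from the original is another option.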

0 answers:

No answers