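I designed the following regex pattern to select filenames that:
Start with...
(4 digits, '_', 2 digits, '_', 2 digits, '_', 1 non-digit character, '_', 1 or more digits, '_', 1 or more digits, '_')
Followed by... (1 or more non-digit characters [including '_'])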
REGEX = "^(\d{4}\_\d{2}\_\d{2}\_\D\_\d+\_\d+\_)(\D+)"
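To make the intent concrete, here is a minimal sketch that checks the pattern against single strings in isolation. The sample names are taken from the control list further down, plus one made-up hidden-file name; `PATTERN` and `matches` are names invented for this illustration, and the pattern is rewritten as a raw string without the backslashes before `_`, which I assume is equivalent.

```python
import re

# Same pattern as above, as a raw string and without the
# backslashes before "_" (assumed to be equivalent).
PATTERN = re.compile(r"^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)")


def matches(name: str) -> bool:
    """Return True if the name starts with the expected prefix."""
    return PATTERN.match(name) is not None


samples = [
    "1234_56_78_a_91011_1213_text_text_text_text",
    "1234_23_23_SKDJFHS",
    "regex_filter_tester.py",
    ".hidden_file",  # hypothetical hidden-file name
]
for name in samples:
    print(name, matches(name))
```

Checked one string at a time like this, only the first sample should be accepted, which suggests the pattern on its own may not be where the problem lies.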
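The REGEX does not seem to work properly. It immediately filters out hidden files and the like that start with '.', but it seems to let through any file starting with [a-zA-Z], and does not filter according to my specific REGEX requirements.
After going through online tutorials I thought I had the order right, but apparently not. I would appreciate any help.
Thanks
I tried replacing "\d" etc. with the corresponding explicit character classes, i.e. "[0-9]" instead of "\d", "[a-zA-Z]" instead of "\D", and so on.
I even tried the regex "^\d", which should mean "starts with a digit", but filenames starting with a character in the [a-zA-Z] range still get through.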
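The two variants described above can be sketched as follows and checked against single strings. The exact explicit-class pattern is my reconstruction of what was tried (note that "[a-zA-Z]" is narrower than "\D", since it excludes '_'), and `EXPLICIT`, `LEADING_DIGIT` and `check` are names made up for this illustration.

```python
import re

# Explicit-class variant: "[0-9]" for "\d", "[a-zA-Z]" for "\D".
# This is a reconstruction, not necessarily the exact pattern tried.
EXPLICIT = re.compile(
    r"^[0-9]{4}_[0-9]{2}_[0-9]{2}_[a-zA-Z]_[0-9]+_[0-9]+_[a-zA-Z]+"
)
# The simplest test: "starts with a digit".
LEADING_DIGIT = re.compile(r"^\d")


def check(name):
    """Return (explicit-class match?, leading-digit match?) for one name."""
    return (
        EXPLICIT.match(name) is not None,
        LEADING_DIGIT.match(name) is not None,
    )


for name in [
    "9876_54_32_a_1011_1213_text_text_text_text",
    "kasjdhfksdhf_skjdhfksh_skjdhf",
    "regex_filter_tester_2.py",
]:
    print(name, check(name))
```

On individual strings, neither variant should accept a name that begins with a letter, so the names getting through seem to be getting past the filtering loop rather than past the regex.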
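Both versions of the filtering (below) still allow filenames (as strings) that do not match the regex to pass (i.e. they are not removed).
Version 1: I tried using the following code to remove from the list "filenames" any string "filename" that does not match the REGEX above (note: not all of the code is shown, to keep this question concise; the enumerate/indexing is actually needed and used in the full code).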
# IMPORT NECESSARY FUNCTIONS:
# IMPORT LIBRARY FOR FILE MANIPULATION:
import os
# IMPORT LIBRARY FOR REGULAR EXPRESSIONS:
import re
from re import *
# IMPORT LIBRARY FOR FILE PATH STUFF:
from pathlib import *
# CREATE LIST OF STRINGS (THAT ARE FILE NAMES IN CURRENT DIR):
filenames = os.listdir()
control_list_of_filenames = ["1234_23_23_SKDJFHS",
"1234_56_78_a_91011_1213_text_text_text_text",
"9876_54_32_a_1011_1213_text_text_text_text",
"regex_filter_tester.py",
"kasjdhfksdhf_skjdhfksh_skjdhf",
"regex_filter_tester_2.py"]
regex_file_format_filter = "^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)"
for index, filename in enumerate(filenames):
    if not match(regex_file_format_filter, filename):
        filenames.remove(filename)
# Prints the list of filenames actively read from the local folder.
#
# Filenames that remain after filtering are:
# control_list_of_filenames[1], [2], [4] & [5]
#
# Should only be producing:
# control_list_of_filenames[1] & [2]
print("\n\nSearch results of filenames read from local folder (that should comply with regex) are...\n")
for filename in filenames:
    print(filename)
for index, filename in enumerate(control_list_of_filenames):
    if not match(regex_file_format_filter, filename):
        control_list_of_filenames.remove(filename)
# Prints the list of filenames read from the above control list.
#
# NOTE: reading & filtering strings from the above control list,
# does not behave the same as actually running the code with
# actual files / folders in the same directory.
#
# Filenames that remain after filtering are:
# control_list_of_filenames[1], [2] & [4]
#
# Should only be producing:
# control_list_of_filenames[1] & [2]
print("\n\nResults of control filenames read from created list (that should comply with regex) are...\n")
for filename in control_list_of_filenames:
    print(filename)
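Ideally, I would like the above code to remove every string "filename" in the list "filenames" that does not match the REGEX pattern above.
As it stands, it lets through everything except hidden files starting with '.'.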
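For clarity, the result I am after could be sketched roughly as follows. This builds a new list instead of calling remove() on the list being iterated over, which is an assumption on my part and not how the code above is written; `PATTERN` and `keep_matching` are hypothetical names used only here.

```python
import re

# Raw-string form of the pattern used in the code above.
PATTERN = re.compile(r"^(\d{4}_\d{2}_\d{2}_\D_\d+_\d+_)(\D+)")


def keep_matching(names):
    """Return a new list holding only the names that match the pattern.

    The input list is left untouched.
    """
    return [name for name in names if PATTERN.match(name)]


control = [
    "1234_23_23_SKDJFHS",
    "1234_56_78_a_91011_1213_text_text_text_text",
    "9876_54_32_a_1011_1213_text_text_text_text",
    "regex_filter_tester.py",
    "kasjdhfksdhf_skjdhfksh_skjdhf",
    "regex_filter_tester_2.py",
]
for name in keep_matching(control):
    print(name)
```

With the control list, the desired output is only the entries at index 1 and 2.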