import os
import re
from os import listdir
from os.path import isfile, join

my_path = os.getcwd()
files = [f for f in listdir(my_path) if isfile(join(my_path, f))]
pattern = re.compile(r'\.xlsx$')  # xlsx files
pattern_not = re.compile('^~')    # the ones that are open start with ~
files = [x for x in files if (pattern.search(x) and (not pattern_not.search(x)))]
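I wrote this code to collect all the files in my working directory and then keep only the xlsx files, excluding the ones that are open.

My question is: is there a way to write this more cleanly/compactly, so that I don't have to specify two different patterns, in my case pattern and pattern_not?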
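For comparison, the same filter can be written without any regular expression, using str.endswith and str.startswith. This is a minimal sketch, not taken from the thread; the helper name is_candidate is hypothetical, and it assumes lock files start with "~$". Like the original code, it only drops the lock files themselves, not the workbooks that are open.

```python
import os

def is_candidate(name):
    # Keep .xlsx files, skip Office lock files (assumed to start with "~$").
    return name.endswith(".xlsx") and not name.startswith("~$")

my_path = os.getcwd()
files = [f for f in os.listdir(my_path)
         if os.path.isfile(os.path.join(my_path, f)) and is_candidate(f)]
```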
Answer 0 (score: 1)
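Your regex solution does not work: to exclude an open workbook, you need to work out its original file name from the name of its lock file. What you are doing now only removes the lock files themselves from the list of all xlsx files in the directory.
This might be a first step in the right direction. Look carefully at the last, problematic case; you would have to solve that somehow: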
# Excel/Word/PowerPoint create a lock file by prepending ~$ to the name of the file you open.
# The exact lock-file name depends on the length of the original file name.
# Depending on the original name you get
# ~$name.xlsx from name.xlsx
# ~$1name.xlsx from 1name.xlsx
# ~$12name.xlsx from 12name.xlsx
# ~$23name.xlsx from 123name.xlsx
# ~$34name.xlsx from 1234name.xlsx
# file lists all *.xlsx NOT starting with ~$
file = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# these are only the lock files starting with ~$
lock = ["~$1test.xlsx", "~$12test.xlsx", "~$23test.xlsx", "~$34test.xlsx", "~$test.xlsx"]
for lockFile in lock:
    lockBase = lockFile[2:]  # remove the ~$
    # a file counts as open if its name ends with the lock file's base name
    nonOpen = [x for x in file if not x.endswith(lockBase)]
    isOpen = [x for x in file if x.endswith(lockBase)]
    print("Locfile:", lockFile)
    print("Is open:", isOpen)
    print("Non open", nonOpen)
Output:
Locfile: ~$1test.xlsx
Is open: ['1test.xlsx']
Non open ['test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Locfile: ~$12test.xlsx
Is open: ['12test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '123test.xlsx', '1234test.xlsx']
Locfile: ~$23test.xlsx
Is open: ['123test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '12test.xlsx', '1234test.xlsx']
Locfile: ~$34test.xlsx
Is open: ['1234test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx']
# problematic: all the other files also end with this base name, so you would
# have to make the test quite a bit smarter to avoid this:
Locfile: ~$test.xlsx
Is open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Non open []  # they all end with test.xlsx, that's a problem ...
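One way to make the test smarter, sketched under stated assumptions: instead of checking suffixes, compute the lock-file name each workbook would get and compare it exactly with the lock files that exist. The function names lock_file_name and split_open are hypothetical. The naming rule is inferred from the five examples in the comments above (stems of up to 6 characters get ~$ prepended, longer stems lose leading characters); capping the number of dropped characters at two for stems longer than 8 characters is an additional assumption those examples do not cover.

```python
import os

def lock_file_name(name):
    # ASSUMED rule, inferred from the examples: stems of up to 6 characters
    # keep all characters; longer stems drop leading characters (at most 2).
    stem, ext = os.path.splitext(name)
    drop = min(2, max(0, len(stem) - 6))
    return "~$" + stem[drop:] + ext

def split_open(files, locks):
    """Split files into (open, not_open) by exact lock-file-name match."""
    lock_set = set(locks)
    is_open = [f for f in files if lock_file_name(f) in lock_set]
    not_open = [f for f in files if lock_file_name(f) not in lock_set]
    return is_open, not_open

file = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
for lockFile in ["~$test.xlsx", "~$23test.xlsx"]:
    isOpen, nonOpen = split_open(file, [lockFile])
    print("Lockfile:", lockFile, "Is open:", isOpen, "Non open:", nonOpen)
```

With an exact comparison, ~$test.xlsx now marks only test.xlsx as open. Because characters are dropped, two different workbooks can still map to the same lock name under this rule (for example 123test.xlsx and x123test.xlsx both map to ~$23test.xlsx), so the result is only unambiguous when no such pair exists in the directory.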
Answer 1 (score: 1)
I would replace your pattern with
^[^~]+\.xlsx$
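and remove your pattern_not. That regex should only match files that do not start with ~ and that end with .xlsx (note that it also will not match a file that has a ~ somewhere in the middle of its name).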
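A small sketch to illustrate the caveat, run against a hypothetical list of names. The first pattern is the one from this answer; the second is a variant that is not part of the answer: it uses a negative lookahead, (?!~), so only the first character is checked and names with a ~ in the middle are still kept.

```python
import re

only_xlsx = re.compile(r"^[^~]+\.xlsx$")           # pattern from this answer
no_leading_tilde = re.compile(r"^(?!~).*\.xlsx$")  # variant: only a leading ~ is rejected

names = ["report.xlsx", "~$report.xlsx", "a~b.xlsx", "notes.txt"]
print([n for n in names if only_xlsx.match(n)])         # ['report.xlsx']
print([n for n in names if no_leading_tilde.match(n)])  # ['report.xlsx', 'a~b.xlsx']
```

Either pattern replaces the pattern/pattern_not pair with a single regex; which one fits depends on whether file names containing ~ should be kept.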