How to collect all xlsx files from the working directory, except the ones that are open

Date: 2018-06-08 09:28:21

Tags: python regex python-3.x

import os
import re
from os import listdir
from os.path import isfile, join

my_path = os.getcwd()
files = [f for f in listdir(my_path) if isfile(join(my_path, f))]
pattern = re.compile('xlsx$')  # xlsx files
pattern_not = re.compile('^~')  # files that are open show up as lock files starting with ~
files = [x for x in files if pattern.search(x) and not pattern_not.search(x)]

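I wrote this code to collect all the files in my working directory and then keep only the xlsx files, excluding the ones that are open.

My question is: is there a cleaner/more compact way to write this, so that I don't have to specify two different patterns (in my case pattern and pattern_not)?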
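For what the code above actually does, a regex is not strictly needed. A minimal sketch, assuming lock files are exactly the .xlsx names that start with ~$, uses pathlib globbing; the function name non_lock_xlsx_files is hypothetical. Like the original, it only drops the lock files themselves, so a workbook that is currently open still appears in the result.

```python
from pathlib import Path


def non_lock_xlsx_files(directory="."):
    """Hypothetical helper: names of regular .xlsx files in `directory`,
    skipping names that start with "~$" (assumed to be Office lock files)."""
    return sorted(
        p.name
        for p in Path(directory).glob("*.xlsx")  # only names ending in .xlsx
        if p.is_file() and not p.name.startswith("~$")  # skip dirs and lock files
    )
```

Calling non_lock_xlsx_files() with no argument lists the current working directory, which corresponds to os.getcwd() in the question.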

2 answers:

Answer 0 (score: 1)

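Your regex solution doesn't work: to exclude an open file, you need to derive the original file name from its lock-file name. All you are doing is excluding the lock files themselves from the list of xlsx files in the directory.

This may be a first step in the right direction. Look closely at the last case, which is problematic; you would have to solve that somehow: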

# Excel/Word/PowerPoint create a lock file by prepending ~$ to the name of the file you open.
# The complete lock-file name differs depending on the length of the original file name.
# Depending on the original name you get
#   ~$name.xlsx    from   name.xlsx
#   ~$1name.xlsx   from   1name.xlsx
#   ~$12name.xlsx  from   12name.xlsx
#   ~$23name.xlsx  from   123name.xlsx
#   ~$34name.xlsx  from   1234name.xlsx

# file: all *.xlsx files NOT starting with ~$
file = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# lock: only the lock files, i.e. those starting with ~$
lock = ["~$1test.xlsx", "~$12test.xlsx", "~$23test.xlsx", "~$34test.xlsx", "~$test.xlsx"]

for lockFile in lock:
    lockBase = lockFile[2:]  # strip the leading ~$
    # a file counts as "open" if its name ends with the lock file's remainder
    nonOpen = [x for x in file if not x.endswith(lockBase)]
    isOpen = [x for x in file if x.endswith(lockBase)]

    print("Lockfile:", lockFile)
    print("Is open:", isOpen)
    print("Non open:", nonOpen)
    print()  # blank line between cases

Output:

Lockfile: ~$1test.xlsx
Is open: ['1test.xlsx']
Non open: ['test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']

Lockfile: ~$12test.xlsx
Is open: ['12test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '123test.xlsx', '1234test.xlsx']

Lockfile: ~$23test.xlsx
Is open: ['123test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '1234test.xlsx']

Lockfile: ~$34test.xlsx
Is open: ['1234test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx']

# problematic: all other files end with this pattern, so you would have
# to make the test quite a bit smarter to avoid this:
Lockfile: ~$test.xlsx
Is open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Non open: []   # they all end with test.xlsx, and that's a problem ...
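One way to make the test smarter is to map each file forward to the lock name it would get and look that name up, instead of matching lock names backwards with endswith. This is a sketch under a loud assumption: the length rule in expected_lock_name (a hypothetical helper, as is split_open) is inferred only from the five examples in the comments above and simply extrapolated for longer names; it has not been checked against Office itself.

```python
import os


def expected_lock_name(filename):
    """Hypothetical helper: predict the lock-file name for `filename`.

    ASSUMPTION inferred from the examples above only: stems of up to
    6 characters are kept whole, a 7-character stem loses its first
    character, and 8+ characters lose the first two (extrapolated).
    """
    stem, ext = os.path.splitext(filename)
    if len(stem) <= 6:
        kept = stem
    elif len(stem) == 7:
        kept = stem[1:]
    else:
        kept = stem[2:]
    return "~$" + kept + ext


def split_open(files, lock_files):
    """Return (open_files, not_open_files) by exact lookup of predicted lock names."""
    locks = set(lock_files)
    is_open = [f for f in files if expected_lock_name(f) in locks]
    not_open = [f for f in files if expected_lock_name(f) not in locks]
    return is_open, not_open


files = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# With exact lookup, ~$test.xlsx now maps only to test.xlsx
print(split_open(files, ["~$test.xlsx"]))
```

The design choice is that an exact set lookup cannot over-match the way endswith does; the price is that it is only as good as the assumed naming rule.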

Answer 1 (score: 1)

I would replace your pattern with

^[^~]+\.xlsx$

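and drop your pattern_not. This regex matches only files that end in .xlsx and contain no ~ at all, so it excludes the lock files that start with ~, but it also rejects any file that has a ~ somewhere in the middle of its name.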
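A small check of the pattern from this answer on some made-up file names. The second pattern, using a negative lookahead, is an alternative not taken from the answer: it rejects only names that begin with ~$, so a ~ later in the name is still accepted.

```python
import re

# Answer 1's single pattern: no "~" anywhere, name must end in ".xlsx"
no_tilde = re.compile(r'^[^~]+\.xlsx$')
# Alternative (assumption, not from the answer): reject only names that
# START with "~$"; a "~" elsewhere in the name is allowed
not_lock = re.compile(r'^(?!~\$).+\.xlsx$')

names = ["report.xlsx", "~$report.xlsx", "q1~draft.xlsx", "notes.txt", "report.xlsx.bak"]
print([n for n in names if no_tilde.match(n)])  # ['report.xlsx']
print([n for n in names if not_lock.match(n)])  # ['report.xlsx', 'q1~draft.xlsx']
```

Either pattern can be applied directly to the directory listing from the question, e.g. [f for f in listdir(my_path) if not_lock.match(f)]. Neither one identifies which workbook is open; both only skip the lock files.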