Question

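I want to use pathlib to recursively search all files in all folders, but I want to exclude hidden system files whose names start with '.' (such as '.DS_Store'). However, I cannot find a function like startswith in pathlib. How can I get the equivalent of startswith with pathlib? I know how to do it with os.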

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if f.startswith(".")])
    print(fcount)

Answer 1

startswith() is a Python string method, see https://python-reference.readthedocs.io/en/latest/docs/str/startswith.html

Since your f is a Path object, you must first convert it to a string with str(f).

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # test the final path component (f.name) and negate the check to exclude hidden files
    fcount = len([f for f in root_directory.glob('**/*') if not str(f.name).startswith(".")])
    print(fcount)
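
As a side note, Path.name is already a plain str, so the string method can be applied to the final path component without converting the whole path. A minimal sketch, assuming a made-up example path (PurePosixPath is used only so the result does not depend on the operating system):

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration.
p = PurePosixPath("project/.DS_Store")

# The full path as a string starts with "project", not with a dot.
full_starts_with_dot = str(p).startswith(".")

# .name is the final path component and is already a str.
name_starts_with_dot = p.name.startswith(".")

print(full_starts_with_dot, name_starts_with_dot)
```

This is why checking str(f) alone usually does not detect hidden files: the dot is at the start of the file name, not at the start of the whole path.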

Answer 2

There is a kind of startswith: you can use pathlib.Path.is_relative_to():

pathlib.Path.is_relative_to() was added in Python 3.9. If you want to use it in earlier versions (3.6 or later), you need the backport pathlib3x:

$> python -m pip install pathlib3x
$> python
>>> import pathlib3x as pathlib
>>> p = pathlib.Path('/etc/passwd')
>>> p.is_relative_to('/etc')
True
>>> p.is_relative_to('/usr')
False

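You can find pathlib3x on GitHub or PyPI.

But this still does not help with your example, because you want to skip files whose names start with '.'. So your solution is correct, but not efficient: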
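
If installing a backport is not an option, the same check can be approximated with Path.relative_to(), which raises ValueError when the path is not below the given directory. The helper name is_relative_to_compat is hypothetical, and this is only a sketch, not necessarily how pathlib3x or the standard library implement it:

```python
from pathlib import PurePosixPath

def is_relative_to_compat(path, other):
    """Hypothetical helper: return True if `path` lies under `other`."""
    try:
        # relative_to() raises ValueError if `path` is not below `other`.
        path.relative_to(other)
        return True
    except ValueError:
        return False

p = PurePosixPath("/etc/passwd")
under_etc = is_relative_to_compat(p, "/etc")
under_usr = is_relative_to_compat(p, "/usr")
print(under_etc, under_usr)
```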

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if not str(f.name).startswith(".")])
    print(fcount)

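Imagine you have 2 million files in scan_path: this creates a list containing 2 million pathlib.Path objects. That takes some time and memory...

It would be better to have a filter, like fnmatch or similar, for the glob function. I am considering adding that to pathlib3x.

Path.glob() returns a generator iterator, which needs much less memory.

So, to save memory, the solution can be: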
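
Until such a filter exists, something similar can be sketched with the standard fnmatch module. The function name iter_visible and the hidden_pattern parameter are made up for this example, and the temporary directory only serves as a small demo tree:

```python
import fnmatch
import tempfile
from pathlib import Path

def iter_visible(scan_path, hidden_pattern=".*"):
    """Yield entries whose final path component does not match hidden_pattern."""
    for f in Path(scan_path).glob("**/*"):
        if not fnmatch.fnmatch(f.name, hidden_pattern):
            yield f

# Small demo tree: two visible files, two hidden files, one directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("a.txt", ".DS_Store", "sub/b.txt", "sub/.hidden"):
        (root / rel).touch()
    visible_names = sorted(f.name for f in iter_visible(root))

print(visible_names)
```

Like the original glob('**/*') call, this also yields directories, so "sub" is part of the result.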

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = 0
    # only one Path instance is held in memory at a time
    for f in root_directory.glob('**/*'):
        if not str(f.name).startswith("."):
            fcount = fcount + 1
    print(fcount)
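
The same loop can also be written with sum() over a generator expression, which likewise never builds the full list. This is only a sketch: it returns the count instead of printing it so the result can be reused, and the temporary directory is a made-up demo tree:

```python
import tempfile
from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # sum() consumes the generator one item at a time; no list is created.
    return sum(1 for f in root_directory.glob("**/*")
               if not f.name.startswith("."))

# Demo tree: a.txt, .DS_Store, sub/, sub/b.txt, sub/.hidden
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("a.txt", ".DS_Store", "sub/b.txt", "sub/.hidden"):
        (root / rel).touch()
    fcount = recursive_file_count(root)

print(fcount)
```

As in the versions above, directories matched by '**/*' are counted too. Adding an f.is_file() condition would restrict the count to regular files.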

Disclaimer: I am the author of the pathlib3x library.

Answer 3

My solution:

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if not str(f.name).startswith(".")])
    print(fcount)
