Question

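I'm writing a Python script that, among other things, lets users download files from an S3 file store. I'm using the boto module to do this. As a first step, I fetch the list of files in a user-specified bucket and store that list in a temporary text file. S3 doesn't really have directories, but we fake them by prefixing file names with a pseudo-path, as everyone else does. So, suppose my bucket contains the following: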

2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README

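This is a much, much shorter version of the file. The real one is about 35,000 lines, so it needs to be presented to the user in a manageable way. I'm looking for suggestions on how to approach this. The way I tried works well, except that it assumes every entry shares the same directory depth. As you can see, that's no longer true. I'm sure there will be more variation, so I want to accommodate an essentially arbitrary directory/file structure.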

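In practice, my approach was to extract the leftmost part of each path (i.e., the top-level directory), build a uniq'd list, and present it to the user to choose from. Once they choose, I take every path that starts with their choice, extract the second part of the path (if it exists), uniq those, and present them to the user. When they choose again, I join their first choice, a /, and their second choice, and repeat until there are no path parts left. This is clumsy, and it makes it hard to say, for example, "this directory contains both files and directories."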

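How would you do it? I'm having trouble wrapping my head around this without ending up with an awkward presentation and cumbersome code. Thanks.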

Answer 1

If I understand your question, you want to be able to "drill down" into a list of path-like strings, right?

If so, I suggest using the newer pathlib module in the standard library (Python 3.4 and later). The code I'll show lets you do the following:

Current path: 
1: 2015-04-12/
2: 2015-04-13/
3: README
? 2

Current path: 2015-04-13
1: logs/
2: summary
? 1

Current path: 2015-04-13/logs
1: east/
2: west/
? 2

Current path: 2015-04-13/logs/west
1: 01.gz
2: 02.gz
? 1

You have selected:  2015-04-13/logs/west/01.gz

Now the code... First, we import pathlib and convert our list of strings into a list of pathlib.Path objects:

import pathlib
paths = (
"""
2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README""").split()

paths = [pathlib.Path(p) for p in paths]
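In the question, the listing lives in a temporary text file rather than in a string literal. A minimal sketch of loading it, assuming one key per line; the helper name load_paths and the temporary file used for the demonstration are hypothetical:

```python
import os
import pathlib
import tempfile


def load_paths(filename):
    """Read one key per line and skip blank lines (hypothetical helper)."""
    with open(filename) as f:
        lines = (line.rstrip('\n') for line in f)
        return [pathlib.Path(line) for line in lines if line]


# Demonstration with a throwaway file standing in for the real listing.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2015-04-12/summary\n\nREADME\n')

loaded = load_paths(tmp.name)
os.remove(tmp.name)
print([p.as_posix() for p in loaded])
```

Reading line by line, instead of calling split() on the whole text, keeps keys that contain spaces intact.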

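Now I'd like to write a few helper functions. The first is a menu function that asks the user to pick an entry from a list of choices. It returns the chosen element of the list: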

def menu(choices):
    for i, choice in enumerate(choices, start=1):
        message = '{}: {}'.format(i, choice)
        print(message)

    while True:
        try:
            selection = choices[int(input('? ')) - 1]
        except (ValueError, IndexError):
            message = 'Invalid selection: must be between 1 and {}.'
            print(message.format(len(choices)))
        else:
            return selection

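We need a list of choices to hand to that function, so we'll create a path_choices function that produces one. We give this function a container of full paths and the current path the user has selected. It then returns the "next steps" the user can take. For example, if we have the list of possibilities ['foo/apple', 'foo/banana/one.txt', 'foo/orange/pear/summary.txt'] and curpath is foo, this function returns {'apple', 'banana/', 'orange/'}. Note that directories get a trailing slash, which is nice.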

def path_choices(possibilities, curpath):
    choices = set()
    for path in possibilities:
        parts = path.relative_to(curpath).parts
        root = parts[0]
        if len(parts) > 1:
            root += '/'
        choices.add(root)
    return choices
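To make the two pathlib calls inside path_choices concrete, here is a small sketch of what relative_to and parts return for the foo example; the variable names are illustrative only:

```python
import pathlib

# A deeper entry: more than one component remains after stripping 'foo'.
deep = pathlib.Path('foo/orange/pear/summary.txt').relative_to('foo').parts
print(deep)           # ('orange', 'pear', 'summary.txt')
print(len(deep) > 1)  # True, so 'orange' is offered as 'orange/'

# A direct child: exactly one component remains, so it is a file.
leaf = pathlib.Path('foo/apple').relative_to('foo').parts
print(leaf)           # ('apple',)
```

Note that relative_to raises ValueError when the path does not lie under the given base, so path_choices relies on possibilities having been filtered first.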

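Finally, we'll have a simple function that filters a container of paths, yielding only those that lie beneath curpath (which also excludes curpath itself):

def filter_paths(possibilities, curpath):
    # Compare whole path components: a plain string-prefix test would
    # wrongly treat 'foobar/x' as lying under 'foo'.
    curpath = pathlib.Path(curpath)
    for path in possibilities:
        if curpath in path.parents:
            yield path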
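Matching on whole path components rather than on raw string prefixes matters when sibling names share a prefix. A small sketch of the difference, using made-up keys (foo, foobar) that are not from the question:

```python
import pathlib

paths = [pathlib.Path(p) for p in ('foo/a.txt', 'foobar/b.txt', 'foo')]
curpath = pathlib.Path('foo')

# String-prefix test: also picks up 'foobar/b.txt', which is not under 'foo'.
by_prefix = [p.as_posix() for p in paths
             if p != curpath and str(p).startswith(str(curpath))]

# Component-wise test: keeps only paths that have 'foo' as an ancestor.
by_parents = [p.as_posix() for p in paths if curpath in p.parents]

print(by_prefix)   # ['foo/a.txt', 'foobar/b.txt']
print(by_parents)  # ['foo/a.txt']
```

With the prefix test, a later call to relative_to on 'foobar/b.txt' would raise ValueError, so the component-wise check is the safer choice for arbitrary key names.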

After that, all that's left is to glue these functions together:

curpath = ''
possibilities = paths

while possibilities:
    print('Current path: {}'.format(curpath))
    choices = sorted(path_choices(possibilities, curpath))
    selection = menu(choices)

    if curpath:
        curpath /= selection
    else:
        curpath = pathlib.Path(selection)

    possibilities = list(filter_paths(possibilities, curpath))
    print()

print('You have selected: ', curpath)
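The loop above rescans the remaining paths at every step. As an alternative sketch, not part of the approach above, the listing could be parsed once into a nested dict, which also makes "this directory contains both files and directories" easy to express. The names build_tree and list_dir are hypothetical, and the sketch assumes that no key ends with a slash and that no key is both a file and a prefix of another key:

```python
def build_tree(keys):
    """Nest keys into dicts; a value of None marks a file."""
    tree = {}
    for key in keys:
        *dirs, name = key.split('/')
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = None
    return tree


def list_dir(tree, directory=''):
    """Return sorted entries of one level, directories with a trailing slash."""
    node = tree
    for part in directory.split('/'):
        if part:
            node = node[part]
    return sorted(name + '/' if child is not None else name
                  for name, child in node.items())


keys = [
    '2015-04-12/logs/east/01.gz',
    '2015-04-12/summary',
    '2015-04-13/logs/east/01.gz',
    '2015-04-13/logs/west/02.gz',
    '2015-04-13/summary',
    'README',
]
tree = build_tree(keys)
print(list_dir(tree))                # ['2015-04-12/', '2015-04-13/', 'README']
print(list_dir(tree, '2015-04-13'))  # ['logs/', 'summary']
```

Each list_dir call only walks the components of the requested directory, which may be convenient for a listing of roughly 35,000 lines.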
