Question

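I have a file containing data in the following format:

Foo
http://url.com
http://url2.com

FooBar
http://url3.com

FooBarBar
http://url9.com

I want to treat each group of n lines as a single element. That is, after each line containing only \n, I want to process the string and the URLs that follow (the number of URLs varies). I create a folder named after the first string and then download the files from the URLs.

I am currently using a single line of code to read the file into a list of lines.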

Now I am thinking of putting that into a list of lists, with \n serving as the delimiter element.

How can I achieve what I want?

Answer 1

You should not use a one-liner in this case, because you never close the file:

with open('C:\\filename.txt', 'r') as f:

    result = [] # This will keep track of the final output
    entry = [] # This will be our temporary entry that we will add to the result

    for line in f:  # iterate over the file directly instead of calling readlines()
        line = line.strip() # remove the newline and surrounding whitespace
        if line:
            entry.append(line)
        elif entry: # a blank line closes the current entry, if there is one
            result.append(' '.join(entry))
            entry = []
    if entry:
        result.append(' '.join(entry)) # Add the last entry.

print(result)

Output:

['Foo http://url.com http://url2.com', 'FooBar http://url3.com', 'FooBarBar http://url9.com']
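If a list of lists is preferred over joined strings, as the question suggests, the same grouping can be sketched with itertools.groupby. This is only an illustration: the function name group_lines and the sample list are assumptions made for the example, not part of the original answer.

```python
from itertools import groupby

def group_lines(lines):
    """Split an iterable of lines into sub-lists, using blank lines as delimiters."""
    stripped = (line.strip() for line in lines)
    # groupby yields runs of consecutive lines that share the same key. The key
    # here is "is this line non-empty?", so runs of blank lines are skipped.
    return [list(group) for has_text, group in groupby(stripped, key=bool) if has_text]

# Sample input, including two consecutive blank lines before the last block.
sample = ["Foo\n", "http://url.com\n", "http://url2.com\n", "\n",
          "FooBar\n", "http://url3.com\n", "\n", "\n",
          "FooBarBar\n", "http://url9.com\n"]
groups = group_lines(sample)
print(groups)
```

Because an open file object is itself an iterable of lines, group_lines(f) should also work inside a with open(...) as f: block.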

Answer 2

You should be able to iterate over the lines in the file and handle each case separately.

def urlsFromFile(path):
    files = {}
    with open(path) as f:  # Important to use with here to ensure file is closed after reading
        fileName = None
        for line in f.readlines():
            line = line.rstrip('\n')  # Remove \n from end of line
            if not line:  # If the line is empty reset the fileName
                fileName = None
            elif fileName is None:  # If fileName is None, then we previously reached a new line. Set the new fileName
                fileName = line
                files[fileName] = []
            else:  # We are working through the urls
                files[fileName].append(line)
    return files

print(urlsFromFile('filename.txt'))

Output:

{'FooBar': ['http://url3.com'], 'Foo': ['http://url.com', 'http://url2.com'], 'FooBarBar': ['http://url9.com']}

This lets you use the result to create the directories and download the files in each list, for example:

for folder, urls in urlsFromFile('filename.txt').items():
    print('create folder {}'.format(folder))
    for url in urls:
        print('download {} to folder {}'.format(url, folder))

Output:

create folder FooBar
download http://url3.com to folder FooBar
create folder Foo
download http://url.com to folder Foo
download http://url2.com to folder Foo
create folder FooBarBar
download http://url9.com to folder FooBarBar
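To go one step beyond the print placeholders, the folder creation could look roughly like the sketch below. The helper create_folders and the use of a temporary base directory are assumptions made for demonstration. In real use you would pass your own target directory and the dictionary returned by urlsFromFile.

```python
import os
import tempfile

def create_folders(files, base_dir):
    """Create one sub-directory of base_dir per key in files and return the paths."""
    created = {}
    for folder in files:
        path = os.path.join(base_dir, folder)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created[folder] = path
    return created

# Stand-in for the dictionary returned by urlsFromFile().
files = {'Foo': ['http://url.com', 'http://url2.com'], 'FooBar': ['http://url3.com']}

# A temporary directory is used here so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as base_dir:
    paths = create_folders(files, base_dir)
    all_exist = all(os.path.isdir(p) for p in paths.values())

print(all_exist)
```

Using exist_ok=True means the script can be rerun without failing on folders created by an earlier run.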

Answer 3

If the format is consistent, you can simply read the whole file and split the string as needed.

Code

with open('C:\\filename.txt') as fobj:
    elements = [block.split('\n') for block in fobj.read().split('\n\n')]

Elements:

[['Foo', 'http://url.com', 'http://url2.com'],
['FooBar', 'http://url3.com'],
['FooBarBar', 'http://url9.com']]

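Explanation

I always recommend using a context manager (the with statement), because it ensures the file is closed even if an error occurs during processing.

The splitting happens in three layers:

fobj.read()
The entire content of the file is read into a single string.
.split('\n\n')
That string is split on two consecutive newlines, which yields a list of text blocks.
block.split('\n')
Each of those blocks is then split into its individual lines.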
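The plain split('\n\n') approach relies on the file having exactly one blank line between blocks and no trailing newline. If that cannot be guaranteed, a slightly more tolerant variant can be sketched with re.split. The function name split_blocks and the sample text are assumptions for illustration only.

```python
import re

def split_blocks(text):
    """Split text into lists of lines, treating one or more blank lines as a delimiter."""
    # Strip leading/trailing whitespace first so a final newline does not
    # produce an empty trailing element. The pattern matches a newline,
    # optional whitespace (including further newlines), and another newline.
    blocks = re.split(r'\n\s*\n', text.strip())
    return [block.splitlines() for block in blocks if block]

# Sample text with a double blank line and a trailing newline.
text = ("Foo\nhttp://url.com\nhttp://url2.com\n\n"
        "FooBar\nhttp://url3.com\n\n\n"
        "FooBarBar\nhttp://url9.com\n")
elements = split_blocks(text)
print(elements)
```

With a file, the same function could be called as split_blocks(fobj.read()) inside the with block.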

Answer 4

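An iterative approach, following the requirement to "create a folder with the name of the first string and then download the file from the URL":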

import os

with open('input.txt') as f:
    folder_name = None
    folder_failed = False

    for line in f:
        line = line.strip()
        if line:
            if not line.startswith('http'):
                try:
                    os.mkdir(os.path.join(os.getcwd(), line))
                    folder_name = line
                except OSError:
                    print(f"Creation of the directory `{line}` failed")
                    folder_failed = True
                else:
                    folder_failed = False
            elif not folder_failed:
                # downloading file
                new_file = download_file_from_url(line)  # replace with your custom function
                # save file into a folder `folder_name`
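The download_file_from_url placeholder above is left to the reader. One possible shape for it, using only the standard library, is sketched below. Note the assumptions: this version takes the target folder as a second argument (unlike the one-argument call in the loop above), and the helper target_path with its file-naming scheme is hypothetical.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse

def target_path(url, folder):
    """Build the local path for url inside folder (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Use the last path component as the file name. Fall back to the host name
    # for URLs such as http://url.com that have no path.
    name = os.path.basename(parsed.path) or parsed.netloc
    return os.path.join(folder, name)

def download_file_from_url(url, folder):
    """Download url into folder and return the path of the saved file."""
    path = target_path(url, folder)
    with urllib.request.urlopen(url) as response, open(path, 'wb') as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return path

# Only the path logic is exercised here, so the example needs no network access.
print(target_path('http://url.com/files/report.pdf', 'Foo'))
print(target_path('http://url.com', 'Foo'))
```

In the loop above, the call would then become download_file_from_url(line, folder_name). Error handling for failed requests is omitted here and would be needed in practice.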

Split a list into arrays using a delimiter
