Question

我有一个文件如下：

>abc
AAA
AAA
>dfgg
BBBBB
BBBBB
BB
>zzz
CCCCC
CCC

我想要的输出是：

>abc
AAAAAA
>dfgg
BBBBBBBBBBBB
>zzz
CCCCCCCC

即将多行转换为一行。

我写了以下代码：

f = open('test.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('>'):
        line = line.rstrip('\n')
        print line
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print currentline
f.close()

当然，这是不对的，因为currentline一直持续增长直到结束。我无法弄清楚如何更新currentline并按指示打印输出。

我知道一个选项是使用f.read()或f.readlines()读取整个文件，并将该文件视为字符串或列表，但因为文件非常大而且每行都不以'＆GT;'最多可以达到2000万个字符，我认为最好不要立即将整个文件读入内存并逐行处理。请让我知道你对此的看法。

感谢您的帮助！

Answer 1

天真的解决方案：

from itertools import groupby

with open('data.txt') as f:
    for key, group in groupby(f, lambda s: s.startswith('>')):
        print(''.join(s.rstrip('\n') for s in group))

这仅适用于以>开头的行都是单行的情况，它们在您的示例中。为了避免连接那些你可以做的事情：

from itertools import groupby, count

counter = count()
with open('data.txt') as f:
    for key, group in groupby(f, lambda s: next(counter) if s.startswith('>') else -1):
        print(''.join(s.rstrip('\n') for s in group))

关键是groupby的关键功能：count()是一个生成器，只生成一个整数0,1,2的序列。这意味着每个>行得到它自己的唯一键，而所有其他行都获得-1的键，除非>行干预，否则它们会组合在一起。

事实上，可以使用任何保持组唯一的表达式，它不必是计数器。例如，你可以使用它：

lambda s: object() if s.startswith('>') else None

文件迭代和groupby都是惰性的，因此只要读取了组之后的行，就会输出组。

Answer 2

一个版本在出现后立即打印出来：

with open('test.txt', 'r') as f:
    flush = False
    for line in f:
        if line.startswith('>'):
            if flush:
                print('')
            print(line.rstrip('\n'))
            flush = False
        else:
            flush = True
            print(line.rstrip('\n'), end='')
    if flush:
        print('')

Answer 3

试试这个

//before each route change, check if the user is logged in
//and authorized to move onto the next route
$rootScope.$on('$routeChangeStart', function (event, next, prev) {
    if (next !== undefined) {
        if ('data' in next) {
            if ('authorizedRoles' in next.data) {
                var authorizedRoles = next.data.authorizedRoles;
                if (!SessionService.isAuthorized(authorizedRoles)) {
                    event.preventDefault();
                    SessionService.setRedirectOnLogin(BuildPathFromRoute(next));
                    if (SessionService.isLoggedIn()) {
                        // user is not allowed
                        $rootScope.$broadcast(AUTH_EVENTS.notAuthorized);
                    } else {
                        // user is not logged in
                        $rootScope.$broadcast(AUTH_EVENTS.notAuthenticated);
                    }
                }   
            }
        }
    }
});

Answer 4

您的代码很好，您需要做的就是找到更新currentline的正确位置。您会在找到下一个标志后更新，在您的情况下，该标志是以>开头的行。

f = open('test.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('>'):
        line = line.rstrip('\n')
        if currentline != "": print currentline
        print line
        currentline = ""
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
print currentline
f.close()


Input:

>abc
AAA
AAA
>dfgg
BBBBB
BBBBB
BB
>zzz
CCCCC
CCC

Output:

>abc
AAAAAA
>dfgg
BBBBBBBBBBBB
>zzz
CCCCCCCC

# edited code above and tested it with the below file based on ypnos's comment.
Input:

>abc
AAA
AAA
>dfgg
BBBBB
BBBBB
BB
>
>
>>
>zzz
CCCCC
CCC

Output:

>abc
AAAAAA
>dfgg
BBBBBBBBBBBB
>
>
>>
>zzz
CCCCCCCC

编辑：ypnos提出了一个很好的观点，即上面会打印不必要的换行符。我对上面的代码做了一些小改动，现在避免打印。请参阅上面的新测试用例。

Python中的多行成单行

4 个答案: