Question

I have a .txt file of a transcript that looks like this:

MICHAEL: blablablabla.

further talk by Michael.

more talk by Michael.

VALERIE: blublublublu.

Valerie talks more.

MICHAEL: blibliblibli.

Michael talks again.

........

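To sum up, this pattern goes on for up to 4000 lines, and there are not just two speakers but up to seven different ones, all with names written in capital letters (as in the example above). For some text mining, I need to rearrange this .txt file as follows.

Join the lines that follow a speaker, but only those that still belong to him, so that the file above looks like this: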

MICHAEL: blablablabla. further talk by Michael. more talk by Michael.

VALERIE: blublublublu. Valerie talks more.

MICHAEL: blibliblibli. Michael talks again.

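Then sort the correctly joined lines in the .txt file alphabetically, so that all lines said by one speaker end up together. However, the sort must not reorder the sentences spoken by a single speaker (once each speaker's lines have been put together).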

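I know some basic vim commands, but not enough to solve this, especially the first part. I don't know what kind of pattern I can use in vim so that it only joins each speaker's own lines.

Any help would be greatly appreciated!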

Answer 1

OK, first the answer:

:g/^\u\+:/,/\n\u\+:\|\%$/join

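And now the explanation:

g stands for global and runs a command on every line that matches a pattern.
/^\u\+:/ is the pattern :g searches for: ^ is the start of the line, \u is an uppercase letter, \+ means one or more matches, and : is, unsurprisingly, a colon.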

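Then comes the tricky bit: we give the executed command a range, from the matching line to the next match of another pattern. /\n\u\+:\|\%$/ consists of two alternatives separated by the pipe \|. \n\u\+: is a newline followed by the speaker pattern from before, so it matches on the line just before the next speaker. \%$ is the end of the file.

join does what it says on the tin.

So, putting it all together: for each speaker, join everything up to the line before the next speaker, or up to the end of the file.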
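The same range-join idea can be sketched outside Vim to check the expected result. This Python version is only an illustration, not part of the answer: it assumes speaker names are ASCII uppercase letters followed by a colon, and join_speaker_turns is a hypothetical helper name. Each speaker line starts a new turn, and every following non-blank line is appended to it until the next speaker line, which is what the :g/.../,/.../join range achieves.

```python
import re

# Assumption: a speaker line starts with one or more uppercase ASCII letters
# followed by ':', the same rule as the Vim pattern ^\u\+:
SPEAKER_LINE = re.compile(r"^[A-Z]+:")


def join_speaker_turns(text):
    """Return one string per speaker turn, its lines joined with single spaces."""
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no dialogue
        if SPEAKER_LINE.match(line) or not turns:
            # a speaker line (or any text before the first speaker) starts a new turn
            turns.append(line)
        else:
            turns[-1] += " " + line  # continuation of the current turn
    return turns


if __name__ == "__main__":
    sample = (
        "MICHAEL: blablablabla.\n\nfurther talk by Michael.\n\n"
        "VALERIE: blublublublu.\n\nValerie talks more.\n"
    )
    for turn in join_speaker_turns(sample):
        print(turn)
```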

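The closest I got for the sorting is

:sort /\u\+:/ r

which sorts by the speaker name only, but reverses the order of the other lines, so it's not quite what you want.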
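What the question asks for is a stable sort keyed on the speaker name alone: equal names must keep their original relative order. A minimal Python sketch of that requirement (outside Vim; speaker_of and sort_by_speaker are hypothetical names, and names are assumed to be ASCII uppercase) relies on sorted(), which Python documents as stable.

```python
import re


def speaker_of(turn):
    """Return the uppercase name before the first ':' of a joined turn, or ''."""
    m = re.match(r"[A-Z]+(?=:)", turn)
    return m.group(0) if m else ""


def sort_by_speaker(turns):
    # Python's sorted() is stable: turns that compare equal (same speaker)
    # keep their original relative order. Only the name is used as the key,
    # so the dialogue text never influences the order.
    return sorted(turns, key=speaker_of)


if __name__ == "__main__":
    turns = [
        "MICHAEL: blablablabla.",
        "VALERIE: blublublublu.",
        "MICHAEL: blibliblibli.",
    ]
    print("\n".join(sort_by_speaker(turns)))
```

The key function is the important design choice: sorting on whole lines would also order each speaker's turns by their text.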

Answer 2

In vim you can take a two-step approach. First, replace every run of newlines with a single space:

:%s/\n\+/ /g

Then insert a newline before every UPPERCASE: term, except the very first one, which has no space in front of it:

:%s/ \([[:upper:]]\+:\)/\r\1/g
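For readers who want to see the effect of the two substitutions without opening Vim, here is a rough Python equivalent using re.sub. It is an illustration only: regroup is a hypothetical name, and [A-Z] stands in for [[:upper:]] on the assumption that names are plain ASCII.

```python
import re


def regroup(text):
    # Step 1, like :%s/\n\+/ /g : replace every run of newlines with one space,
    # turning the transcript into a single line. Stripping first avoids a
    # trailing space from the file's final newline.
    flat = re.sub(r"\n+", " ", text.strip())
    # Step 2, like :%s/ \([[:upper:]]\+:\)/\r\1/g : start a new line before each
    # " NAME:" token. The first speaker has no space in front, so it stays put.
    return re.sub(r" ([A-Z]+:)", r"\n\1", flat)


if __name__ == "__main__":
    print(regroup("MICHAEL: hi.\n\nmore.\n\nVALERIE: hello.\n"))
```

As in Vim, any uppercase word directly followed by a colon inside the dialogue would also start a new line.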

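For the sorting, you can make use of the UNIX sort program. Plain sort compares whole lines, which would also reorder each speaker's lines by their text, so compare only the name before the first colon and keep ties in their original order (-s is supported by GNU and BSD sort):

:%!sort -s -t: -k1,1

You can combine the commands with the bar symbol:

:%s/\n\+/ /g | %s/ \([[:upper:]]\+:\)/\r\1/g | %!sort -s -t: -k1,1

and map them to a key in your vimrc file:

:nnoremap <F5> :%s/\n\+/ /g \| %s/ \([[:upper:]]\+:\)/\r\1/g \| %!sort -s -t: -k1,1<CR>

Pressing F5 in normal mode then performs the whole transformation. Note that the | has to be escaped as \| inside the nnoremap command.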

Answer 3

I don't know much about vim, but here is a regex that matches the lines of a particular speaker.

Regex: /([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$/gm

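Explanation:
([A-Z]+:) captures the speaker's name, which consists only of uppercase letters.

([A-Za-z\s\.]+) captures the dialogue.

(?!\1)$ uses a backreference to the speaker's name to check whether the next speaker is the same as the last one. If not, it matches until a new speaker is found.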
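The pattern is written in Perl/JavaScript-style syntax; Vim's own dialect writes groups as \(...\) and negative lookahead as \(...\)\@!, so it cannot be pasted into Vim unchanged. One way to try it as written is Python's re module, where the /m flag becomes re.MULTILINE and /g becomes finditer(). The sketch below uses the regex unchanged; speaker_turns is a hypothetical helper that also collapses the captured whitespace.

```python
import re

# Answer 3's regex, unchanged; /gm maps to re.MULTILINE plus finditer().
TURN = re.compile(r"([A-Z]+:)([A-Za-z\s\.]+)(?!\1)$", re.MULTILINE)


def speaker_turns(text):
    """Yield (speaker, dialogue) pairs with runs of whitespace collapsed."""
    for m in TURN.finditer(text):
        yield m.group(1).rstrip(":"), " ".join(m.group(2).split())


if __name__ == "__main__":
    sample = "MICHAEL: hi.\n\nmore.\n\nVALERIE: hello.\n"
    for speaker, dialogue in speaker_turns(sample):
        print(speaker, "|", dialogue)
```

Because the dialogue class [A-Za-z\s\.] admits only letters, whitespace and periods, a comma, digit, apostrophe or question mark in a turn would cut the match short or prevent it, so real transcripts would need a broader class.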

I hope this at least helps you with the matching.

Answer 4

Here is a scripted solution to your problem.

It is not well tested, so I added some comments to make it easy for you to fix.

To get it running, just:

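fill the g:speakers variable at the top of the script with the uppercase names you need;
source the script (e.g. :sav /tmp/script.vim|so %);
join the lines by speaker by running :call JoinAllSpeakLines();
run :call SortSpeakLines() to sort.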

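You can adapt the various patterns to better suit your needs, for example by adding some tolerance for whitespace before the colon (\u\{2,}\ze\s*:, which keeps the spaces out of the returned name).

Here is the code: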
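When spaces are allowed before the colon, the returned name should still exclude them, because GetSpeaker__'s result is used as a key into speakerlines. A small Python sketch of that idea (an illustration only; get_speaker is a hypothetical counterpart of GetSpeaker__, assuming ASCII uppercase names) puts the optional whitespace inside a lookahead, so it must be present but is not returned.

```python
import re

# Two or more uppercase letters, then optional whitespace, then ':'.
# (?=\s*:) is a lookahead: the whitespace and ':' must be present but are
# not part of the returned name, so "VALERIE :" still yields "VALERIE".
SPEAKER = re.compile(r"[A-Z]{2,}(?=\s*:)")


def get_speaker(line):
    """Return the first speaker name found in line, or '' if there is none."""
    m = SPEAKER.search(line)
    return m.group(0) if m else ""


if __name__ == "__main__":
    print(repr(get_speaker("VALERIE : blublublublu.")))
```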

" Fill the following array with all the speakers names:
let g:speakers = [ 'MICHAEL', 'VALERIE', 'MATHIEU' ]
" Sort the names so that SortSpeakLines() outputs the speakers alphabetically:
call sort(g:speakers)


function! JoinAllSpeakLines()
" In the whole file, join all the lines between two uppercase speaker names 
" followed by ':', first inclusive:
    silent g/\u\{2,}:/call JoinSpeakLines__()
endf

function! SortSpeakLines()
" Sort the whole file by speaker, keeping the order for
" each speaker.
" Must be called after JoinAllSpeakLines().

    " Create a new dict, with one key for each speaker:
    let speakerlines = {}
    for speaker in g:speakers
        let speakerlines[speaker] = []
    endfor

    " For each line in the file:
    for line in getline(1,'$')
        let speaker = GetSpeaker__(line)
        if speaker == ''
            continue
        endif
        " Add the line to the right speaker:
        call add(speakerlines[speaker], line)
    endfor

    " Delete everything in the current buffer:
    normal gg"_dG

    " Add the sorted lines, speaker by speaker:
    for speaker in g:speakers
        call append(line('$'), speakerlines[speaker])
    endfor

    " Delete the first (empty) line in the buffer:
    normal gg"_dd
endf

function! GetOtherSpeakerPattern__(speaker)
" Returns a pattern which matches all speaker names, except the
" one given as a parameter.
    " Create a new list with a:speaker removed:
    let others = copy(g:speakers)
    let idx = index(others, a:speaker)
    if idx != -1
        call remove(others, idx)
    endif
    " Create and return the pattern, which looks like
    " this: "\v<MICHAEL>:|<VALERIE>:..."
    call map(others, 'printf("<%s>:",v:val)')
    return '\v' . join(others, '|')
endf

function! GetSpeaker__(line)
" Returns the uppercase name followed by a ':' in a line
    return matchstr(a:line, '\u\{2,}\ze:')
endf

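function! JoinSpeakLines__()
" When the cursor is on a line with an uppercase name, join it with all the
" following lines up to the line before another speaker's name.
    let speaker = GetSpeaker__(getline('.'))
    if speaker == ''
        return
    endif
    let first = line('.')
    " Search for another speaker's name after the cursor line (no wrapping):
    let srch = search(GetOtherSpeakerPattern__(speaker), 'W')
    " If no other speaker follows (last turn), join up to the end of the file:
    let last = srch == 0 ? line('$') : srch - 1
    " Join only when there is at least one following line to join:
    if last > first
        execute first . ',' . last . 'join'
    endif
endf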
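To make the behaviour of the two main functions easier to check, here is a rough Python model of them. It is an illustration with hypothetical names, assuming ASCII uppercase speaker names: joining stops only at a different speaker (so consecutive turns by the same speaker are merged, as with GetOtherSpeakerPattern__), and sorting collects turns into one bucket per speaker, visited in sorted name order.

```python
import re

# Like GetSpeaker__: two or more uppercase letters directly before ':'.
SPEAKER = re.compile(r"[A-Z]{2,}(?=:)")


def get_speaker(line):
    m = SPEAKER.search(line)
    return m.group(0) if m else ""


def join_all_speak_lines(lines):
    """Join lines into turns; only a *different* speaker starts a new turn."""
    turns = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no dialogue
        speaker = get_speaker(line)
        if (speaker and speaker != current) or not turns:
            turns.append(line)  # new speaker (or leading text): new turn
            current = speaker
        else:
            turns[-1] += " " + line  # same speaker or plain continuation
    return turns


def sort_speak_lines(turns, speakers):
    """One bucket per speaker in sorted name order; order inside a bucket is kept.

    Assumption of this sketch: turns whose speaker is missing or not listed
    are skipped.
    """
    buckets = {name: [] for name in sorted(speakers)}
    for turn in turns:
        name = get_speaker(turn)
        if name in buckets:
            buckets[name].append(turn)
    return [turn for name in buckets for turn in buckets[name]]
```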
