Question

I have a series of files with YAML headers followed by markdown subtitles, looking something like this:

Minimal example input file:

---
layout: post
tags: 
  - might 
  - be
  - variable 
  - number 
  - of 
  - these
category: ecology
---



my (h2 size) title
------------------

some text


possible other titles we don't want
-----------------------------------

more text more text

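As I've tried to indicate, the size of the YAML header and the line on which the first subtitle appears will vary, so I cannot count on knowing any of the line numbers ahead of time. I'd like to identify the first title (which should also be the first non-blank text after the closing ---). I'd then like to write that text into the YAML header, like so, with the title we grabbed removed from the body and the rest of the text left untouched: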

Target output file:

---
layout: post
tags: 
  - might 
  - be
  - variable 
  - number 
  - of 
  - these
category: ecology
title: my (h2 size) title
---



some text

possible other titles we don't want
-----------------------------------

more text more text

This seems like it should be a reasonable task for sed, awk, or the like, but my use of these tools is pretty basic and I can't figure this one out.

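I see that I can print the text between two words with sed -n '/word1/,/word2/p', but I'm not sure how to turn that into a search between the second occurrence of ^---$ and the first occurrence of ^----+$ (a line with more than 3 dashes), then how to delete the extra blank lines, and then how to paste the result into the YAML matter above.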

Perhaps with this many steps Perl would be a better choice than sed, but I'm even less familiar with it. Thanks for any hints or suggestions.

Answer 1

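Just do 2 passes: the first (when NR==FNR) to find the title and the line numbers involved, the second to print the title at the right line number along with the other lines: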

$ cat tst.awk
NR==FNR {
   if (hdrEnd && !title && NF)  {title = $0; titleStart=FNR; titleEnd=FNR+1 }
   if (hdrStart && /^---$/)     {hdrEnd   = FNR }
   if (!hdrStart && /^---$/)    {hdrStart = FNR }
   next
}
FNR == hdrEnd { print "title:", title }
(FNR < titleStart) || (FNR > titleEnd)

$ awk -f tst.awk file file      
---
layout: post
tags: 
  - might 
  - be
  - variable 
  - number 
  - of 
  - these
category: ecology
title: my (h2 size) title
---




some text


possible other titles we don't want
-----------------------------------

more text more text

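hdrStart is the line number where the header starts, and so on. If you want to skip more lines around the title than just the text and the line of dashes underlining it, just change how titleStart and titleEnd are populated, to FNR-1 and FNR+2 or whatever. FNR (File Number of Records) is the current line number within the currently open file, while NR (Number of Records) is the number of lines read so far across all previously and currently open files.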
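For readers more at home in Python than awk, the two-pass logic above can be sketched roughly as follows. This is only an illustrative translation, not part of the awk answer: the function name move_title_two_pass and the SAMPLE data are made up for the example, it assumes the title is underlined on the very next line, and (unlike the awk script) it records only the first closing --- rather than updating on every later one.

```python
def move_title_two_pass(lines):
    """Pass 1 finds the line numbers, pass 2 emits the rewritten lines."""
    hdr_start = hdr_end = title_start = title_end = title = None

    # Pass 1: fnr is 1-based, playing the role of awk's FNR.
    for fnr, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # First line with any non-blank field after the header is the title.
        if hdr_end is not None and title is None and text.split():
            title, title_start, title_end = text, fnr, fnr + 1
        if text == "---":
            if hdr_start is None:
                hdr_start = fnr
            elif hdr_end is None:
                hdr_end = fnr

    if title is None:  # nothing to move, return the input unchanged
        return list(lines)

    # Pass 2: insert the title before the closing ---, skip the title lines.
    out = []
    for fnr, line in enumerate(lines, start=1):
        if fnr == hdr_end:
            out.append(f"title: {title}\n")
        if fnr < title_start or fnr > title_end:
            out.append(line)
    return out


# Hypothetical sample data, shortened from the question's example.
SAMPLE = (
    "---\n"
    "layout: post\n"
    "category: ecology\n"
    "---\n"
    "\n"
    "my (h2 size) title\n"
    "------------------\n"
    "\n"
    "some text\n"
)

result = move_title_two_pass(SAMPLE.splitlines(keepends=True))
print("".join(result), end="")
```

As in the awk version, the blank lines around the removed title are kept, so the body ends up with two blank lines before "some text".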

If you don't want to specify the file name twice on the command line, you can duplicate it in awk's BEGIN section:

$ cat tst.awk             
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
NR==FNR {
   if (hdrEnd && !title && NF)  {title = $0; titleStart=FNR; titleEnd=FNR+1 }
   if (hdrStart && /^---$/)     {hdrEnd   = FNR }
   if (!hdrStart && /^---$/)    {hdrStart = FNR }
   next
}
FNR == hdrEnd { print "title:", title }
(FNR < titleStart) || (FNR > titleEnd)

Then you only need to invoke the script as:

$ awk -f tst.awk file

EDIT: actually, here's an alternative that doesn't use the 2-pass approach and is arguably simpler:

$ cat tst.awk
(state == 0) && /^---$/ { state=1; print; next }
(state == 1) && /^---$/ { state=2; next }
(state == 2) && /^./    { state=3; printf "title: %s\n---\n",$0; next }
(state == 3) && /^-+$/  { state=4; next }

state != 2 { print }

$ awk -f tst.awk file
---
layout: post
tags: 
  - might 
  - be
  - variable 
  - number 
  - of 
  - these
category: ecology
title: my (h2 size) title
---

some text


possible other titles we don't want
-----------------------------------

more text more text

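If you're familiar with state machines then it should be obvious what this is doing; if not, let me know. Note that the blank lines between the closing --- and the title are dropped, because nothing is printed while state is 2.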
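In case the state machine is not obvious, here is a rough Python sketch of the same transitions. It is an illustration only: the function name move_title_state_machine and the SAMPLE data are invented for the example, and the numbered states simply mirror the awk script (0 = before header, 1 = inside header, 2 = after the closing --- and waiting for the title, 3 = title seen, 4 = underline dropped).

```python
import re


def move_title_state_machine(lines):
    """Single pass over the lines, tracking where we are with `state`."""
    state = 0
    out = []
    for line in lines:
        text = line.rstrip("\n")
        if state == 0 and text == "---":
            state = 1                  # opening delimiter: print it
            out.append(line)
        elif state == 1 and text == "---":
            state = 2                  # closing delimiter: hold it back
        elif state == 2 and text:
            state = 3                  # first non-empty line is the title
            out.append(f"title: {text}\n---\n")
        elif state == 3 and re.fullmatch(r"-+", text):
            state = 4                  # drop the dashes under the title
        elif state != 2:
            out.append(line)           # everything else is copied through
        # Empty lines seen while state == 2 fall through and are discarded.
    return out


# Hypothetical sample data, shortened from the question's example.
SAMPLE = (
    "---\n"
    "layout: post\n"
    "category: ecology\n"
    "---\n"
    "\n"
    "my (h2 size) title\n"
    "------------------\n"
    "\n"
    "some text\n"
)

result = move_title_state_machine(SAMPLE.splitlines(keepends=True))
print("".join(result), end="")
```

The closing --- is held back in state 2 and only written once the title is known, which is what lets this work in a single pass.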

Answer 2

Quick and dirty Perl code:

$/=undef;  # null line delimiter, so that the following reads the full file
my $all=<STDIN>;
my @parts=split(/^(----*)$/m,$all); # split in sections delimited by all-dashes lines
my @head=split("\n",$parts[2]);  # split the header in lines
my @tit=split("\n",$parts[4]);  # split the title section in lines
push @head,"title: ".pop(@tit);  # remove the last line from the title section and append it to head as the title key
$parts[5]="";                    # drop the dashes that underlined the title
$parts[2]=join("\n",@head)."\n"; # rebuild the header
$parts[4]=join("\n",@tit);       # rebuild the title section
print join("",@parts);           # rebuild all and print to stdout

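This might not be robust enough for you: it treats any line of three or more dashes as a delimiter, it assumes UNIX newlines, it doesn't check that the title is non-blank, and so on. It may still be useful as a starting point, or if you only need to run it once. Another approach would be to read all the lines into an in-memory array, loop over it to find the delimiter lines, and move the title line.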

Answer 3

Maybe this Perl code can help you toward a solution:

#!/usr/bin/env perl

use Modern::Perl;
use File::Slurp;

my @file_content = read_file('test.yml');
my ($start, $stop, $title);
foreach my $line (@file_content) {

    if ($line =~ m{ --- }xms) {
        if (!$start) {
            $start = 1;
        }
        else {
            $stop = 1;
            next;
        }
    }    

    # the first line containing a word character after the closing --- is the title
    if ($stop && $line =~ m{\w}xms) {
        chomp($title = $line);
        last;
    }


}

say "Title: $title";

With the data above, this prints Title: my (h2 size) title

Answer 4

Good old Python:

with open("i.yaml") as fp:
    lines = fp.readlines()

c = False
i = 0
target = -1

for line in lines:
    i += 1
    if c:
        if line.strip() != "":
            source = i - 1
            c = False

    if line.strip() == "---":
        if i > 1:
            c = True
            target = i - 1

lines[target:target] = ["title: " + lines[source]]
del lines[source + 1]
del lines[source + 1]

with open("o.yaml", "w") as fp:
    fp.writelines(lines)

Using sed (or awk, perl, etc.) to identify the first occurrence of a markdown title
