Question

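I don't know a good way to do this (sed / awk / perl). I have combined several HTML chapter files into one, and the result has the following structure: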

 <a href="#chapter11">title</a>
 <a href="#chapter12">title</a>
 <a href="#chapter13">title</a>
 <p>first chapter contents, multiple
 pages</p>
 <a href="#chapter21">title</a>
 <a href="#chapter22">title</a>
 <a href="#chapter23">title</a>
 <p>Second chapter contents, multiple pages
 more informations</p>
 <a href="#chapter31">title</a>
 <a href="#chapter32">title</a>
 <a href="#chapter33">title</a>
 <p>Third chapter contents, multiple pages
 few more details</p>

I would like them reorganized like this:

 <a href="#chapter11">title</a>
 <a href="#chapter12">title</a>
 <a href="#chapter13">title</a>
 <a href="#chapter21">title</a>
 <a href="#chapter22">title</a>
 <a href="#chapter23">title</a>
 <a href="#chapter31">title</a>
 <a href="#chapter32">title</a>
 <a href="#chapter33">title</a>
 <p>first chapter contents, multiple
 pages</p>
 <p>Second chapter contents, multiple pages
 more informations</p>
 <p>Third chapter contents, multiple pages
 few more details</p>

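I have five chapters in the HTML to reorganize this way. I tried to use the sed hold buffer, but with my knowledge it seems hard to do. I am not limited to sed or awk. Any help would be highly appreciated, thanks.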

Edit

Sorry for changing the source file. It also has a few lines that do not always start with

  <a or <p

Is there any way to script something like an inverse selection in sed, for example

 /^<a!/p/

Answer 1

How about running sed twice, first printing the <a> tags and then the <p> tags:

sed -n '/^<a/p' input.txt
sed -n '/^<p/p' input.txt
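
One caveat with the two commands above: /^<p/p only prints lines that themselves start with <p, so the continuation lines of a multi-line paragraph (such as "pages</p>" in the question's sample) are dropped. A minimal sketch that keeps them, assuming every line that does not start with <a belongs to the body, negates the address in the second pass; the ! goes after the closing slash of the address. The file name sample.html and the temporary directory are made up for the illustration.

```shell
# Hypothetical sample file, modeled on the question's input.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<a href="#chapter11">title</a>
<p>first chapter contents, multiple
pages</p>
<a href="#chapter21">title</a>
<p>Second chapter contents, multiple pages
more informations</p>
EOF

# Pass 1: lines starting with <a.  Pass 2: every other line (note the "!").
two_pass=$(sed -n '/^<a/p' "$tmpdir/sample.html"; sed -n '/^<a/!p' "$tmpdir/sample.html")
printf '%s\n' "$two_pass"
```

This reads the file twice, which is usually fine for a handful of chapters.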

Using the hold space, it can be done like this:

sed -n '/^<a/p; /^<p/H; ${g; s/\n//; p}' input.txt

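This prints all <a> lines immediately and appends all <p> lines to the hold space. At the end of the document ($), it gets the hold space and prints it. H always adds a newline before appending to the hold space; we do not want the first of those newlines, which is why we remove it with s/\n//.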
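
To see the leading newline that H produces, here is a small sketch on a two-line input (the input lines x and y are made up for the illustration). The first command prints the hold space as it is, the second removes the first newline before printing.

```shell
# Every H appends a newline plus the current line to the hold space,
# so after two lines the hold space is "\nx\ny".
with_blank=$(printf 'x\ny\n' | sed -n 'H; ${g;p;}')
without_blank=$(printf 'x\ny\n' | sed -n 'H; ${g;s/\n//;p;}')

printf '[%s]\n' "$with_blank"      # the text inside the brackets starts with an empty line
printf '[%s]\n' "$without_blank"   # the empty line is gone
```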

If you want to store the output, you can redirect it:

sed -n '/^<a/p; /^<p/H; ${g; s/\n//; p}' input.txt > output.txt

To use sed -i directly, we need to restructure the code a bit:

sed -i '${x; G; s/\n//; p}; /^<p/{H;d}' input.txt

But this is a bit tedious.
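
If the -i rewrite feels too fiddly, one common alternative is to keep the simpler -n command, write its output to a temporary file, and move that file over the original. A minimal sketch; the file and directory names are placeholders:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '<a href="#chapter11">title</a>' '<p>first chapter</p>' \
              '<a href="#chapter21">title</a>' '<p>Second chapter</p>' > "$tmpdir/input.txt"

# Reorder into a temporary file, and replace the original only if sed succeeded.
sed -n '/^<a/p; /^<p/H; ${g;s/\n//;p;}' "$tmpdir/input.txt" > "$tmpdir/input.txt.new" &&
    mv "$tmpdir/input.txt.new" "$tmpdir/input.txt"

cat "$tmpdir/input.txt"
```

Because of the &&, the original file is left untouched if sed exits with an error.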

If you have lines that start with other characters and simply want to move all lines starting with an <a> tag to the front, you can do:

sed -n '/^<a/p; /^<a/! H; ${g; s/\n//; p}' input.txt

Answer 2

Grep works too:

(grep -F '<a' test.txt ; grep -F '<p' test.txt)
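
Note that grep -F '<a' matches the string anywhere on a line, so a paragraph line containing a link would be listed in both halves, and grep -F '<p' skips continuation lines that do not contain <p. A variant, sketched here under the assumption that everything not starting with <a (after optional spaces) belongs in the second part, anchors the pattern and uses -v for the inverse selection. test.txt is recreated in a temporary directory for the example:

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.txt" <<'EOF'
<a href="#chapter11">title</a>
<p>first chapter contents, multiple
pages</p>
<a href="#chapter21">title</a>
<p>see <a href="#chapter11">chapter 1</a></p>
EOF

# First the link lines, then every line that is not a link line (-v inverts the match).
grep_out=$(grep '^ *<a' "$tmpdir/test.txt"; grep -v '^ *<a' "$tmpdir/test.txt")
printf '%s\n' "$grep_out"
```

Each input line appears exactly once in the result, because the two patterns are exact complements.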

Answer 3

sed -n '/^ *<[aA]/ !H
/^ *<[aA]/ p
$ {x;s/\n//;p;}
' YourFile

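If the line does not start with <a (matching <a href="#chapter would be more precise; upper and lower case are both allowed), append it to the hold buffer.

If it does, print the line.

At the end, load the buffer, remove the first newline (we only ever append, so the buffer starts with a newline) and print the content.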
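
As a quick check of the script above, the sketch below runs it on a small made-up file (YourFile is recreated in a temporary directory) that contains an indented, upper-case <A line and a line that is neither <a nor <p. Writing the ! without surrounding spaces is only a portability choice of this sketch.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/YourFile" <<'EOF'
<h1>Book</h1>
<a href="#chapter11">title</a>
<p>first chapter contents</p>
  <A HREF="#chapter21">title</A>
<p>Second chapter contents</p>
EOF

# Non-link lines are held, link lines are printed at once,
# and the held lines are printed after the last input line.
held=$(sed -n '/^ *<[aA]/!H
/^ *<[aA]/p
${x;s/\n//;p;}' "$tmpdir/YourFile")
printf '%s\n' "$held"
```

Both link lines come out first, with their original indentation and case, followed by the <h1> line and the two paragraphs in their original order.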

Answer 4

Using awk

awk '{if ($0~/<a/) a[NR]=$0; else b[NR]=$0} END {for (i=1;i<=NR;i++) if (i in a) print a[i];for (j=1;j<=NR;j++) if (j in b) print b[j]}' file
 <a href="#chapter11">title</a>
 <a href="#chapter12">title</a>
 <a href="#chapter13">title</a>
 <a href="#chapter21">title</a>
 <a href="#chapter22">title</a>
 <a href="#chapter23">title</a>
 <a href="#chapter31">title</a>
 <a href="#chapter32">title</a>
 <a href="#chapter33">title</a>
 <p>first chapter contents, multiple
 pages</p>
 <p>Second chapter contents, multiple pages
 more informations</p>
 <p>Third chapter contents, multiple pages
 few more details</p>
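
A shorter variant of the same idea, given as a sketch rather than a drop-in replacement: print the link lines as they are read and buffer only the remaining lines in a single array. The array name rest, the counter n, and the sample file are made up for the illustration, and the pattern is anchored to the start of the line on the assumption that link lines begin with <a after optional spaces.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
<a href="#chapter11">title</a>
<p>first chapter contents, multiple

pages</p>
<a href="#chapter21">title</a>
EOF

awk_out=$(awk '
    /^ *<a/ { print; next }          # link lines go out right away
            { rest[++n] = $0 }       # everything else is buffered in order
    END     { for (i = 1; i <= n; i++) print rest[i] }
' "$tmpdir/file")
printf '%s\n' "$awk_out"
```

Since the loop runs over the counter n rather than testing each stored value, empty lines in the body are kept as well.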

Hold buffer to rearrange text
