Question: How can sed read in and process a file of unknown length?

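I want to insert, at a marker in an HTML source file, the contents of another text file of unknown length (but always at least two lines). I was going to use m4, but "include" reads the whole file, AFAIK. So, on to sed...

Once the pattern marking the insertion point is found, the first line is to be wrapped in a <div class=...> tag, the second line likewise (but with a different class), and so on in a loop until EOF; then the rest of the source file is output.

Finding the insertion point works, and so does printing the rest of the source file. What I'm having trouble with is getting sed to keep reading the text file in a loop until it's done.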

Sample input

title1
author1
title2
author2
...
titleN
authorN

Desired output

<!-- above here is source file, below is sed'ed output -->
<div class="title">
title1
</div>
<div class="author">
author1
</div>
<div class="title">
title2
</div>
<div class="author">
author2
</div>
...
<div class="title">
titleN
</div>
<div class="author">
authorN
</div>
<!-- below is rest of source file -->

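I'm not too concerned about line breaks; everything on one line is fine. The example is laid out this way only to make clear what's going on.

I can get it working for the simple case of two or four input lines, using a\ <div .... and R filename. As soon as I try to use a loop to handle a variable number of input lines, I fail.

I tried a dummy substitution, s|^$.+$|\1|, so that I could test it with T and quit if the pattern space came back empty, but that didn't work. Another attempt sent sed into an infinite loop.

How can I test whether R succeeded or failed? Is there a design pattern I'm missing here?

(I'm using GNU sed, so R and T are both fine.)

Thanks.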

Answer 1

Don't think of sed as just a language for looping over lines. You can specify a range of lines by matching the first and last lines of the range:

sed '/firstRE/,/secondRE/s/ThingsBetweenLines/ReplaceWithThis/'

For example:

[ghoti@pc ~]$ printf 'one\ntwo\nthree\nfour\nfive\n' | sed '/two/,/four/s/[ore]/_/g'
one
tw_
th___
f_u_
five
[ghoti@pc ~]$

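The problem is that sed isn't good at inserting whole LINES, and sed doesn't really have a way to say "the current line number is even/odd". Multi-line handling is arcane and ugly. If I remember correctly, GNU sed does have some multi-line notation, but it's late and I'll never remember how to use the non-standard stuff.

So I recommend awk. :) The code is easier to read, and it's better suited to this kind of task.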
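For what it's worth, GNU sed does have a non-standard way to address odd and even lines: the first~step address form, where 1~2 matches lines 1, 3, 5, ... and 2~2 matches lines 2, 4, 6, .... A minimal sketch, assuming GNU sed and the title/author input from the question; the function name wrap_pairs_sed is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: wrap odd lines as titles and even lines as authors.
# Requires GNU sed for the first~step addresses (1~2 = odd, 2~2 = even).
# Reads the files given as arguments, or stdin if there are none.
wrap_pairs_sed() {
  sed -e '1~2s|.*|<div class="title">&</div>|' \
      -e '2~2s|.*|<div class="author">&</div>|' "$@"
}

printf 'title1\nauthor1\ntitle2\nauthor2\n' | wrap_pairs_sed
```

This only formats the pairs; splicing them into the HTML at the marker is still a separate step.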

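awk '
  BEGIN {
    # One printf format that emits both <div>s for a title/author pair.
    fmt="<div class=\"title\">%s</div>\n<div class=\"author\">%s</div>\n";
  }
  {
    # The current line is the title; getline reads the next line, the author.
    title=$0; getline; author=$0;
    printf(fmt, title, author);
  }
'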

Of course, you can also do this in pure shell:

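#!/bin/sh

fmt="<div class=\"title\">%s</div>\n<div class=\"author\">%s</div>\n"

title=''
# -r keeps backslashes; IFS= keeps leading/trailing whitespace.
while IFS= read -r line; do
  # Odd lines: remember the title and wait for the author.
  if [ -z "$title" ]; then
    title="$line"
    continue
  fi
  # Even lines: print the pair, then reset for the next title.
  author="$line"
  printf "$fmt" "$title" "$author"
  title=''
done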

See, it works for me:

[ghoti@pc ~/tmp]$ printf 'title1\nauthor1\ntitle2\nauthor2\n' | ./doit
<div class="title">title1</div>
<div class="author">author1</div>
<div class="title">title2</div>
<div class="author">author2</div>
[ghoti@pc ~/tmp]$ printf 'title1\nauthor1\ntitle2\nauthor2\n' | ./doit.awk
<div class="title">title1</div>
<div class="author">author1</div>
<div class="title">title2</div>
<div class="author">author2</div>
[ghoti@pc ~/tmp]$
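The awk script above only formats the pairs. To splice them into the HTML file at the marker in one pass, the same idea can be extended to read both files. This is a sketch, not the answerer's code: the helper name splice_pairs and the marker text are hypothetical, and it assumes the data file is non-empty (with an empty first file, NR==FNR would also hold for the HTML file).

```shell
#!/bin/sh
# Hypothetical helper: splice_pairs DATA_FILE HTML_FILE MARKER_REGEX
# Copies HTML_FILE to stdout and, after each line matching MARKER_REGEX,
# emits the title/author pairs from DATA_FILE as <div>s.
# Assumes DATA_FILE is non-empty (otherwise NR == FNR also holds for HTML_FILE).
splice_pairs() {
  awk -v marker="$3" '
    # First file: remember every title/author line.
    NR == FNR { data[++n] = $0; next }
    # Second file: copy each line through unchanged.
    { print }
    # After the marker line, print the stored lines two at a time.
    $0 ~ marker {
      for (i = 1; i <= n; i += 2) {
        printf "<div class=\"title\">%s</div>\n", data[i]
        printf "<div class=\"author\">%s</div>\n", data[i + 1]
      }
    }
  ' "$1" "$2"
}

# Demo with throwaway files.
data=$(mktemp) && src=$(mktemp) || exit 1
printf 'title1\nauthor1\ntitle2\nauthor2\n' > "$data"
printf '<p>before</p>\n<!-- insert here -->\n<p>after</p>\n' > "$src"
splice_pairs "$data" "$src" 'insert here'
rm -f "$data" "$src"
```

Reading everything into an array is fine for a list of titles; for a huge data file you would stream it instead.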

Answer 2

This might work for you (GNU sed):

cat <<! >couplet.sed
N;s/\(.*\)\n\(.*\)/<div class="title">\1<\/div><div class="author">\2<\/div>/
!
sed '/^<!-- below is rest of source file -->/e sed -f couplet.sed data' source
<!-- above here is source file, below is sed'ed output -->
<div class="title">title1</div><div class="author">author1</div>
<div class="title">title2</div><div class="author">author2</div>
...
<div class="title">titleN</div><div class="author">authorN</div>
<!-- below is rest of source file -->

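What's needed is a sed program inside a sed command, which is achieved with the e command.

N.B. The inner sed program can be replaced by any bash command, script, etc.

Explanation:

Create a sed script that reads the data file two lines at a time and produces the required div classes.
Read the source file up to the insertion point, then run the script above. The e command inserts the output of running couplet.sed on the data file into the output of the sed one-liner.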

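The e command can be used in three ways:

As a flag to the s command, s/PATTERN/COMMAND/e: after the substitution, the resulting pattern space is executed as a command and replaced by its output.
As a standalone command with an argument, which runs that command and sends its output to the output stream, e.g. 1e date
With no argument, which executes whatever is in the pattern space and replaces the pattern space with the output.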
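A quick way to see the three forms side by side, assuming GNU sed (e is a GNU extension) and /bin/sh for the commands it runs:

```shell
#!/bin/sh
# 1. "e COMMAND": run COMMAND; its output goes to the output stream
#    before the current line is printed.
printf 'a\nb\n' | sed '2e echo inserted'   # a, inserted, b

# 2. "s///e": after the substitution, run the pattern space as a command
#    and replace the pattern space with its output.
echo x | sed 's/.*/echo hello/e'           # hello

# 3. "e" with no argument: run the pattern space as a command.
echo 'echo world' | sed 'e'                # world
```

Form 1 is the one the answer relies on: the inner sed's output appears just before the marker line.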

Another solution:

sed -e 'N;s/\(.*\)\n\(.*\)/\/^<!-- below is rest of source file -->\/i\\<div class="title">\1<\/div><div class="author">\2<\/div>/' data |
sed -f - source
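To see what the second sed receives as its script, run the first stage on its own. A sketch using two sample lines from the question in place of the data file:

```shell
#!/bin/sh
# Stage 1 alone: turn each title/author pair into a one-line GNU "i\" (insert)
# command addressed to the marker line. Stage 2 (sed -f - source) runs them.
printf 'title1\nauthor1\n' |
sed -e 'N;s/\(.*\)\n\(.*\)/\/^<!-- below is rest of source file -->\/i\\<div class="title">\1<\/div><div class="author">\2<\/div>/'
```

This prints one generated command per pair, /^<!-- below is rest of source file -->/i\ followed by the two divs, so stage 2 inserts every pair, in order, before the marker line.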

Answer 3

You have two input files. One contains:

some text
insertion point pattern
rest of the text

plus a second file containing a list of alternating title and author lines.

The output should be:

some text
insertion point pattern
...alternating list of title and author <div>s
rest of the text

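I think the simplest way to tackle this is:

Process the title/author list (from the file title.authors) into a temporary file.
Have sed read the temporary file at the insertion point.

This translates into the outline: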

tmp=${TMPDIR:-/tmp}/at.$$     # Or use mktemp command
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

sed -e 'N' \
    -e 's%\(.*\)\n\(.*\)%<div class="title">\1</div>\n<div class="author">\2</div>%' \
    title.authors > $tmp

sed "/insertion point pattern/r $tmp" main-file > output-file

rm -f $tmp
trap 0

The details of the trap command ensure that the script cleans up after itself if it is sent a HUP, INT, QUIT, PIPE or TERM signal.

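The first sed script uses N to combine adjacent lines, so the title and author are both in the pattern space, separated by a newline. The s command then captures the material on either side of the newline as \1 and \2 and wraps it in the markup.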
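The effect of N can be seen with a simpler substitution that only replaces the embedded newline; this is just an illustration, not part of the answer's script:

```shell
#!/bin/sh
# N appends the next input line to the pattern space, separated by "\n",
# so each cycle sees one title/author pair; s then replaces that newline.
printf 'title1\nauthor1\ntitle2\nauthor2\n' | sed 'N;s/\n/ | /'
```

This prints "title1 | author1" and "title2 | author2", one pair per line.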

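The second sed script identifies the insertion point, prints that line, reads in the preprocessed file of titles and authors (note the double quotes, which let the shell expand $tmp), and then carries on with the next line.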
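A small sketch of that r behaviour, using a throwaway file (the mktemp file and the word marker are placeholders, not part of the answer):

```shell
#!/bin/sh
# r FILE queues FILE to be written after the matching line is printed,
# so the inserted text lands between "marker" and the following line.
f=$(mktemp) || exit 1
printf 'INSERTED\n' > "$f"
printf 'one\nmarker\ntwo\n' | sed "/marker/r $f"
rm -f "$f"
```

The output is one, marker, INSERTED, two, which is exactly the placement wanted for the formatted divs.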

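Needing a temporary file is a minor nuisance, but it cleanly separates two distinct responsibilities: "formatting the title and author information" and "copying the formatted title and author information to the right place in the data stream".

If you want the marker HTML/XML comments in the output, you can complicate the preprocessing script. Put the 1i expression before -e 'N' (after N the line number is already 2) and the $a expression after it (in GNU sed, $ only matches once N has read the last line):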

   -e '1i\
      <!-- above here is source file, below is sed'\''ed output -->' \
   -e '$a\
      <!-- below is rest of source file -->'

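Note that the leading whitespace will be included in the output. If that matters, put the whole first script in a file (title-author.sed) and preprocess the information with sed -f title-author.sed title.authors > $tmp:

title-author.sed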

1i\
<!-- above here is source file, below is sed'ed output -->
N
s%\(.*\)\n\(.*\)%<div class="title">\1</div>\n<div class="author">\2</div>%
$a\
<!-- below is rest of source file -->

The downside of this is the extra file, the sed script. Of course, you could generate it on the fly as another temporary file. My trick is to use:

tmp=${TMPDIR:-/tmp}/at.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15

cat > $tmp.1 <<'EOF'
1i\
<!-- above here is source file, below is sed'ed output -->
N
s%\(.*\)\n\(.*\)%<div class="title">\1</div>\n<div class="author">\2</div>%
$a\
<!-- below is rest of source file -->
EOF

sed -f $tmp.1 title.authors > $tmp.2

sed "/insertion point pattern/r $tmp.2" main-file > output-file

rm -f $tmp.?
trap 0

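The change is to use the generated temporary name as a prefix; the actual temporary files are $tmp.1 and $tmp.2. The cleanup differs only slightly, to reflect that there may be more than one temporary file to remove.

Obviously, you could arrange for the two input files to be arguments to the script, and have the script simply write to standard output so that you can redirect its output wherever you want, rather than forcing it into output-file. In fact, a general-purpose script should do exactly that.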
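A minimal sketch of that general-purpose version, assuming GNU sed (for \n in the replacement) and mktemp; the function name insert_pairs and its argument order are hypothetical. It writes to standard output, and the pattern must not contain a slash because it is spliced into a /.../ address:

```shell
#!/bin/sh
# Hypothetical helper: insert_pairs TITLE_AUTHORS MAIN_FILE PATTERN
# Writes MAIN_FILE to stdout with the formatted pairs after the PATTERN line.
insert_pairs() {
  tmp=$(mktemp) || return 1
  # Stage 1: join each title/author pair with N and wrap it in <div>s.
  sed -e 'N' \
      -e 's%\(.*\)\n\(.*\)%<div class="title">\1</div>\n<div class="author">\2</div>%' \
      "$1" > "$tmp"
  # Stage 2: copy MAIN_FILE, reading the formatted pairs in after PATTERN.
  sed "/$3/r $tmp" "$2"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo with throwaway files.
ta=$(mktemp) && main=$(mktemp) || exit 1
printf 'title1\nauthor1\ntitle2\nauthor2\n' > "$ta"
printf 'some text\ninsertion point pattern\nrest of the text\n' > "$main"
insert_pairs "$ta" "$main" 'insertion point pattern'
rm -f "$ta" "$main"
```

Unlike the outline, this sketch omits the trap-based cleanup on signals; a standalone script would add it back and call insert_pairs "$1" "$2" "$3".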

Answer 4

That's not a job for sed, it's a job for awk:

awk 'NR==FNR{a[NR]=$0; next} {print} /<div class=/{print a[++c]}' file1.txt file2.html
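Reading the one-liner, it stores every line of file1.txt and prints the next stored line after each line of file2.html that matches <div class=, so it appears to assume the HTML already contains the (empty) div tags. A small demo under that assumption, with throwaway stand-ins for the two files:

```shell
#!/bin/sh
# file1.txt stand-in: the title/author lines.
# file2.html stand-in: empty title/author <div> skeletons.
f1=$(mktemp) && f2=$(mktemp) || exit 1
printf 'title1\nauthor1\n' > "$f1"
printf '<div class="title">\n</div>\n<div class="author">\n</div>\n' > "$f2"
# Same program as above: each matching line is printed, then the next stored line.
awk 'NR==FNR{a[NR]=$0; next} {print} /<div class=/{print a[++c]}' "$f1" "$f2"
rm -f "$f1" "$f2"
```

The output is the multi-line layout from the question: each opening div, then its title or author, then the closing </div>.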
