Question

I am trying to use sed (as part of a shell script on CentOS) to remove the trailing whitespace after a parenthesis in an HTML file:

From:

<p>Some text (
<em>Text which should not break to a new line</em>). More text.</p>

To this:

<p>Some text (<em>Text which should not break to a new line</em>). More text.</p>

I can do this easily in Sublime Text with the regex \(\s, replacing the match with a parenthesis, but that doesn't work in sed.

I tried:

sed 's/[(]\s*$/(/'
sed 's/[(]\s*$\n/(/'

and many other things, but none of them work.

Any ideas?

Answer 1

Try:

sed ':a;/($/{N;s/\n//;ba}' file

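If a line ends with (, the next line is appended to the pattern space (N), and the newline \n is then replaced with nothing, which joins the two lines. This is done in a loop (ba branches back to label a). Note that /($/ only matches when ( is the very last character on the line; spaces or tabs after the parenthesis are not covered by this command.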
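A minimal sketch of the same loop, split into separate -e expressions so each step can be annotated. It also allows spaces or tabs between the ( and the newline and removes them. That extension and the sample input are assumptions made for illustration, not part of the answer as posted.

```shell
# :a                   label marking the top of the loop
# /([[:blank:]]*$/     line ends in "(" followed by optional spaces or tabs
# N                    append the next input line to the pattern space
# s/[[:blank:]]*\n//   delete those blanks together with the embedded newline
# ba                   branch back to :a in case the joined line also ends in "("
joined=$(printf '%s\n' '<p>Some text (  ' '<em>Text which should not break to a new line</em>). More text.</p>' |
  sed -e ':a' -e '/([[:blank:]]*$/{' -e 'N' -e 's/[[:blank:]]*\n//' -e 'ba' -e '}')
printf '%s\n' "$joined"
```

With the sample input this should print the paragraph on a single line, with nothing between the ( and <em>.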

Answer 2

I would do:

awk 'sub(/\(\s*$/,"("){printf "%s",$0;next}7' file

Example with and without trailing spaces/tabs:

kent$  cat f
foo [with trailing spaces](     
)foo end
bar [with trailing spaces & tab](               
)bar end
blah no trailing spaces(
)

Just to make the trailing whitespace visible:

kent$  sed 's/$/|/' f
foo [with trailing spaces](     |
)foo end|
bar [with trailing spaces & tab](               |
)bar end|
blah no trailing spaces(|
)|

Test with my awk one-liner:

kent$  awk 'sub(/\(\s*$/,"("){printf "%s",$0;next}7' f
foo [with trailing spaces]()foo end
bar [with trailing spaces & tab]()bar end
blah no trailing spaces()
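For readers unfamiliar with the awk shorthand, the one-liner can be spelled out roughly as follows. sub() returns the number of replacements, so it doubles as the pattern: the action runs only for lines ending in ( plus optional whitespace, printf emits the trimmed line without a newline, and next moves on to the following line. The bare 7 in the original is just a true value that triggers the default print. This sketch uses [ \t] in place of \s, because \s is a GNU awk extension; that substitution and the sample input are assumptions for illustration.

```shell
result=$(printf '%s\n' 'foo [with trailing spaces](     ' ')foo end' 'blah no trailing spaces(' ')' |
awk '
  # sub returns 0 or 1 here, so it serves as the condition for the action
  sub(/\([ \t]*$/, "(") {
      printf "%s", $0   # print the trimmed line without a trailing newline
      next              # skip the default print below for this line
  }
  { print }             # all other lines are printed unchanged, like the bare 7
')
printf '%s\n' "$result"
```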

Answer 3

I once had the same problem. tr is the way to go here instead of sed:

cat textfile.ext | tr -d '\n'

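This removes every newline from the file (-d deletes the listed characters). Or you can first use grep to filter out the relevant lines, e.g.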

cat textfile.ext | grep -A1 '^<p>Some text' | tr -d '\n'

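The -A1 option prints one line of trailing context after each line matched by the regular expression '^<p>...'. See man grep for a more detailed explanation.

Edit: in your particular case the grep command should look more like this: grep -A1 '($', which selects every line ending in an opening parenthesis together with the line that follows it (see above).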
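A quick sketch of what this pipeline produces, assuming a hypothetical three-line input. Keep in mind that grep discards every line outside a match and its following line, and tr -d '\n' removes all remaining newlines including the last one, so this seems better suited to extracting one fragment than to rewriting a whole file. Any spaces or tabs after the ( are left in place.

```shell
# grep -A1 keeps each line ending in "(" plus the one line after it;
# tr -d '\n' then deletes every newline, joining what grep kept.
fragment=$(printf '%s\n' '<h1>Title</h1>' '<p>Some text (' '<em>Text</em>). More text.</p>' | grep -A1 '($' | tr -d '\n')
printf '%s\n' "$fragment"
```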

sed: remove trailing whitespace after a parenthesis
