Question

I have a file, file1.txt, that looks like this:

This is some text.
This is some more text. ② This is a note.
This is yet some more text.

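I need to delete any text that appears after "②", including the "②" itself and any single space that appears before it (if there is such a space). For example, the file above would become file2.txt: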

This is some text.
This is some more text.
This is yet some more text.

How can I remove "②", everything after it, and any single preceding space?

The solutions to "How can I remove all text after a character in bash?" don't seem to work, perhaps because "②" is not an ordinary character.
The file is saved as UTF-8.

Answer 1

Perl solution:

$ perl -CS -i~ -p -E's/ ?②.*//' file1.txt

You will end up with the corrected data in file1.txt and a backup of the original file in file1.txt~.

Answer 2

I hope you realize that most Unix utilities don't work with Unicode. I'm assuming your input is UTF-8; if it isn't, you'll have to adjust accordingly.

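#!/bin/bash
# Turn a string of hex digits (e.g. "feff0054") back into raw bytes.
function px {
  local a="$*"
  local i=0
  while [ "$i" -lt "${#a}" ]
  do
    printf "\\x${a:$i:2}"
    i=$((i + 2))
  done
}
# Convert UTF-8 to UTF-16 and print one 16-bit code unit (4 hex digits) per line.
# -An drops the offset column; -v stops od from collapsing repeated lines into "*".
(iconv -f UTF8 -t UTF16 | od -An -v -x | xargs -n 1) |
if read -r utf16header   # the first code unit is the byte-order mark (feff)
then
  echo "$utf16header"
  out=''
  # Regroup the code units into one line per input line; 000a is the newline.
  while read -r line
  do
    out="$out $line"
    if [ "$line" == "000a" ]
    then
      echo "$out"
      out=''
    fi
  done
  if [ "$out" != '' ] ; then
    echo "$out"
  fi
fi |
  # 2461 is ②: drop at most one preceding space (0020) and the rest of the
  # line, keep the newline, then remove the separating spaces.
  perl -pe 's/( 0020)? 2461 .*$/ 000a/; s/ *//g' |
  while read -r line
  do
    px "$line"
  done | iconv -f UTF16 -t UTF8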
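The magic number 2461 in the perl substitution is the UTF-16 code unit for ②, that is, code point U+2461 (CIRCLED DIGIT TWO). Here is a small sketch to confirm this. It assumes an iconv that accepts the encoding name UTF-16BE, which glibc and GNU libiconv both do. Big-endian without a byte-order mark is used so the printed byte order does not depend on the machine.

```shell
#!/bin/bash
# Encode the symbol as UTF-16BE (no byte-order mark) and dump its bytes in hex.
printf '②' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1
```

The two bytes printed, 24 and 61, form the code unit 2461.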

Answer 3

sed -e 's/[[:space:]]②[^.]*\.//'

However, I'm not sure the ② symbol will be parsed correctly. You may have to use its UTF-8 code or something similar.
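If the literal ② does not match (for example, because of the locale), one workaround is to match its UTF-8 byte sequence instead. ② (U+2461) is encoded in UTF-8 as the three bytes E2 91 A1, and od can confirm this. The sed line below is a sketch that assumes GNU sed, because the `\xHH` escape is a GNU extension. `LC_ALL=C` makes sed treat the input as plain bytes.

```shell
#!/bin/bash
# Show the UTF-8 bytes of the symbol: e2 91 a1.
printf '②' | od -An -tx1

# Match those bytes directly (GNU sed only: \xHH is a GNU extension).
# " \{0,1\}" removes at most one preceding space.
printf '%s\n' 'This is some more text. ② This is a note.' |
  LC_ALL=C sed 's/ \{0,1\}\xe2\x91\xa1.*//'
```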

Answer 4

Try this:

sed -e '/②/ s/[ ]*②.*$//'

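/②/ only looks at lines that contain the magic symbol;
[ ]* matches any number of spaces before the magic symbol (including none);
.*$ matches everything else up to the end of the line.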
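You can check the command end to end on the sample from the question. This sketch recreates file1.txt and writes the result to file2.txt, the names used in the question, so it overwrites both files in the current directory. It should work whether sed treats ② as one character (UTF-8 locale) or as its three bytes (C locale).

```shell
#!/bin/bash
# Recreate file1.txt from the question.
printf '%s\n' \
  'This is some text.' \
  'This is some more text. ② This is a note.' \
  'This is yet some more text.' > file1.txt

# Only lines containing the symbol are edited; "[ ]*" also removes the space before it.
sed -e '/②/ s/[ ]*②.*$//' file1.txt > file2.txt

cat file2.txt
```

Because `[ ]*` matches any number of spaces, this removes all spaces before ②, not just a single one. For the sample input the result is the same.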
