Question

我有一个相当大的SQL文件，它以FFFE的字节顺序标记开头。我使用unicode感知的linux拆分工具将此文件拆分为100,000行块。但是当它们传回窗口时，它不就像第一个部分之外的任何部分一样只有它有FFFE字节顺序标记。

如何使用echo（或任何其他bash命令）添加这两个字节的代码？

Answer 1

基于sed的solution of Anonymous，sed -i '1s/^/\xef\xbb\xbf/' foo将BOM添加到UTF-8编码文件foo。有用的是，它还将ASCII文件转换为带有BOM

的UTF8

Answer 2

要将BOM添加到以“foo-”开头的所有文件，您可以使用sed。 sed可以选择进行备份。

sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*

strace这表明sed会创建一个名为“sed”的临时文件。如果您确定已经没有BOM，则可以简化命令：

sed -i '1s/^/\xff\xfe/' foo-*

确保您需要设置UTF-16，因为即UTF-8不同。

Answer 3

对于通用解决方案 - 设置正确的字节顺序标记，无论文件是UTF-8，UTF-16还是UTF-32，我都会使用vim的'bomb'选项：< / p>

$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a                           hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a                   ...hello.

（-e表示以ex模式而非可视模式运行; -s表示不打印状态讯息; -c表示“执行此操作”）

Answer 4

像（先备份））：

for i in $(ls *.sql)
do
  cp "$i" "$i.temp"
  printf '\xFF\xFE' > "$i"
  cat "$i.temp" >> "$i"
  rm "$i.temp"
done

Answer 5

尝试使用uconv

uconv --add-signature

Answer 6

Matthew Flaschen的答案很好，但它有一些缺陷。

在原始文件被截断之前，没有检查副本是否成功。最好使所有内容都依赖于成功的副本，或者测试是否存在临时文件，或者进行操作在副本上。如果你是一个腰带和吊带的人，你会做一个组合，如下图所示
ls是不必要的。
我使用的变量名比“i”更好 - 也许是“文件”。

当然，您可能非常偏执，并在开头检查是否存在临时文件，因此您不会意外覆盖它和/或使用UUID或生成的文件名。 mktemp，tempfile或uuidgen中的一个可以解决问题。

td=TMPDIR
export TMPDIR=

usertemp=~/temp            # set this to use a temp directory on the same filesystem
                           # you could use ./temp to ensure that it's one the same one
                           # you can use mktemp -d to create the dir instead of mkdir

if [[ ! -d $usertemp ]]    # if this user temp directory doesn't exist
then                       # then create it, unless you can't 
    mkdir $usertemp || export TMPDIR=$td    # if you can't create it and TMPDIR is/was
fi                                          # empty then mktemp automatically falls
                                            # back to /tmp

for file in *.sql
do
    # TMPDIR if set overrides the argument to -p
    temp=$(mktemp -p $usertemp) || { echo "$0: Unable to create temp file."; exit 1; }

    { printf '\xFF\xFE' > "$temp" &&
    cat "$file" >> "$temp"; } || { echo "$0: Write failed on $file"; exit 1; }

    { rm "$file" && 
    mv "$temp" "$file"; } || { echo "$0: Replacement failed for $file; exit 1; }
done
export TMPDIR=$td

陷阱可能比我添加的所有单独的错误处理程序更好。

毫无疑问，对于一次性脚本而言，所有这些额外的警告都是过度的，但这些技术可以在推动推动时为您节省时间，尤其是在多文件操作中。

Answer 7

$ printf '\xEF\xBB\xBF' > bom.txt

然后检查：

$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt

如何在linux中重新添加unicode字节顺序标记？

7 个答案: