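Detect encoding by BOM / lack thereof

Time: 2015-10-16 17:15:53

Tags: powershell batch-file encoding

I'm using this code in a batch script to replace text in a file and then move the file to a location. The code sits inside a loop, and the variables are read in on each pass.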

powershell -Command "(gc %inputPath%\%inputFile%) -replace 'Foo', '%bar%' | Out-File '%outputPath%\%outputFile%' -encoding default"

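I ran into a problem because I was missing the "-encoding default" argument. After adding that argument I have no problems with ANSI files, but some of the files are UTF-8, and with those I hit the same problem.

These files are configuration for an executable, and it is very picky about the encoding of its configs.

I have searched for a way to read the encoding of the input file, and I have been unable to find a working batch solution. Is there a way for batch to read the encoding?

I'll accept PowerShell solutions, but only if they can be executed from within a batch file. I'd rather not use external modules, but I may have to if that is the only way.

2 Answers: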

Answer 0: (score: 3)

Create a plain ASCII text file named dummy.txt and put two characters in it. I usually just put AA. Then do a binary compare of the two files.

fc /b LIttleEndian.txt dummy.txt

You will then see this as output.

Comparing files LIttleEndian.txt and DUMMY.TXT
00000000: FF 41
00000001: FE 41
FC: LIttleEndian.txt longer than DUMMY.TXT

For UTF-8 you would see this.

C:\BatchFiles\Encoding>fc /b utf8.txt dummy.txt
Comparing files UTF8.txt and DUMMY.TXT
00000000: EF 41
00000001: BB 41
FC: UTF8.txt longer than DUMMY.TXT

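Use the FOR /F command to parse the output, which can help you determine the encoding used for the input file.

For ASCII text, the hex codes will start with a digit: plain ASCII bytes fall in the range 00 to 7F, whereas the BOM bytes (EF, BB, FF, FE) all start with a letter.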

C:\BatchFiles\Encoding>fc /b Normaltext.txt dummy.txt
Comparing files Normaltext.txt and DUMMY.TXT
00000000: 4E 41
00000001: 6F 41
FC: Normaltext.txt longer than DUMMY.TXT
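
The FOR /F parsing step could be sketched roughly as follows. This is only an illustration, not the answerer's own script: the script name detect_bom_fc.bat is hypothetical, dummy.txt is assumed to hold three characters (AAA) so that the third UTF-8 BOM byte is compared too, and UTF-32 is not distinguished from UTF-16 because only three bytes are inspected.

```batch
@echo off
rem detect_bom_fc.bat (hypothetical name) - usage: detect_bom_fc.bat somefile.txt
rem Assumes dummy.txt in the current directory contains AAA.
setlocal
set "b0="
set "b1="
set "b2="

rem fc /b prints one line per differing byte, e.g. "00000000: EF 41".
rem Token 1 is the offset, token 2 is the byte from the first file.
for /f "tokens=1,2" %%A in ('fc /b "%~1" dummy.txt') do (
    if "%%A"=="00000000:" set "b0=%%B"
    if "%%A"=="00000001:" set "b1=%%B"
    if "%%A"=="00000002:" set "b2=%%B"
)

rem A byte equal to 41 (the letter A) is not reported by fc and stays empty,
rem which is harmless here because no BOM byte is 41.
set "enc=no BOM found"
if /i "%b0%%b1%%b2%"=="EFBBBF" set "enc=UTF-8"
if /i "%b0%%b1%"=="FFFE" set "enc=UTF-16 LE"
if /i "%b0%%b1%"=="FEFF" set "enc=UTF-16 BE"

echo %enc%
endlocal
```

A file without a BOM may still be ANSI or BOM-less UTF-8; this check cannot tell those apart.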

Answer 1: (score: 2)

Here is another approach, this time using the certutil command:

@echo off
:detect_encoding
setLocal
rem -- show help for -?, -h or a missing argument --
rem -- %~1 strips surrounding quotes so quoted paths are handled --
if "%~1" EQU "-?" (
    endlocal
    call :help
    exit /b 0
)
if "%~1" EQU "-h" (
    endlocal
    call :help
    exit /b 0
)
if "%~1" EQU "" (
    endlocal
    call :help
    exit /b 0
)


if not exist "%~1" (
    echo file does not exist
    endlocal
    exit /b 54
)

if exist "%~1\" (
    echo this cannot be used against directories
    endlocal
    exit /b 53
)

if "%~z1" EQU "0" (
    echo empty files are not accepted
    endlocal
    exit /b 52
)



set "file=%~snx1"
del /Q /F "%file%.hex" >nul 2>&1

certutil -f -encodehex %file% %file%.hex>nul

rem -- find the first line of hex file --

for /f "usebackq delims=" %%E in ("%file%.hex") do (
    set "f_line=%%E"
    goto :endfor
)
:endfor
del /Q /F "%file%.hex" >nul 2>&1

rem -- check the BOMs --
rem -- UTF-32 is tested before UTF-16 because each UTF-32 BOM contains a UTF-16 BOM --
echo %f_line% | find "ef bb bf"     >nul && echo utf-8     &&endlocal && exit /b 1
echo %f_line% | find "ff fe 00 00"  >nul && echo utf-32 LE &&endlocal && exit /b 5
echo %f_line% | find "00 00 fe ff"  >nul && echo utf-32 BE &&endlocal && exit /b 4
echo %f_line% | find "ff fe"        >nul && echo utf-16 LE &&endlocal && exit /b 2
echo %f_line% | find "fe ff"        >nul && echo utf-16 BE &&endlocal && exit /b 3

rem -- no BOM found: reported as ASCII, although ANSI or BOM-less UTF-8 is also possible --
echo ASCII & endlocal & exit /b 6



endLocal
goto :eof

:help
echo.
echo  %~n0 file - Detects encoding of a text file
echo.
echo for each encoding you will receive a text response with a name and an errorlevel code as follows:

echo     1 - UTF-8
echo     2 - UTF-16 LE
echo     3 - UTF-16 BE
echo     4 - UTF-32 BE
echo     5 - UTF-32 LE
echo     6 - ASCII

echo for empty files you will receive error code 52
echo for directories you will receive error code 53
echo for non-existent files you will receive error code 54
goto :eof
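
To tie this back to the question, a caller might use the errorlevel to pick the -Encoding value for Out-File. The following is a rough sketch under several assumptions: the script above is saved as detect_encoding.bat (a hypothetical name), the files live in the current directory, the names config.ini and config.out.ini are placeholders, and Windows PowerShell is used, where Out-File accepts utf8, unicode, bigendianunicode and utf32 and the utf8 value writes a BOM.

```batch
@echo off
rem Hypothetical caller; detect_encoding.bat, config.ini and config.out.ini
rem are placeholder names, not taken from the question.
setlocal
set "in=config.ini"
set "out=config.out.ini"

call detect_encoding.bat "%in%" >nul
set "code=%errorlevel%"

rem 52-54 are the error codes of the detection script.
if %code% GEQ 52 (
    echo detection failed with code %code%
    exit /b %code%
)

rem Map the errorlevel to an Out-File encoding name.
rem Code 6 (no BOM) and code 4 (UTF-32 BE, no matching name assumed) fall back to default.
set "psenc=default"
if "%code%"=="1" set "psenc=utf8"
if "%code%"=="2" set "psenc=unicode"
if "%code%"=="3" set "psenc=bigendianunicode"
if "%code%"=="5" set "psenc=utf32"

powershell -Command "(gc '%in%') -replace 'Foo', 'Bar' | Out-File '%out%' -Encoding %psenc%"
endlocal
```

Files that arrive as UTF-8 without a BOM are reported as ASCII by the script, so they would still be written with the default encoding here and would need separate handling.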