Is there a command in Windows that tells the encoding of a file?
For example, for a file A.txt it would report an encoding such as UTF-16.
Answer (score: 2)
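In the Windows Command Prompt (cmd), there is no command I know of that can determine how a text file is encoded.
Nevertheless, I wrote a small batch file that checks a few conditions and thereby tries to determine whether a given text file is ASCII-/ANSI-encoded or Unicode-encoded (UTF-8, or UTF-16 Little Endian or Big Endian). First, it checks whether the first non-empty line contains zero bytes, which indicates that the file is not ASCII-/ANSI-encoded. Next, it checks whether the first few bytes form a UTF-8 or UTF-16 Byte Order Mark (BOM). Since the BOM is optional for Unicode-encoded files, however, its absence is no clear sign that a file is ASCII-/ANSI-encoded.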
So here is the code, with plenty of explanatory remarks (rem); I hope it helps:
@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1" & rem // (provide file via the first command line argument)

rem // Store current code page to be able to restore it finally:
for /F "tokens=2 delims=:" %%C in ('chcp') do set /A "$CP=%%C"
rem /* Change to code page 437 (original IBM PC or DOS code page) temporarily;
rem    this is necessary for extended characters not to be converted: */
> nul chcp 437

rem // Attempt to read first line from file; this fails if zero-bytes occur:
(
    rem // Reset line string variable:
    set "LINE="
    rem /* The loop does not iterate over an empty file or one with empty lines only;
    rem    therefore, the behaviour is the same as when zero-bytes occur: */
    for /F usebackq^ delims^=^ eol^= %%L in ("%_FILE%") do (
        rem // Store first line string:
        set "LINE=%%L"
        rem // Abort reading file after first non-empty line:
        goto :NEXT
    )
) || (
    rem /* The `for /F` loop returns a non-zero exit code in case the file is empty,
    rem    contains empty lines only or the first non-empty line contains zero-bytes;
    rem    to determine whether there are zero-bytes, let `find` process the file,
    rem    which converts any zero-bytes to spaces, so `for /F` can read the file: */
    (
        rem // In case the file is empty, the loop does not iterate:
        for /F delims^=^ eol^= %%L in ('^< "%_FILE%" find /V ""') do (
            rem // Abort reading file after first non-empty line:
            goto :ZERO
        )
    ) || (
        rem /* The loop did not iterate, so the file is empty or holds empty lines only;
        rem    restore the initial code page prior to termination: */
        > nul chcp %$CP%
        >&2 echo The file is empty, hence encoding cannot be determined!
        exit /B
    )
)
rem // This point is reached in case the file contains zero-bytes:
:ZERO
rem // Restore the initial code page prior to termination:
> nul chcp %$CP%
>&2 echo NULL-bytes detected in first line, so file is non-ASCII/ANSI!
exit /B 1

rem // This point is reached in case the file does not contain any zero-bytes:
:NEXT
rem /* Build Byte Order Marks (BOMs) for UTF-16-encoded text (Little Endian and Big Endian)
rem    and for UTF-8-encoded text: */
for /F "tokens=1-3" %%A in ('
    forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0xFF0xFE 0xFE0xFF 0xEF0xBB0xBF"
') do set "$LE=%%A" & set "$BE=%%B" & set "$U8=%%C"
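rem /* Check whether the first line of the file begins with any of the BOMs; delayed
rem    expansion avoids syntax errors when the line contains quotation marks: */
setlocal EnableDelayedExpansion
if not "!LINE:~,2!"=="!$LE!" if not "!LINE:~,2!"=="!$BE!" if not "!LINE:~,3!"=="!$U8!" (
    endlocal
    goto :CONT
)
endlocal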
rem /* One of the BOMs has been encountered, hence the file is Unicode-encoded;
rem restore the initial code page prior to termination: */
> nul chcp %$CP%
>&2 echo BOM encountered in first line, so file is non-ASCII/ANSI!
exit /B 1
rem // This point is reached in case the file does not appear as Unicode-encoded:
:CONT
rem // Restore the initial code page prior to termination:
> nul chcp %$CP%
echo The file appears to be an ASCII-/ANSI-encoded text.
endlocal
exit /B
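For comparison, the same two checks can be expressed outside of cmd. The following is a minimal Python sketch, not part of the original answer: guess_encoding and the BOMS table are hypothetical names, and treating CR-only lines as empty is an assumption meant to approximate how for /F skips empty lines. Like the batch file, it can only rule ASCII/ANSI out; a file without zero bytes and without a BOM (for instance UTF-8 without BOM) still "appears" to be ASCII/ANSI.

```python
import codecs

# Byte order marks checked by the batch script (UTF-16 LE/BE and UTF-8).
BOMS = {
    codecs.BOM_UTF16_LE: "UTF-16 LE",
    codecs.BOM_UTF16_BE: "UTF-16 BE",
    codecs.BOM_UTF8: "UTF-8",
}


def guess_encoding(data: bytes) -> str:
    """Rough classification mirroring the batch script's checks (a sketch)."""
    # Assumption: like `for /F`, skip empty lines; a trailing CR is treated
    # as part of the line ending, so CRLF-only lines count as empty too.
    lines = [line for line in data.split(b"\n") if line.rstrip(b"\r")]
    if not lines:
        return "empty: encoding cannot be determined"
    # Check 1: zero bytes in the first non-empty line rule out ASCII/ANSI.
    if b"\x00" in lines[0]:
        return "zero bytes: not ASCII/ANSI"
    # Check 2: a BOM at the very start of the file marks it as Unicode.
    for bom, name in BOMS.items():
        if data.startswith(bom):
            return f"BOM: {name}"
    # Neither check fired; this is NOT proof of ASCII/ANSI, since a BOM
    # is optional for Unicode-encoded files.
    return "appears to be ASCII/ANSI"
```

A call such as guess_encoding(open("A.txt", "rb").read()) would typically report zero bytes for UTF-16 text made of Latin characters, because each such character carries a 0x00 byte; this is also why the batch file reaches its zero-byte branch before it ever looks for a UTF-16 BOM.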