我有一堆非UTF-8编码的文件,我正在将网站转换为UTF-8编码。
我正在使用简单脚本来保存我想要保存在utf-8中的文件,但文件以旧编码保存:
header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
$fpath="folder";
$d=dir($fpath);
while (False !== ($a = $d->read()))
{
if ($a != '.' and $a != '..')
{
$npath=$fpath.'/'.$a;
$data=file_get_contents($npath);
file_put_contents('tempfolder/'.$a, $data);
}
}
如何以utf-8编码保存文件?
答案 0 :(得分:67)
添加物料清单:UTF-8
file_put_contents($myFile, "\xEF\xBB\xBF". $content);
答案 1 :(得分:42)
file_get_contents / file_put_contents不会神奇地转换编码。
你必须明确地转换字符串;例如,使用iconv()
或mb_convert_encoding()
。
试试这个:
$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/'.$a, $data);
或者,使用PHP的流过滤器:
$fd = fopen($file, 'r');
stream_filter_append($fd, 'convert.iconv.UTF-8/OLD-ENCODING');
stream_copy_to_stream($fd, fopen($output, 'w'));
答案 2 :(得分:23)
<?php function writeUTF8File($filename,$content) { $f=fopen($filename,"w"); # Now UTF-8 - Add byte order mark fwrite($f, pack("CCC",0xef,0xbb,0xbf)); fwrite($f,$content); fclose($f); } ?>
答案 3 :(得分:5)
Iconv救援。
答案 4 :(得分:3)
在Unix / Linux上,可以使用简单的shell命令来转换给定目录中的所有文件:
recode L1..UTF8 dir/*
也可以通过PHP exec()启动。
答案 5 :(得分:1)
@@ZLIB_1.2.2
我从Cool
获得了这一行答案 6 :(得分:0)
如果要递归使用recode,并过滤类型,请尝试:
find . -name "*.html" -exec recode L1..UTF8 {} \;
答案 7 :(得分:0)
这对我有用。 :)
$f=fopen($filename,"w");
# Now UTF-8 - Add byte order mark
fwrite($f, pack("CCC",0xef,0xbb,0xbf));
fwrite($f,$content);
fclose($f);
答案 8 :(得分:0)
这是一个非常有用的问题。我认为我的Windows 10 PHP7解决方案对仍然存在UTF-8转换麻烦的人很有用。
这是我的步骤。调用以下函数(此处名为 utfsave.php )的PHP脚本本身必须具有UTF-8编码,可以通过在UltraEdit上进行转换来轻松实现。
在utfsave.php中,我们定义了一个调用PHP fopen($ filename,“ wb ”)的函数,即,它以w写模式打开,尤其是在 binary中使用b 模式。
<?php
//
// UTF-8 编码:
//
// fnc001: save string as a file in UTF-8:
// The resulting file is UTF-8 only if $strContent is,
// with French accents, chinese ideograms, etc..
//
function entSaveAsUtf8($strContent, $filename) {
$fp = fopen($filename, "wb");
fwrite($fp, $strContent);
fclose($fp);
return True;
}
//
// 0. write UTF-8 string in fly into UTF-8 file:
//
$strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";
$filename = "utf8text.txt";
entSaveAsUtf8($strContent, $filename);
//
// 2. convert CP936 ANSI/OEM - chinese simplified GBK file into UTF-8 file:
//
$strContent = file_get_contents("cp936gbktext.txt");
$strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");
$filename = "utf8text2.txt";
entSaveAsUtf8($strContent, $filename);
?>
源文件cp936gbktext.txt的文件内容:
>>Get-Content cp936gbktext.txt
My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)
在Windows 10 PHP上运行 utf8save.php ,这样创建的 utf8text.txt , utf8text2.txt 文件将自动保存在UTF- 8种格式。
使用此方法,不需要BOM字符。 BOM解决方案之所以不好,是因为它在例如为MySQL采购sql文件时会引起麻烦。
值得注意的是,我为此无法完成工作 file_put_contents($ filename,utf8_encode($ mystring)); 。
++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++
如果您不知道源文件的编码,则可以使用PHP列出编码:
print_r(mb_list_encodings());
这给出了这样的列表:
Array
(
[0] => pass
[1] => wchar
[2] => byte2be
[3] => byte2le
[4] => byte4be
[5] => byte4le
[6] => BASE64
[7] => UUENCODE
[8] => HTML-ENTITIES
[9] => Quoted-Printable
[10] => 7bit
[11] => 8bit
[12] => UCS-4
[13] => UCS-4BE
[14] => UCS-4LE
[15] => UCS-2
[16] => UCS-2BE
[17] => UCS-2LE
[18] => UTF-32
[19] => UTF-32BE
[20] => UTF-32LE
[21] => UTF-16
[22] => UTF-16BE
[23] => UTF-16LE
[24] => UTF-8
[25] => UTF-7
[26] => UTF7-IMAP
[27] => ASCII
[28] => EUC-JP
[29] => SJIS
[30] => eucJP-win
[31] => EUC-JP-2004
[32] => SJIS-win
[33] => SJIS-Mobile#DOCOMO
[34] => SJIS-Mobile#KDDI
[35] => SJIS-Mobile#SOFTBANK
[36] => SJIS-mac
[37] => SJIS-2004
[38] => UTF-8-Mobile#DOCOMO
[39] => UTF-8-Mobile#KDDI-A
[40] => UTF-8-Mobile#KDDI-B
[41] => UTF-8-Mobile#SOFTBANK
[42] => CP932
[43] => CP51932
[44] => JIS
[45] => ISO-2022-JP
[46] => ISO-2022-JP-MS
[47] => GB18030
[48] => Windows-1252
[49] => Windows-1254
[50] => ISO-8859-1
[51] => ISO-8859-2
[52] => ISO-8859-3
[53] => ISO-8859-4
[54] => ISO-8859-5
[55] => ISO-8859-6
[56] => ISO-8859-7
[57] => ISO-8859-8
[58] => ISO-8859-9
[59] => ISO-8859-10
[60] => ISO-8859-13
[61] => ISO-8859-14
[62] => ISO-8859-15
[63] => ISO-8859-16
[64] => EUC-CN
[65] => CP936
[66] => HZ
[67] => EUC-TW
[68] => BIG-5
[69] => CP950
[70] => EUC-KR
[71] => UHC
[72] => ISO-2022-KR
[73] => Windows-1251
[74] => CP866
[75] => KOI8-R
[76] => KOI8-U
[77] => ArmSCII-8
[78] => CP850
[79] => JIS-ms
[80] => ISO-2022-JP-2004
[81] => ISO-2022-JP-MOBILE#KDDI
[82] => CP50220
[83] => CP50220raw
[84] => CP50221
[85] => CP50222
)
如果您无法猜测,请一一尝试,因为mb_detect_encoding()无法轻松完成这项工作。
答案 9 :(得分:-1)
我把所有内容放在一起,很容易将ANSI文本文件转换为“UTF-8 No Mark”:
function filesToUTF8($searchdir,$convdir,$filetypes) {
$get_files = glob($searchdir.'*{'.$filetypes.'}', GLOB_BRACE);
foreach($get_files as $file) {
$expl_path = explode('/',$file);
$filename = end($expl_path);
$get_file_content = file_get_contents($file);
$new_file_content = iconv(mb_detect_encoding($get_file_content, mb_detect_order(), true), "UTF-8", $get_file_content);
$put_new_file = file_put_contents($convdir.$filename,$new_file_content);
}
}
用法:filesToUTF8('C:/ Temp /','C:/ Temp / conv_files /','php,txt');
答案 10 :(得分:-5)