Encoding problem when generating a Word document with PHP

Date: 2015-08-23 11:41:39

Tags: php encoding utf-8 ms-word

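I have successfully generated an MS Word document in PHP. Into this document I insert HTML code that I get from a rich text editor (TinyMCE). However, some extra, unexpected characters show up in the result, so I suspect an encoding problem.

Here is the PHP code: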

$content = $_GET['content'];
$filename = './cases/mydocument.htm';

$output = "<html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:w='urn:schemas-microsoft-com:office:word' xmlns='http://www.w3.org/TR/REC-html40'>";
$output .= "<head><title>Mon document</title>";
$output .= "<!--[if gte mso 9]>";
$output .= "<xml><w:WordDocument><w:View>Print</w:View><w:Zoom>100</w:Zoom><w:DoNotOptimizeForBrowser/></w:WordDocument></xml>";
$output .= "<![endif]-->";
$output .= "<link rel=File-List href=\"mydocument_files/filelist.xml\">";
$output .= "<style><!-- ";
$output .= "@page";
$output .= "{";
$output .= "    size:21cm 29.7cm;  /* A4 */";
$output .= "    margin:1cm 1cm 1cm 1cm; /* Margins: 1 cm on each side */";
$output .= "    mso-page-orientation: portrait;  ";
$output .= "    mso-header: url(\"mydocument_files/headerfooter.htm\") h1;";
$output .= "    mso-footer: url(\"mydocument_files/headerfooter.htm\") f1;  ";
$output .= "}";
$output .= "@page Section1 { }";
$output .= "div.Section1 { page:Section1; }";
$output .= "p.MsoHeader, p.MsoFooter { border: none; }";
$output .= "--></style>";
$output .= "</head>";
$output .= "<body>";
$output .= "<div class=Section1>";
$output .= $content;
$output .= "</div>";
$output .= "</body>";
$output .= "</html>";

file_put_contents($filename, $output);

class mime10class
{
    private $data;
    const boundary='----=_NextPart_ERTUP.EFETZ.FTYIIBVZR.EYUUREZ';
    function __construct() { $this->data="MIME-Version: 1.0\nContent-Type: multipart/related; boundary=\"".self::boundary."\"\n\n"; }
    public function addFile($filepath,$contenttype,$data)
    {
        $this->data = $this->data.'--'.self::boundary."\nContent-Location: http://www.monsite.com/dev1/cases/".preg_replace('!\\\!', '/', $filepath)."\nContent-Transfer-Encoding: base64\nContent-Type: ".$contenttype."\n\n";
        $this->data = $this->data.base64_encode($data)."\n\n";
    }
    public function getFile() { return $this->data.'--'.self::boundary.'--'; }
}

$doc = new mime10class();
$doc->addFile('mydocument.htm','text/html; charset="utf-8"',$output);

$output_encoded = $doc->getFile();

$filename = './cases/mydocument.doc';
file_put_contents($filename, $output_encoded);

For example, my editor returns:

<p>test <em>italique</em></p>

I have checked that $content contains this HTML fragment. But after the first file_put_contents call, this is what I get:

test italique

I tried applying utf8_encode to $output before file_put_contents, and the result is even worse:

testÂ,Ãitalique
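
Stray characters of this kind often come from UTF-8 bytes being read as a single-byte encoding, and an extra utf8_encode pass multiplies them. The sketch below is only an illustration under one assumption that the thread does not confirm: that the space after "test" is a non-breaking space (U+00A0). It uses mb_convert_encoding from the mbstring extension to reproduce the ISO-8859-1 to UTF-8 conversion that utf8_encode performs; the variable names are hypothetical.

```php
<?php
// Assumption: the space after "test" is a non-breaking space (U+00A0),
// which UTF-8 stores as the two bytes 0xC2 0xA0.
$nbsp = "\xC2\xA0";

// Same conversion utf8_encode() performs: every input byte is treated as one
// ISO-8859-1 character and re-encoded as UTF-8.
$twice = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');

echo bin2hex($nbsp), "\n";   // c2a0
echo bin2hex($twice), "\n";  // c382c2a0

// Reversing the extra pass restores the original two bytes.
$restored = mb_convert_encoding($twice, 'ISO-8859-1', 'UTF-8');
echo bin2hex($restored), "\n"; // c2a0
```

If the content is already UTF-8, each extra encoding pass turns one two-byte character into four bytes, which would be consistent with the output getting worse rather than better.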

My Notepad++ is configured for UTF-8 without BOM.

Any ideas?
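
One thing that can be checked in the code above: the generated HTML head contains a title but no charset declaration. The following is a minimal sketch, assuming (this is not confirmed anywhere in the thread) that Word falls back to a legacy code page when the .htm file does not declare one. The helper name build_head is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): builds the opening
// of <head> with an explicit charset declaration placed before the title.
function build_head(string $title, string $charset = 'utf-8'): string
{
    // Escape the title so it cannot break the markup.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<head>"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$charset}\">"
        . "<title>{$safeTitle}</title>";
}

// Usage: replaces the line that appends "<head><title>Mon document</title>".
$output = "<html>";
$output .= build_head('Mon document');
echo $output, "\n";
```

The rest of the head (the conditional Word XML, the File-List link and the style block) would be appended afterwards exactly as in the original script.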

1 Answer:

Answer 0 (score: 0):

Whenever you use TinyMCE, try applying stripslashes to the submitted content:

 $content = stripslashes($_POST['content']);
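
As a minimal sketch of that suggestion, the snippet below wraps the call in a small function so it can be tried without a real form submission. The function name editor_content and the sample markup are hypothetical; only stripslashes itself comes from the answer.

```php
<?php
// Hypothetical helper: returns the editor content from a request array
// with backslash escaping removed.
function editor_content(array $request): string
{
    $raw = isset($request['content']) ? (string) $request['content'] : '';
    return stripslashes($raw);
}

// Simulated submission in which the quotes arrived escaped with backslashes.
$posted = ['content' => '<p class=\"intro\">test <em>italique</em></p>'];
echo editor_content($posted), "\n";
```

Note that stripslashes only removes backslashes and does not convert between character encodings, so it may not be enough on its own if the stray characters come from an encoding mismatch.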