Unicode characters unreadable in a downloadable CSV file generated from a MySQL database

Time: 2017-04-25 01:49:52

Tags: php unicode export-to-csv

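I am trying to use a PHP script to generate a downloadable CSV file from a MySQL database. It works, but the Unicode characters in the file are unreadable. When I open the file in Notepad++, the Unicode characters are readable. I have read the answers to a related question, but they did not help. Please help. Here is my code: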

<?php
mb_internal_encoding("UTF-8");
mb_http_output( "UTF-8" );    
ob_start("mb_output_handler");

include("t/db_config.php");
$con=mysqli_connect($db_host,$db_user,$db_password,$db_name);

// Check connection
if (mysqli_connect_errno())
  {
  echo "Failed to connect to MySQL: " . mysqli_connect_error();
  }
  //set_charset when connecting with database
mysqli_set_charset( $con, 'utf8');

$data=array();

 $sql="SELECT s.t_id,s.t_text,p.user_name,p.description,s.time,p.place from 
 t 
 AS s INNER JOIN users AS p ON s.user_name=p.user_name order by s.time 
 desc";
 $result = mysqli_query($con,$sql);

  while($row = mysqli_fetch_array($result)) {
    $array=array("Link" => $row[0],"Text"=>$row[1] , "User Name" => $row[2] 
  , "User Profile" => $row[3], "Time" => $row[4] , "Place" => $row[5]);
     array_push($data,$array);
  }
  function cleanData(&$str)
  {
    if($str == 't') $str = 'TRUE';
    if($str == 'f') $str = 'FALSE';
    if(preg_match("/^0/", $str) || preg_match("/^\+?\d{8,}$/", $str) || 
  preg_match("/^\d{4}.\d{1,2}.\d{1,2}/", $str)) {
      $str = "$str";
    }
    if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"';
  }

  // filename for download
  $filename = "website_data_" . date('Ymd') . ".csv";

  header("Content-Disposition: attachment; filename=\"$filename\"");
  header("Content-Type: text/csv");

  $out = fopen("php://output", 'w');

  $flag = false;
  foreach($data as $row) {
    if(!$flag) {
      // display field/column names as first row
      fputcsv($out, array_keys($row), ',', '"');
      $flag = true;
    }
    array_walk($row, 'cleanData');
    fputcsv($out, array_values($row), ',', '"');
  }

  fclose($out);
  exit;
?>

Here is a sample input, which is also the desired output:

umeshduttनिर्दोषककसजामिलरहीहै

But after opening the CSV file in Excel, I get the following output:

umeshduttकमालकाकानà¥,नहà¥à¤œà¤¿à¤¸à¤ ®à¥‡à¤¨à¤¿à¤°à¥à¤|à¥<षकà¥<सजामिलर हà¥àà¹à¥à¥¤

Edit

MySQL table structure with sample data

Table t

t_id (primary)     | t_text               | time                 | user_name
bigint(20)         | varchar(255)         | datetime             | char(20)
                   | utf8_general_ci      |                      | utf8_general_ci
===================|======================|======================|================
847589475442204000 | संविधान 'सुप्रीम' है | 3/31/2017 5:01:52 AM | kotians

Table users

user_id (primary) | user_name       | place           | description
bigint(20)        | char(20)        | varchar(30)     | varchar(200)
                  | utf8_general_ci | utf8_general_ci | utf8_general_ci
==================|=================|=================|================
2883542694        | kotians         | Adelaide        | Engineer

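1 answer:

Answer 0 (score: 1)

Below is the complete PHP code that ran successfully (admittedly, I did not take the time to systematically remove the encoding and header functions to see whether it would still work with less code):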

if(!$con=mysqli_connect("host","user","pass","db")){
    echo "Failed to connect to MySQL: ",mysqli_connect_error();
}else{
    mysqli_set_charset($con,'utf8');
    $sql="SELECT
              CONCAT('=\"',t.t_id,'\"'),
              t.t_text,
              p.user_name,
              p.description,
              CONCAT('=\"',t.time,'\"'),
              p.place
          FROM `t`
          INNER JOIN `users` p ON t.user_name=p.user_name
          ORDER BY t.time DESC;";   
    if($result=mysqli_query($con,$sql)){
        header("Content-Disposition: attachment; filename=\"website_data_".date('Ymd').".csv\"");
        header("Content-Type: text/csv");
        header('Pragma: no-cache');    
        header('Expires: 0');
        $out=fopen('php://output','w');
        fputs($out,"\xEF\xBB\xBF");  // Byte Order Mark
        fputcsv($out,["Link","Text","User Name","User Profile","Time","Place"],',','"');
        while($row=mysqli_fetch_row($result)){
            fputcsv($out,$row,',','"');
        }
        fclose($out);
    }else{
        echo mysqli_error($con);    
    }
}

By default, Excel displays the large integer value of t.t_id in scientific notation (8.47589E+17) and reformats t.time as n/j/Y g:i:s.

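To suppress these default adjustments, I wrapped each of those values in double quotes (") and prefixed it with an equals sign (=), so that Excel treats the cell as a formula that yields text.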
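
The same ="..." wrapper could also be applied in PHP rather than with CONCAT() in the query. This is only a minimal sketch: excelText() is a hypothetical helper name that is not part of the code above, and the in-memory stream merely stands in for php://output. The escape argument is passed explicitly so that newer PHP versions do not warn about relying on its default.

```php
<?php
// Hypothetical helper: wrap a value as ="value" so Excel shows it as
// literal text instead of converting it (e.g. to scientific notation).
function excelText(string $value): string
{
    // Double any embedded quotes, as Excel expects inside a string literal.
    return '="' . str_replace('"', '""', $value) . '"';
}

// An in-memory stream stands in for php://output in this sketch.
$out = fopen('php://memory', 'w+');
fputcsv($out, [excelText('847589475442204000'), 'kotians'], ',', '"', '\\');

// Read back what fputcsv() wrote so it can be inspected.
rewind($out);
$line = stream_get_contents($out);
fclose($out);
```

Because the wrapped value itself contains double quotes, fputcsv() encloses the field and doubles those quotes again, which is what Excel expects when it reads the cell back as a formula.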

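I recommend doing any cleanup or modification of database values in SQL, because there you can prepare specific columns to address known issues instead of iterating over every value in every row.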

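The byte order mark (BOM) is the important addition to the original code; it is what lets Excel recognize the file as UTF-8.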

It appears that it can be written in at least these three ways, all with the same effect:

fputs($out,chr(0xEF).chr(0xBB).chr(0xBF));
fputs($out,chr(239).chr(187).chr(191));
fputs($out,"\xEF\xBB\xBF");  // I chose the shortest one
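
A quick way to confirm that the three spellings really produce identical bytes, and to check whether some output starts with the mark. This is a sketch only; hasUtf8Bom() is a hypothetical helper name, and the sample strings are made up for illustration.

```php
<?php
// The three spellings of the UTF-8 byte order mark shown above.
$bomA = chr(0xEF) . chr(0xBB) . chr(0xBF);
$bomB = chr(239) . chr(187) . chr(191);
$bomC = "\xEF\xBB\xBF";

// Strict comparison: all three must be the same 3-byte string.
$allEqual = ($bomA === $bomB) && ($bomB === $bomC);

// Hypothetical helper: true if the given bytes start with the UTF-8 BOM.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Made-up sample data: the same header line with and without the mark.
$withBom    = $bomC . "Link,Text\n";
$withoutBom = "Link,Text\n";
```

Running hasUtf8Bom() on the first bytes of a downloaded file is one way to tell whether the mark actually reached the client or was dropped somewhere along the way, for example by an output buffer handler.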

I have also made a few suggestions/improvements, such as:

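  • Check that $result is not false before going on to generate the CSV.
  • Added a few extra header() calls to make sure the download is not served from a cache.
  • The header row is written by a single fputcsv() call on a static array of keys before the loop.
  • Simplified the processing inside the while() loop.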
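
The last two points can be sketched in isolation from the database. This is an illustration under stated assumptions, not the code from the answer: writeCsv() is a hypothetical function name, the rows are made up, and an in-memory stream replaces php://output so that the result can be inspected.

```php
<?php
// Hypothetical helper: write the BOM, one static header row, then each
// data row unchanged. Returns the number of data rows written.
function writeCsv($out, array $header, iterable $rows): int
{
    fwrite($out, "\xEF\xBB\xBF");              // byte order mark, written once
    fputcsv($out, $header, ',', '"', '\\');    // header row, before the loop
    $count = 0;
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '\\');   // no per-row cleanup in the loop
        $count++;
    }
    return $count;
}

// Usage with made-up rows instead of a mysqli result.
$out = fopen('php://memory', 'w+');
$written = writeCsv($out, ['Link', 'Place'], [
    ['847589475442204000', 'Adelaide'],
]);

// Read back the generated CSV for inspection.
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
```

Keeping the loop body down to a single fputcsv() call works only because any value that needs special treatment has already been prepared before it reaches the loop, which is the reason for doing that work in the query.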

I used the data in these tables:

CREATE TABLE `t` (
  `t_id` bigint(20) NOT NULL,
  `t_text` varchar(255) CHARACTER SET utf8 NOT NULL,
  `time` datetime NOT NULL,
  `user_name` char(20) CHARACTER SET utf8 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `t` (`t_id`, `t_text`, `time`, `user_name`) VALUES
(847589475442204000, 'संविधान \'सुप्रीम\' है', '2017-03-01 05:01:52', 'kotians');

ALTER TABLE `t` ADD PRIMARY KEY (`t_id`);

CREATE TABLE `users` (
  `user_id` bigint(20) NOT NULL,
  `user_name` char(20) CHARACTER SET utf8 NOT NULL,
  `place` varchar(30) CHARACTER SET utf8 NOT NULL,
  `description` varchar(200) CHARACTER SET utf8 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `users` (`user_id`, `user_name`, `place`, `description`) VALUES
(2883542694, 'kotians', 'Ade\'laide', 'Engi\"neer');

ALTER TABLE `users` ADD PRIMARY KEY (`user_id`);

[Screenshot: active cells in the generated CSV file]