Question

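We have a web app that exports CSV files containing foreign characters in UTF-8, with no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with a BOM; Excel/Win is fine with it, Excel/Mac shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac. Here are all the encodings I've tried: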

Encoding  BOM      Win                            Mac
--------  ---      ----------------------------   ------------
utf-8     --       scrambled                      scrambled
utf-8     BOM      WORKS                          scrambled
utf-16    --       file not recognized            file not recognized
utf-16    BOM      file not recognized            Chinese gibberish
utf-16LE  --       file not recognized            file not recognized
utf-16LE  BOM      characters OK,                 same as Win
                   row data all in first field

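The best is UTF-16LE with BOM, but the file is not recognized as CSV. The field separator is a comma; switching to a semicolon makes no difference.

Is there any encoding that works in both worlds?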

Answer 1

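Excel encodings

I have found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it is basically Microsoft's own proprietary character set, one can assume it will work on both the Mac and the Windows version of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector that reads the data correctly.

Depending on your system and the tools you use, this encoding may also be called CP1252, ANSI, Windows (ANSI), MS-ANSI, or just Windows, among other variations.

This encoding is a superset of ISO-8859-1 (aka LATIN1 and others), so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters that WINDOWS-1252 has, as shown here: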

| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name                               | Unicode Range            |
| €    | 128  | 8364    | 0x80     | U+20AC      | &euro;      | euro sign                                  | Currency Symbols         |
| ‚    | 130  | 8218    | 0x82     | U+201A      | &sbquo;     | single low-9 quotation mark                | General Punctuation      |
| ƒ    | 131  | 402     | 0x83     | U+0192      | &fnof;      | Latin small letter f with hook             | Latin Extended-B         |
| „    | 132  | 8222    | 0x84     | U+201E      | &bdquo;     | double low-9 quotation mark                | General Punctuation      |
| …    | 133  | 8230    | 0x85     | U+2026      | &hellip;    | horizontal ellipsis                        | General Punctuation      |
| †    | 134  | 8224    | 0x86     | U+2020      | &dagger;    | dagger                                     | General Punctuation      |
| ‡    | 135  | 8225    | 0x87     | U+2021      | &Dagger;    | double dagger                              | General Punctuation      |
| ˆ    | 136  | 710     | 0x88     | U+02C6      | &circ;      | modifier letter circumflex accent          | Spacing Modifier Letters |
| ‰    | 137  | 8240    | 0x89     | U+2030      | &permil;    | per mille sign                             | General Punctuation      |
| Š    | 138  | 352     | 0x8A     | U+0160      | &Scaron;    | Latin capital letter S with caron          | Latin Extended-A         |
| ‹    | 139  | 8249    | 0x8B     | U+2039      | &lsaquo;    | single left-pointing angle quotation mark  | General Punctuation      |
| Œ    | 140  | 338     | 0x8C     | U+0152      | &OElig;     | Latin capital ligature OE                  | Latin Extended-A         |
| Ž    | 142  | 381     | 0x8E     | U+017D      |             | Latin capital letter Z with caron          | Latin Extended-A         |
| ‘    | 145  | 8216    | 0x91     | U+2018      | &lsquo;     | left single quotation mark                 | General Punctuation      |
| ’    | 146  | 8217    | 0x92     | U+2019      | &rsquo;     | right single quotation mark                | General Punctuation      |
| “    | 147  | 8220    | 0x93     | U+201C      | &ldquo;     | left double quotation mark                 | General Punctuation      |
| ”    | 148  | 8221    | 0x94     | U+201D      | &rdquo;     | right double quotation mark                | General Punctuation      |
| •    | 149  | 8226    | 0x95     | U+2022      | &bull;      | bullet                                     | General Punctuation      |
| –    | 150  | 8211    | 0x96     | U+2013      | &ndash;     | en dash                                    | General Punctuation      |
| —    | 151  | 8212    | 0x97     | U+2014      | &mdash;     | em dash                                    | General Punctuation      |
| ˜    | 152  | 732     | 0x98     | U+02DC      | &tilde;     | small tilde                                | Spacing Modifier Letters |
| ™    | 153  | 8482    | 0x99     | U+2122      | &trade;     | trade mark sign                            | Letterlike Symbols       |
| š    | 154  | 353     | 0x9A     | U+0161      | &scaron;    | Latin small letter s with caron            | Latin Extended-A         |
| ›    | 155  | 8250    | 0x9B     | U+203A      | &rsaquo;    | single right-pointing angle quotation mark | General Punctuation      |
| œ    | 156  | 339     | 0x9C     | U+0153      | &oelig;     | Latin small ligature oe                    | Latin Extended-A         |
| ž    | 158  | 382     | 0x9E     | U+017E      |             | Latin small letter z with caron            | Latin Extended-A         |
| Ÿ    | 159  | 376     | 0x9F     | U+0178      | &Yuml;      | Latin capital letter Y with diaeresis      | Latin Extended-A         |

Note that the euro sign is among the characters missing from ISO-8859-1. This table comes from Alan Wood's site.
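To make the difference concrete, here is a minimal Python sketch. It assumes Python's codec names `cp1252` and `latin-1` as stand-ins for WINDOWS-1252 and ISO-8859-1; the helper `encodable` is a made-up name for illustration only.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "Price: 10 €"
# The euro sign is byte 0x80 in WINDOWS-1252 ...
print(encodable(sample, "cp1252"))
# ... but ISO-8859-1 has no euro sign at all.
print(encodable(sample, "latin-1"))
```

A check like this before exporting tells you whether the fallback encoding would lose characters.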

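Conversion

Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv that you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv: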

iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
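If iconv is not available, the same conversion can be sketched in Python. This is only an illustration under assumptions: the function name `utf8_to_cp1252` is invented here, and unlike a plain iconv call it replaces characters that WINDOWS-1252 lacks with "?" instead of stopping with an error.

```python
def utf8_to_cp1252(src: str, dst: str) -> None:
    """Re-encode a UTF-8 text file as WINDOWS-1252 (Python codec "cp1252")."""
    # "utf-8-sig" also accepts input that starts with a UTF-8 BOM and drops it.
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    # errors="replace" writes "?" for characters WINDOWS-1252 lacks
    # (for example Cyrillic); use errors="strict" to fail instead.
    with open(dst, "w", encoding="cp1252", errors="replace", newline="") as fout:
        fout.write(text)
```

Calling `utf8_to_cp1252("query_result.csv", "query_result-win.csv")` would mirror the command above.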

Answer 2

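For UTF-16LE with BOM, Excel will recognise the fields if you use tab characters as delimiters instead of commas. The reason it works is that Excel actually ends up using its Unicode *.txt parser.

Caveat: if the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem then is that when you re-open the file, Excel assumes it is real CSV (with commas), sees that it is not Unicode, parses it as comma-delimited, and hence makes a hash of it!

Update: the above caveat doesn't appear to be happening for me today in Excel 2010 (Windows) at least, though there does appear to be a difference in saving behaviour if:

you edit and quit Excel (it tries to save as 'Unicode *.txt')

compared to:

editing and closing just the file (works as expected).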
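A minimal Python sketch of producing such a file, assuming the rows are already available as lists of strings; the function name `to_excel_tsv_bytes` is hypothetical. Note that Python's `utf-16-le` codec writes no BOM on its own, so the two BOM bytes are added by hand.

```python
import csv
import io


def to_excel_tsv_bytes(rows):
    """Serialize rows as tab-delimited text, UTF-16LE encoded with a BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # Encode the whole text in one go, so tabs and CRLFs become
    # two-byte UTF-16 units as well, then prepend the BOM bytes FF FE.
    return b"\xff\xfe" + buf.getvalue().encode("utf-16-le")
```

The returned bytes can be written to disk in binary mode or sent as the body of a download response.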

Answer 3

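The lowdown is: there is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks, no matter what encoding you use or which hoops you jump through. I'd be glad to hear someone tell me different!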

Answer 4

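You only tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV), you would have found the answer:

UTF-16LE with BOM (byte order mark), tab-separated

But: in a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question, though). That's a pity. It often means that you allow manual editing of TSV files, which is probably not a good idea. Visually checking TSV files is not a problem. Furthermore, editors can be set to display a special character to mark tabs.

And yes, I tried this out on Windows and Mac.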

Answer 5

Here's the clincher on importing UTF-8 encoded CSV into Excel 2011 for Mac. Microsoft says: "Excel for Mac does not currently support UTF-8." (Source: "Excel for Mac 2011 and UTF-8".)

Yay, way to go MS!

Answer 6

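The best workaround for reading UTF-8 CSV files on a Mac is to convert them to XLSX format. I found a script made by Konrad Foerstner, which I improved a little by adding support for different delimiter characters.

Download the script from Github: https://github.com/brablc/clit/blob/master/csv2xlsx.py. To run it you will need to install the Python module openpyxl for Excel file manipulation: sudo easy_install openpyxl.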
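For orientation, the core of such a conversion might look like the following. This is not the linked script, just a minimal sketch under assumptions: the function name `csv_to_xlsx` is invented, the input is taken to be UTF-8 (with or without BOM), and every value is written as a string with no number or date conversion.

```python
import csv


def csv_to_xlsx(csv_path, xlsx_path, delimiter=","):
    """Copy every row of a UTF-8 CSV file into the first sheet of a new workbook."""
    # openpyxl is a third-party package; importing it here keeps the module
    # loadable on machines where it is not installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    with open(csv_path, encoding="utf-8-sig", newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            sheet.append(row)  # values are passed through as strings
    workbook.save(xlsx_path)
```

Because XLSX stores text as Unicode internally, the encoding question disappears once the file has been converted.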

Answer 7

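In my case it seems that Excel 2011 for Mac OS does not use Encoding.GetEncoding("10000"), as I had assumed (and wasted two days on), but the same ISO encoding as on Microsoft's OS. The best proof is to create a file with special characters in Excel 2011 for Mac, save it as CSV, and then open it in a Mac text editor: the characters are scrambled.

For me this approach worked, meaning that the CSV export opens in Excel 2011 on Mac OS with its special Western European characters intact: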

Encoding isoMacOS = Encoding.GetEncoding("iso-8859-1");
Encoding defaultEncoding = Encoding.Default; 

// Convert the string into a byte array.
byte[] defaultEncodingBytes = defaultEncoding.GetBytes(exportText);

// Perform the conversion from one encoding to the other.
byte[] ansiBytes = Encoding.Convert(defaultEncoding, isoMacOS, defaultEncodingBytes);

decodedString = isoMacOS.GetString(ansiBytes);

Answer 8

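UTF-8 without BOM currently works with Excel Mac 2011 14.3.2.

UTF-8 with BOM sort of works, but the BOM is rendered as gibberish.

UTF-16 works if you import the file and complete the wizard, but not if you just double-click it.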

Answer 9

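On my Mac OS, Text Wrangler identified the CSV file created with Excel as having "Western" encoding.

After some googling I made this small script (I'm not sure about its availability on Windows, maybe with Cygwin?):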

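$ cat /usr/local/bin/utf8.sh

#!/bin/bash

INPUTFILE="$1"

# Convert from MacRoman to UTF-8 (-c drops unconvertible characters)
# and turn carriage returns into line feeds.
iconv -f macroman -c -t UTF-8 "$INPUTFILE" | tr '\r' '\n' >"/tmp/file.$$.csv"

# Keep the original as ms_trash, then put the converted file in its place.
mv "$INPUTFILE" ms_trash
mv "/tmp/file.$$.csv" "$INPUTFILE"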
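Where a shell with iconv is not at hand (the Windows case mentioned above), the same idea can be sketched in Python, which ships a `mac_roman` codec. The function name `macroman_to_utf8` is made up for illustration. One deliberate difference from the script: a CRLF pair becomes a single line feed here, whereas `tr` would turn it into two.

```python
def macroman_to_utf8(data: bytes) -> bytes:
    """Decode MacRoman bytes, normalize line endings to LF, re-encode as UTF-8."""
    text = data.decode("mac_roman")
    # Handle CRLF first so it does not turn into two line feeds.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

Read the exported file in binary mode, pass its contents through this function, and write the result back in binary mode.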

Answer 10

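In my case this worked (Mac, Excel 2011, both Cyrillic and Latin characters with Czech diacritics):

- Charset UTF-16LE (just UTF-16 was not enough)
- BOM "\xFF\xFE"
- \t (tab) as the delimiter
- Don't forget to encode the delimiters and the CRLFs too :-)
- Use iconv instead of mb_convert_encoding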

Answer 11

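The following worked for me on Excel for Mac 2011 and Windows Excel 2002:

Using iconv on the Mac, convert the file to UTF-16 Little-Endian and name it *.txt (the .txt extension forces Excel to run the Text Import Wizard):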

iconv -f UTF-8 -t UTF-16LE filename.csv >filename_UTF-16LE.csv.txt
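Open the file in Excel and, in the Text Import Wizard, choose:
- Step 1: File origin: ignore it, it doesn't matter what you choose
- Step 2: select the proper values for Delimiters and Text qualifier
- Step 3: if necessary, select column formats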

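PS The UTF-16LE file created by iconv has the BOM bytes FF FE at the beginning.

PPS My original csv file was created on a Windows 7 computer in UTF-8 format (with the BOM bytes EF BB BF at the beginning) and used CRLF line breaks. Comma was used as the field delimiter and single quote as the text qualifier. It contained ASCII letters plus various Latin letters with tildes, umlauts etc., plus some Cyrillic. Everything displayed properly in both Excel for Win and Mac.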
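To check which BOM, if any, a file actually starts with, a few lines of Python are enough. This is a minimal sketch; the function name `sniff_bom` and the returned labels are invented for illustration, and UTF-32 is deliberately not considered (its little-endian BOM also begins with FF FE).

```python
import codecs

# BOM byte sequences mentioned in this thread, with a label for each.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None
```

Reading the first few bytes of a file in binary mode and passing them to `sniff_bom` is enough to confirm what a conversion step produced.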

PPPS Exact software versions:
* Mac OS X 10.6.8
* Excel for Mac 2011 v.14.1.3
* Windows Server 2003 SP2
* Windows Excel 2002 v.10.2701.2625

Answer 12

Instead of csv, try outputting HTML with an XLS extension and the "application/excel" mime-type. I know this works on Windows, but I can't speak for MacOS.
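Building the HTML itself is straightforward. The sketch below is only an illustration: the function name `rows_to_html_table` is hypothetical, and the `meta charset` declaration is an assumption of this example rather than something the answer specifies. Serving the result with the XLS extension and the mime-type named above is left to the web framework.

```python
from html import escape


def rows_to_html_table(rows):
    """Render rows (lists of values) as a bare HTML document with one table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        '<html><head><meta charset="utf-8"></head>'
        f"<body><table>{body}</table></body></html>"
    )
```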

Answer 13

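This worked for me:

Open the file in BBEdit or TextWrangler*.
Set the file to Unicode (UTF-16 Little-Endian) (line endings can be Unix or Windows). Save!
In Excel: Data > Get External Data > Import Text File...

Now the key point: choose MacIntosh as the File Origin (it should be the first choice).

This is using Excel 2011 (version 14.4.2).

*There's a small dropdown at the bottom of the window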

Answer 14

Solved this using Java (UTF-16LE with BOM):

String csvReportStr = getCsvReport();
// Prepend U+FEFF so that the UTF-16LE output starts with the BOM bytes FF FE.
byte[] data = ("\uFEFF" + csvReportStr).getBytes(java.nio.charset.StandardCharsets.UTF_16LE);

Note that the CSV file should use a tab as the separator. You can then read the CSV file on both Windows and Mac OS X.

See: here

Answer 15

In my case, adding the preamble to the file solved my problem:

var data = Encoding.UTF8.GetBytes(csv);
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(new MemoryStream(result), "application/octet-stream", "data.csv");
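The same idea in Python, as a minimal sketch (the function name `csv_with_utf8_bom` is made up here): the UTF-8 "preamble" is the three BOM bytes EF BB BF, which Python exposes as `codecs.BOM_UTF8`.

```python
import codecs


def csv_with_utf8_bom(csv_text: str) -> bytes:
    """Encode CSV text as UTF-8 and put the BOM (EF BB BF) in front."""
    return codecs.BOM_UTF8 + csv_text.encode("utf-8")
```

Encoding with Python's `utf-8-sig` codec produces the same bytes. As other answers in this thread report, a UTF-8 BOM helps Excel on Windows but not necessarily Excel 2011 on the Mac.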

Which encoding opens CSV files correctly with Excel on both Mac and Windows?
