Question

I am trying to split a string, as shown below:

let path = "/Users/user/Downloads/history.csv"

do {
    let contents = try NSString(contentsOfFile: path, encoding: String.Encoding.utf8.rawValue)
    let rows = contents.components(separatedBy: "\n")

    print("contents: \(contents)")
    print("rows: \(rows)")
} catch {
}

I have two files that look almost identical. The output for the first file is as follows:

Output for file 1:

contents: 2017-07-31 16:29:53,0.10109999,9.74414271,0.98513273,0.15%,42302999779,-0.98513273,9.72952650
2017-07-31 16:29:53,0.10109999,0.25585729,0.02586716,0.25%,42302999779,-0.02586716,0.25521765


rows: ["2017-07-31 16:29:53,0.10109999,9.74414271,0.98513273,0.15%,42302999779,-0.98513273,9.72952650", "2017-07-31 16:29:53,0.10109999,0.25585729,0.02586716,0.25%,42302999779,-0.02586716,0.25521765", "", ""]

Output for file 2:

contents: 40.75013313,0.00064825,5/18/2017 7:17:01 PM

19.04004820,0.00059900,5/19/2017 9:17:03 PM

rows: ["4\00\0.\07\05\00\01\03\03\01\03\0,\00\0.\00\00\00\06\04\08\02\05\0,\05\0/\01\08\0/\02\00\01\07\0 \07\0:\01\07\0:\00\01\0 \0P\0M\0", "\0", "1\09\0.\00\04\00\00\04\08\02\00\0,\00\0.\00\00\00\05\09\09\00\00\0,\0\05\0/\01\09\0/\02\00\01\07\0 \09\0:\01\07\0:\00\03\0 \0P\0M\0", "\0", "\0", "\0"]

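So both files can be read as a String, since print(contents) works. But once the string is split, the second file is no longer readable. I tried different encodings, but nothing worked. Does anyone have an idea how to make the string from the second file stay readable after splitting?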

Answer 1

Your file is evidently UTF-16 (little-endian) encoded:

$ hexdump fullorders4.csv 
0000000 4f 00 72 00 64 00 65 00 72 00 55 00 75 00 69 00
0000010 64 00 2c 00 45 00 78 00 63 00 68 00 61 00 6e 00
0000020 67 00 65 00 2c 00 54 00 79 00 70 00 65 00 2c 00
0000030 51 00 75 00 61 00 6e 00 74 00 69 00 74 00 79 00
...

For ASCII characters, the first byte of the UTF-16 little-endian encoding is the ASCII code, and the second byte is zero.

If the file is read as UTF-8, each zero byte is decoded as an ASCII NUL character, which is the \0 you see in the output.
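The effect can be reproduced in memory, without the original file. The following is a minimal sketch, assuming a short made-up sample string (`"40.75,PM"` is not taken from the real file): it encodes the text as UTF-16 little-endian, decodes the same bytes as UTF-8, and counts the resulting NUL characters.

```swift
import Foundation

// Hypothetical sample text (an assumption, not data from the real file).
let sample = "40.75,PM"

// Encode as UTF-16 little-endian; this explicit-endian encoding writes no BOM.
let utf16Data = sample.data(using: .utf16LittleEndian)!

// Each ASCII character becomes two bytes: the ASCII code followed by 0x00.
let bytes = [UInt8](utf16Data)

// Decoding the same bytes as UTF-8 succeeds, because 0x00 is a valid UTF-8 byte (NUL).
let misread = String(data: utf16Data, encoding: .utf8)!
let nulCount = misread.unicodeScalars.filter { $0.value == 0 }.count

// Decoding with the matching encoding restores the original text.
let correct = String(data: utf16Data, encoding: .utf16LittleEndian)!

print("bytes: \(bytes.count), NULs after UTF-8 decode: \(nulCount), round trip ok: \(correct == sample)")
```

The misread string contains one NUL after every visible character, which matches the pattern in the rows output for the second file.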

Therefore, specifying the encoding as utf16LittleEndian works in your case:

let contents = try NSString(contentsOfFile: path, encoding: String.Encoding.utf16LittleEndian.rawValue)
// or:
let contents = try String(contentsOfFile: path, encoding: .utf16LittleEndian)
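Putting the read and the split together might look like the sketch below. It assumes a stand-in file written to the temporary directory (the file name `history-utf16le-sample.csv` and its contents are hypothetical) and assumes CRLF line endings, which the question does not confirm. Splitting on `.newlines` and dropping empty strings is a choice made here to avoid the blank rows, not something the original answer prescribes.

```swift
import Foundation

// Hypothetical stand-in for the second file: UTF-16LE text, CRLF line endings assumed.
let sample = "40.75013313,0.00064825,5/18/2017 7:17:01 PM\r\n19.04004820,0.00059900,5/19/2017 9:17:03 PM\r\n"
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("history-utf16le-sample.csv")

var rows: [String] = []
do {
    // Write the sample as UTF-16 little-endian so there is something to read back.
    try sample.data(using: .utf16LittleEndian)!.write(to: url)

    // Read with the matching encoding instead of UTF-8.
    let contents = try String(contentsOf: url, encoding: .utf16LittleEndian)

    // Split on any newline character, then drop the empty strings that
    // CR+LF pairs and the trailing line break leave behind.
    rows = contents.components(separatedBy: .newlines).filter { !$0.isEmpty }
} catch {
    print("Could not read \(url.path): \(error)")
}

print("rows: \(rows)")
```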

There is also a method that tries to detect the encoding used (compare iOS: What's the best way to detect a file's encoding). In Swift, this would be

var enc: UInt = 0
let contents = try NSString(contentsOfFile: path, usedEncoding: &enc)
// or:
var enc = String.Encoding.ascii
let contents = try String(contentsOfFile: path, usedEncoding: &enc)

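However, in your particular case, this would read the file as UTF-8 again, because its contents are valid UTF-8. Prepending a byte order mark (BOM) to the file (FF FE for UTF-16 little-endian) would solve this problem reliably.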
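Prepending the BOM could be sketched as follows. The helper name `prependingUTF16LEBOM(to:)` and the sample text are hypothetical, and the example works on in-memory data rather than the real file; writing the result back to disk is left out.

```swift
import Foundation

// The UTF-16 little-endian byte order mark named above.
let utf16LEBOM: [UInt8] = [0xFF, 0xFE]

// Hypothetical helper: returns the data with FF FE in front,
// unless the data already starts with those two bytes.
func prependingUTF16LEBOM(to data: Data) -> Data {
    if data.starts(with: utf16LEBOM) { return data }
    return Data(utf16LEBOM) + data
}

// Made-up sample text encoded without a BOM.
let raw = "19.04,PM".data(using: .utf16LittleEndian)!
let marked = prependingUTF16LEBOM(to: raw)

// Applying the helper a second time leaves the data unchanged.
let markedAgain = prependingUTF16LEBOM(to: marked)

print(marked.prefix(2).map { String(format: "%02X", $0) })
```

With the mark in place, a decoder no longer has to guess the byte order, which is what should make the encoding detection shown earlier pick UTF-16 instead of UTF-8.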
