Question

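I'm on Windows 7 64-bit.

My program needs to retrieve some text (UTF-8 encoded) from an external source, do something with it, and then save it to disk. The original text uses "\r\n" sequences to denote line breaks (and I'm happy to keep it that way).

The problem: when using Data.Text.IO.writeFile, every "\r\n" sequence seems to be translated to "\r\r\n", i.e. every '\n' is translated to "\r\n", even if it is already preceded by '\r' in the original text. As far as I know, when writing files on Windows, '\n' should be translated to "\r\n" if it is not already preceded by '\r', but translating "\r\n" to "\r\r\n" seems wrong.

Using ByteString.writeFile on the encodeUtf8 version of the text works just fine (no extra "\r" is inserted into the "\r\n" sequences).

A simple example: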

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as T (writeFile)
import qualified Data.Text.Encoding as T (encodeUtf8)

str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here" :: T.Text

main = do
    B.writeFile "byt.bin" $ T.encodeUtf8 str
    T.writeFile "txt.bin" str

Looking at each file produced by this code in a hex editor, one can see an extra 0x0D added before each 0x0A in the file produced by the T.writeFile line.

B.writeFile: [hex dump screenshot of byt.bin]

T.writeFile: [hex dump screenshot of txt.bin]

My question: what am I doing wrong? Is there a way to use T.writeFile on Windows without getting "\r\n" translated to "\r\r\n"?

Answer 1

Your answer lies in the docs:

Beginning with GHC 6.12, text I/O is performed using the system or handle's current locale and line ending conventions.

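Since you are not opening the handle yourself, the library most likely opens the file in text mode, which causes line terminators to be translated on output. What you can do is open the file in binary mode with openBinaryFile and then write with Data.Text.IO.hPutStr, which prevents the translation.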
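A minimal sketch of that suggestion follows. The helper names writeBinary and writeNoTranslation and the output file names are made up for illustration. The second helper is an alternative that the answer does not mention: it keeps a text-mode handle but switches off newline translation and sets UTF-8 explicitly, using the standard System.IO functions hSetNewlineMode and hSetEncoding.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as T (hPutStr)
import System.IO

str :: T.Text
str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here"

-- Hypothetical helper: write through a binary-mode handle.
-- Binary mode disables newline translation, but it also sets the
-- handle encoding to char8, so this is only suitable for text whose
-- characters all fit in a single byte (e.g. plain ASCII).
writeBinary :: FilePath -> T.Text -> IO ()
writeBinary path txt =
    withBinaryFile path WriteMode $ \h -> T.hPutStr h txt

-- Hypothetical helper (not from the answer): keep a text-mode handle,
-- but turn newline translation off and choose UTF-8 explicitly.
writeNoTranslation :: FilePath -> T.Text -> IO ()
writeNoTranslation path txt =
    withFile path WriteMode $ \h -> do
        hSetEncoding h utf8
        hSetNewlineMode h noNewlineTranslation
        T.hPutStr h txt

main :: IO ()
main = do
    writeBinary "txt-binary.bin" str
    writeNoTranslation "txt-utf8.bin" str
```

With either helper, the "\r\n" sequences in str should reach the file unchanged; which one fits depends on whether the text can contain non-ASCII characters.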

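However, leaving the encoding to the system locale is probably not what you want either. Depending on your situation, explicitly encoding and decoding the text yourself, as you did with ByteString, may be the better idea.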
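As a rough sketch of the explicit approach, the round trip below encodes on write and decodes on read, so neither the locale nor newline translation is involved. The helper names writeUtf8 and readUtf8 and the file name are illustrative only. Note that decodeUtf8 throws an exception on invalid UTF-8, so input from an untrusted source may need more careful handling.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as T (decodeUtf8, encodeUtf8)

-- Hypothetical helper: encode to UTF-8 ourselves and write raw bytes.
writeUtf8 :: FilePath -> T.Text -> IO ()
writeUtf8 path = B.writeFile path . T.encodeUtf8

-- Hypothetical helper: read raw bytes and decode them as UTF-8.
readUtf8 :: FilePath -> IO T.Text
readUtf8 path = T.decodeUtf8 <$> B.readFile path

main :: IO ()
main = do
    let str = "Line 1 is here\r\nLine 2 is here" :: T.Text
    writeUtf8 "roundtrip.bin" str
    back <- readUtf8 "roundtrip.bin"
    -- Check that the text, including its "\r\n" sequences, survived.
    print (back == str)
```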

\r\n gets translated to \r\r\n in Haskell
