Character encoding error in simple Haskell code

Time: 2015-05-01 14:45:36

Tags: haskell utf-8 character-encoding

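I have a character encoding problem in Haskell. This simple program prints the wrong result. What I really care about is the encode function, which forces me to use ByteString. The program is: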

import Data.ByteString.Char8 (unpack, pack)
import Data.ByteString.Lazy (toStrict)
import Data.Csv (encode) -- cabal install cassava

main = do
    -- (the middle character is a Polish letter with a diacritic)
    putStrLn $ unpack $ pack "aća"
    putStrLn $ unpack $ toStrict $ encode ["aća"]

It should print

aća
a,ć,a

but it prints

aa
a,Ä,a
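The two lines of output can be traced back to the bytes involved. The sketch below assumes the bytestring and text packages are available: Data.ByteString.Char8.pack keeps only the low 8 bits of each Char, so 'ć' (U+0107) collapses to the single byte 0x07, while UTF-8 encodes it as the two bytes 0xC4 0x87, which Char8.unpack then reads back as two separate Latin-1 characters.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
    -- Char8.pack truncates each Char to 8 bits: U+0107 becomes 0x07.
    print (B.unpack (C8.pack "aća"))                 -- [97,7,97]
    -- Going through Text yields the UTF-8 bytes 0xC4 0x87 for 'ć'.
    print (B.unpack (TE.encodeUtf8 (T.pack "aća")))  -- [97,196,135,97]
```

Byte 0x07 is a control character, which would explain why the first line shows up as "aa"; 0xC4 read as a single character is 'Ä', which matches the second line.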

This breaks my application, which encodes CSV. It happens on Linux regardless of my locale settings:

$ locale
LANG=pl_PL.UTF-8
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=pl_PL.UTF-8

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

What I want to know is how to convert the output of encode (a Data.ByteString.Lazy.ByteString) into a String, so that I can write it to a file with, for example, the writeFile function.
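One possible approach, sketched here on the assumption that cassava emits UTF-8 bytes and that the text package is installed: decode the lazy ByteString with Data.Text.Lazy.Encoding.decodeUtf8 and then unpack the resulting Text. The helper name csvToString and the output file names are made up for illustration. If the goal is only to get the CSV onto disk, writing the bytes directly with Data.ByteString.Lazy.writeFile skips the String step entirely.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Csv (encode) -- cabal install cassava

-- Hypothetical helper: decode UTF-8 bytes into a String.
-- decodeUtf8 throws an exception if the input is not valid UTF-8.
csvToString :: BL.ByteString -> String
csvToString = TL.unpack . TLE.decodeUtf8

main :: IO ()
main = do
    let csv = encode ["aća"]
    -- The String is re-encoded on output using the handle's (locale) encoding.
    putStr (csvToString csv)
    -- Text-mode write through String, as asked in the question.
    writeFile "out-string.csv" (csvToString csv)
    -- Binary write of the original bytes, with no decoding step.
    BL.writeFile "out-bytes.csv" csv
```

The String route depends on the locale encoding of the output handle, whereas the binary write always produces the bytes that encode returned.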

0 answers:

No answers