Question

I have the following code, which works fine unless the file contains UTF-8 characters:

module Main where
import Ref
main = do
    text <- getLine
    theInput <- readFile text
    writeFile ("a"++text) (unlist . proc . lines $ theInput)

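With UTF-8 characters, I get this: hGetContents: invalid argument (invalid byte sequence)

Since the files I work with contain UTF-8 characters, I would like to handle this exception so that I can reuse the functions imported from Ref as much as possible.

Is there a way to read a UTF-8 file as IO String so that I can reuse the functions from Ref? What changes should I make to my code? Thanks in advance.

Here are the declarations of the functions from the Ref module: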

unlist :: [String] -> String
proc :: [String] -> [String]

From the Prelude:

lines :: String -> [String]

Answer 1

This can be done with just GHC's basic (but extended from the standard) System.IO module, although you'll then have to use more functions:

module Main where

import Ref
import System.IO

main = do
    text <- getLine
    inputHandle <- openFile text ReadMode 
    hSetEncoding inputHandle utf8
    theInput <- hGetContents inputHandle
    outputHandle <- openFile ("a"++text) WriteMode
    hSetEncoding outputHandle utf8
    hPutStr outputHandle (unlist . proc . lines $ theInput)
    hClose outputHandle -- Closing flushes the buffered output, so it is safest not to skip it.
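
The same idea can be wrapped in two small helpers so that the rest of the program keeps working with plain String values. This is only a sketch: the names readFileUtf8 and writeFileUtf8 are made up for illustration and are not part of System.IO. It uses withFile so the handles are closed even if an exception is thrown, and it forces the contents before returning because hGetContents is lazy.

```haskell
import Control.Exception (evaluate)
import System.IO

-- | Read a whole file as a String, decoding it as UTF-8.
-- (Hypothetical helper name, not a System.IO function.)
-- The contents are forced before returning, because hGetContents is lazy
-- and withFile closes the handle as soon as the inner action finishes.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    contents <- hGetContents h
    _ <- evaluate (length contents)
    return contents

-- | Write a String to a file, encoding it as UTF-8.
-- (Hypothetical helper name, not a System.IO function.)
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path str = withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h str
```

With these, the body of main from the question would presumably become readFileUtf8 text followed by writeFileUtf8 ("a"++text) (unlist . proc . lines $ theInput), leaving the Ref functions untouched.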

Answer 2

Use System.IO.Encoding (from the encoding package).

The lack of Unicode support is a well-known problem with the standard Haskell IO library.

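{-# LANGUAGE ImplicitParams #-}
module Main where

import Prelude hiding (readFile, getLine, writeFile)
import Ref
import System.IO.Encoding
import Data.Encoding.UTF8

main = do
    -- ?enc is the implicit parameter that the System.IO.Encoding functions use
    -- to pick the encoding; binding it requires the ImplicitParams extension.
    let ?enc = UTF8
    text <- getLine
    theInput <- readFile text
    writeFile ("a" ++ text) (unlist . proc . lines $ theInput)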

Answer 3

Thanks for your answers, but I found the solution myself. The file I was working with actually has this encoding:

ISO-8859 text, with CR line terminators

So, to process the file with my Haskell code, it should have this encoding instead:

UTF-8 Unicode text, with CR line terminators

You can check a file's encoding with the file utility, like this:

$ file filename

To change the file encoding, follow the instructions in this link!
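
As an alternative to an external tool, the conversion could also be sketched in Haskell itself with System.IO. Two assumptions are made here: the input is really ISO-8859-1 (Latin-1), which file does not confirm because it only reports "ISO-8859", and the helper name latin1ToUtf8 is made up for illustration. The source and destination paths must differ, since the same file cannot be opened for reading and writing at once.

```haskell
import System.IO

-- | Re-encode a text file from ISO-8859-1 (Latin-1) to UTF-8.
-- (Hypothetical helper name.)
-- Assumption: the source really is ISO-8859-1; other ISO-8859 variants
-- would need a different TextEncoding.
latin1ToUtf8 :: FilePath -> FilePath -> IO ()
latin1ToUtf8 src dst =
    withFile src ReadMode $ \hIn ->
    withFile dst WriteMode $ \hOut -> do
        hSetEncoding hIn latin1
        hSetEncoding hOut utf8
        -- The lazy contents are fully consumed by hPutStr
        -- while both handles are still open.
        contents <- hGetContents hIn
        hPutStr hOut contents
```

Reading the file directly with hSetEncoding set to latin1 would avoid the intermediate copy, at the cost of changing the reading code.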

Reading a UTF-8 file as an IO String in Haskell