Text encoding when combining http-conduit and scalpel-core

Time: 2018-08-20 18:16:51

Tags: haskell


I have the following code:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Lib
import Network.HTTP.Simple
import qualified Data.ByteString.Lazy.Char8 as L8
import Text.HTML.Scalpel.Core
import Data.Text.Lazy.Encoding (decodeUtf8)
import qualified Data.Text.Lazy.IO as L

main :: IO ()
main = do
    let address = "http://www.myriobiblos.gr/bible/nt2/matthew/1.asp"
    response <- httpLBS address
    putStrLn $ "The status code was: " ++
                show (getResponseStatusCode response)
    print $ getResponseHeader "Content-Type" response
    let responseBody = getResponseBody response

If the code above is followed by

    L8.writeFile "ch1.txt" responseBody

then the text is saved with no encoding problems. But if the body is first scraped with scalpel and the scraped content is written out, the resulting text is scrambled. As you can see in the import list, I tried to use the Data.Text types, but I was doing something wrong. Also, when I tried using encodeUtf8 on responseBody or on content, I got the following message:

  

ch1.txt: commitAndReleaseBuffer: invalid argument (invalid character)

Any ideas about what I am doing wrong?

1 answer:

Answer 0: (score: 1)

You are right that you want to decode/encode UTF-8 here; you only need a few small changes:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Lib
import Network.HTTP.Simple
import qualified Data.ByteString.Lazy.Char8 as L8
import Text.HTML.Scalpel.Core
import Data.Text.Lazy.Encoding (decodeUtf8, encodeUtf8)

main :: IO ()
main = do
    let address = "http://www.myriobiblos.gr/bible/nt2/matthew/1.asp"
    response <- httpLBS address
    putStrLn $ "The status code was: " ++
                show (getResponseStatusCode response)
    print $ getResponseHeader "Content-Type" response
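    -- Decode the raw response bytes into lazy Text before scraping.
    let responseBody = decodeUtf8 $ getResponseBody response
    let innerText = scrapeStringLike responseBody
                        $ do chroot "tr" $ do text "tr"
    case innerText of
       -- Encode the scraped Text back to UTF-8 bytes before writing.
       (Just content) -> L8.writeFile "ch1.txt"  (encodeUtf8 content)
       Nothing -> return ()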

Decode the body to get text content, process it, and then encode it again to get the bytes to write to disk.
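
The decode, process, encode round trip can also be tried in isolation, without the network or scalpel. The following is only a minimal sketch under some assumptions: `process` is a hypothetical stand-in for the scraping step, `sample.txt` is a made-up file name, and the input is assumed to be UTF-8 (the Content-Type header printed by the program is the place to check that for a real page). It uses `decodeUtf8'` so that invalid input yields `Nothing` rather than an exception.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Encoding (decodeUtf8', encodeUtf8)

-- Hypothetical stand-in for the scraping step: any Text -> Text function.
process :: TL.Text -> TL.Text
process = TL.strip

-- Decode the bytes as UTF-8, transform the text, and encode it back to
-- bytes. Returns Nothing when the input is not valid UTF-8.
roundTrip :: BL.ByteString -> Maybe BL.ByteString
roundTrip bytes =
  case decodeUtf8' bytes of
    Left _    -> Nothing
    Right txt -> Just (encodeUtf8 (process txt))

main :: IO ()
main = do
  -- Sample input standing in for a response body (assumed to be UTF-8).
  let sample = encodeUtf8 "  Βίβλος γενέσεως  "
  case roundTrip sample of
    -- Writing a ByteString bypasses the handle's locale encoding.
    Just out -> BL.writeFile "sample.txt" out
    Nothing  -> putStrLn "input was not valid UTF-8"
```

Writing the encoded bytes keeps the handle's locale encoding out of the picture. An "invalid character" error like the one in the question typically appears when Text or String is written through a handle whose encoding cannot represent the characters.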