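I'm learning Haskell and trying to write a simple program that produces a list of (header, data) tuples when reading a CSV. I'm trying to use Data.Text.Lazy and Data.Text.Lazy.IO because, as I understand it, they offer good performance and Unicode coverage compared with String.
The function I'm working on takes a row number (n) and a CSV filename (filename) and returns only the (header, datum) tuples for that row.
Here is my CSV, "dat.csv":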
ORDINAL,CATEGORICAL,BOOL,CONTINUOUS,INT
Low,Blue,True,1.2,2
Medium,Green,False,0.5,3
High, Green,False,1.0,5
Here is my code:
-- hs_reader.hs
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Lazy as T
import Data.Text.Lazy.IO as I
import Control.Applicative
getL :: Int -> FilePath -> IO [(Text,Text)]
getL n filename =
  do
    flines <- T.lines <$> I.readFile filename
    let headers = Prelude.head flines
    let body = Prelude.tail flines
    let row = Prelude.zip (splitOn "," headers) (splitOn "," (body !! n))
    return row
This works the way I want:
Prelude> :l hs_reader
[1 of 1] Compiling Main ( hs_reader.hs, interpreted )
Ok, modules loaded: Main.
Prelude> getL 1 "dat.csv"
[("ORDINAL","Medium"),("CATEGORICAL","Green"),("BOOL","False"),("CONTINUOUS","0.5"),("INT","3")]
Prelude> getL 2 "dat.csv"
[("ORDINAL","High"),("CATEGORICAL"," Green"),("BOOL","False"),("CONTINUOUS","1.0"),("INT","5")]
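I realize I don't know much about how to use monads properly. I have 4 main questions:
Question (1): I want to partially apply the function and map it over a range of row numbers. Why doesn't this work?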
let readF x = getL x "dat.csv"
-- a
Prelude.map readF [1..3]
--
Prelude> Prelude.map readF [1..3]
--
<interactive>:514:1:
No instance for (Show (IO [(Text, Text)]))
arising from a use of ‘print’
In a stmt of an interactive GHCi command: print it
--
-- b.
Prelude> T.map readF [1..3]
--
<interactive>:515:7:
Couldn't match type ‘IO [(Text, Text)]’ with ‘Char’
Expected type: Char -> Char
Actual type: Int -> IO [(Text, Text)]
In the first argument of ‘T.map’, namely ‘readF’
In the expression: T.map readF [1 .. 3]
--
<interactive>:515:13:
Couldn't match expected type ‘Text’ with actual type ‘[Integer]’
In the second argument of ‘T.map’, namely ‘[1 .. 3]’
In the expression: T.map readF [1 .. 3]
In an equation for ‘it’: it = T.map readF [1 .. 3]
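Question (2): Is there a more elegant way to do this? Can I do it without any let statements like the ones I have?
Question (3): I tried to use the following because it looks more like the examples I've seen online. Why doesn't it work? (Can't I use "<-" in a where clause?)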
getL2 :: Int -> FilePath -> [(Text,Text)]
getL2 n filename = do
    Prelude.zip (splitOn "," headers) (splitOn "," (body !! n))
  where
    headers = Prelude.head flines
    body = Prelude.tail flines
    flines <- T.lines <$> I.readFile filename
--
-- ERROR!
hs_reader.hs:25:12:
parse error on input ‘<-’
Perhaps this statement should be within a 'do' block?
Failed, modules loaded: none.
Question (4): I'm working with some monads. Would one of these guys apply here in an easy-to-understand way: >>= or >=>?
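Answer 0 (score: 2)
(1) You get an [IO [(Text,Text)]], because you mapped an Int -> IO [(Text,Text)] over an [Int]. GHCi cannot print a list of unrun IO actions, hence the missing Show instance. You want mapM.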
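A minimal sketch of the mapM fix, assuming getL behaves as in the question. The name readRows is hypothetical and only used for illustration; the imports are qualified here to avoid the Prelude. prefixes.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Lazy (Text, splitOn)
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as I

-- getL as in the question: pair the header fields with row n of the body.
getL :: Int -> FilePath -> IO [(Text, Text)]
getL n filename = do
  flines <- T.lines <$> I.readFile filename
  let headers = head flines
      body    = tail flines
  return (zip (splitOn "," headers) (splitOn "," (body !! n)))

-- map would only build a list of unrun actions, [IO [(Text, Text)]].
-- mapM runs them in order and collects the results inside a single IO.
-- (readRows is a hypothetical helper name, not part of the question.)
readRows :: FilePath -> [Int] -> IO [[(Text, Text)]]
readRows filename = mapM (\n -> getL n filename)
```

In GHCi, readRows "dat.csv" [0..2] would then print all three rows. Note that the body of dat.csv has only three rows, so the indices run from 0 to 2; [1..3] would fail on the last index.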
(2) !! is a smell. I'd build the complete list right away; if you really want to supply Ints afterwards, you can still use !! at the call site.
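One way this could look in do notation, as a sketch that assumes the same comma splitting as getL. The empty-file case and the helper secondRow are assumptions added for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Lazy (Text, splitOn)
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as I

-- Pair the header fields with the fields of every body row.
readCSV :: FilePath -> IO [[(Text, Text)]]
readCSV filename = do
  flines <- T.lines <$> I.readFile filename
  return $ case flines of
    []               -> []  -- assumption: an empty file yields no rows
    (headers : body) -> map (zip (splitOn "," headers) . splitOn ",") body

-- Indexing is left to the caller (secondRow is a hypothetical example).
secondRow :: FilePath -> IO [(Text, Text)]
secondRow filename = (!! 1) <$> readCSV filename
```

Keeping !! out of readCSV means the function never fails on a bad index; only the caller that chooses an index can.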
(3) flines <- T.lines <$> I.readFile filename is a monadic bind. You can't just do that in a where clause; the only reason you can do it in a do block is that do blocks get desugared into >>=.
(4) This is what that would look like:
readCSV :: FilePath -> IO [[(Text,Text)]]
readCSV filename =
  (T.lines <$> I.readFile filename) >>= \(headers : body) ->
  return $ Prelude.map (Prelude.zip (splitOn "," headers) . splitOn ",") body
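Since filename is used only once, at the end of the first line, it's actually possible to write that with >=> (from Control.Monad):
readCSV :: FilePath -> IO [[(Text,Text)]]
readCSV =
  fmap T.lines . I.readFile >=> \(headers : body) ->
  return $ Prelude.map (Prelude.zip (splitOn "," headers) . splitOn ",") body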
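To see how >>= and >=> relate without involving files, here is a small sketch in the Maybe monad. All four function names are hypothetical and chosen only for illustration; the relationship it relies on is f >=> g = \x -> f x >>= g.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Two small Maybe-returning steps (hypothetical, for illustration only).
parseInt :: String -> Maybe Int
parseInt = readMaybe

halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- With >>= the argument is named and the first result is fed to the next step.
viaBind :: String -> Maybe Int
viaBind s = parseInt s >>= halve

-- With >=> the two steps are composed directly, without naming the argument.
viaKleisli :: String -> Maybe Int
viaKleisli = parseInt >=> halve
```

Both definitions behave the same: the input "10" gives Just 5, while "7" or "x" gives Nothing.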
Since the last line uses only return, we don't even need >=>; fmap is enough:
readCSV :: FilePath -> IO [[(Text,Text)]]
readCSV =
  fmap ( (\(headers : body) -> Prelude.map (Prelude.zip (splitOn "," headers) . splitOn ",") body)
       . T.lines)
  . I.readFile
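Which of those is more readable is, of course, another question entirely.
Edit: One last suggestion to refactor that further:
parseCSV :: Text -> [[(Text, Text)]]
parseCSV =
  (\(headers : body) -> Prelude.map (Prelude.zip (splitOn "," headers) . splitOn ",") body)
  . T.lines
Then you'd use it like this:
import System.Environment (getArgs)
main :: IO ()
main = do
  [filename, field] <- getArgs
  csv <- parseCSV <$> I.readFile filename
  -- getArgs yields Strings, so pack the field name to match the Text keys
  print $ traverse (lookup (T.pack field)) csv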
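The traverse (lookup field) step is worth spelling out: lookup returns a Maybe for each row, and traverse turns the list of Maybes into a single Maybe of a list. A minimal sketch, assuming a pure parseCSV that also handles empty input; the name column is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Lazy (Text, splitOn)
import qualified Data.Text.Lazy as T

-- Pure parser: pair the header fields with the fields of every body row.
parseCSV :: Text -> [[(Text, Text)]]
parseCSV input = case T.lines input of
  []               -> []  -- assumption: empty input yields no rows
  (headers : body) -> map (zip (splitOn "," headers) . splitOn ",") body

-- Just the whole column if every row has the field, Nothing otherwise.
-- (column is a hypothetical helper name.)
column :: Text -> Text -> Maybe [Text]
column field = traverse (lookup field) . parseCSV
```

Because parseCSV is pure, it can be tried in GHCi on a literal such as "ORDINAL,INT\nLow,2\nHigh,5\n" without touching the file system.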