Reading the first line from a CSV file using pipes-csv

Date: 2016-12-20 03:57:33

Tags: csv haskell haskell-pipes

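I am reading a CSV file with the pipes-csv library. I want to read the first line now and the rest later. Unfortunately, after Pipes.Prelude.head returns, the pipe appears to be closed somehow. Is there a way to read the head of the CSV first and read the rest afterwards?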

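-- ScopedTypeVariables is required by the pattern type annotation in showPipe.
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.Vector as V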
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import qualified Pipes.ByteString as PB
import qualified Data.Text as Text
import qualified Pipes.Csv as PCsv
import Control.Monad (forever)

showPipe :: Proxy () (Either String (V.Vector Text.Text)) () String IO b
showPipe = forever $ do
    x::(Either String (V.Vector Text.Text)) <- await
    yield $ show x


main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  headers <- P.head producer
                  putStrLn "Header"
                  putStrLn $ show headers
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

If we do not read the header first, we can read the whole CSV without any problem:

main :: IO ()
main = do
  IO.withFile "./test.csv"
              IO.ReadMode
              (\handle -> do
                  let producer = (PCsv.decode PCsv.NoHeader (PB.fromHandle handle))
                  putStrLn $ "Rows"
                  runEffect ( producer>->
                              (showPipe) >->
                              P.stdoutLn)
               )

1 Answer:

Answer 0: (score: 1)

Pipes.Csv has material for handling headers, but I think this question is really looking for a more sophisticated use of Pipes.await or else Pipes.next. First, next:

>>> :t Pipes.next 
Pipes.next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))

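next is the basic way of inspecting a producer. It is a bit like pattern matching on a list. With a list, the two possibilities are [] and x:xs; here they are Left () and Right (headers, rows). The latter pair is what you are looking for. Of course, an action (here in IO) is needed to get hold of it: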

main :: IO ()
main = do
  handle <- IO.openFile  "./test.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
  e <- next producer
  case e of
    Left () -> putStrLn "No lines!"
    Right (headers, rows) -> do
      putStrLn "Header"
      print headers
      putStrLn $ "Rows"
      runEffect ( rows >-> P.print)
  IO.hClose handle

Since the Either values are distracting, I drop the Left values (the lines that do not parse) with P.concat.
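The same two steps can be tried without a CSV file. The following is only a sketch under stated assumptions: sampleRows and splitHeader are hypothetical names introduced for illustration, and the in-memory list stands in for what PCsv.decode would produce. It shows P.concat discarding the Left entries and next splitting off the first remaining element.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for decoded CSV rows: a Left marks a row that failed to parse.
sampleRows :: Producer (Either String [String]) IO ()
sampleRows = each
  [ Right ["name", "grade"]
  , Left "parse error"
  , Right ["alice", "90"]
  , Right ["bob", "85"]
  ]

-- Hypothetical helper: split a producer into its first element and the rest,
-- or Nothing if the producer is empty.
splitHeader :: Monad m => Producer a m () -> m (Maybe (a, Producer a m ()))
splitHeader p = either (const Nothing) Just <$> next p

main :: IO ()
main = do
  -- P.concat folds over each Either, so Left values yield nothing downstream.
  r <- splitHeader (sampleRows >-> P.concat)
  case r of
    Nothing -> putStrLn "No lines!"
    Just (headers, rows) -> do
      putStrLn ("Header: " ++ show headers)
      runEffect (rows >-> P.print)
```

Note that rows is the continuation returned by next, not the original producer, so it resumes after the header instead of starting over.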

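next does not act inside a pipeline but directly on the Producer, which it treats as a sort of "effectful list" with a final return value at the end. The particular effect we got above can of course also be achieved with await, which does act inside a pipeline. I can use it to intercept the first item that comes down the pipeline, do some IO based on it, and then forward the remaining elements: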

main :: IO ()
main = do
  handle <- IO.openFile  "./test.csv" IO.ReadMode
  let producer :: Producer (V.Vector Text.Text) IO ()
      producer = PCsv.decode PCsv.NoHeader (PB.fromHandle handle)  >-> P.concat
      handleHeader :: Pipe (V.Vector Text.Text) (V.Vector Text.Text) IO ()
      handleHeader = do
        headers <- await  -- intercept first value
        liftIO $ do       -- use it for IO
          putStrLn "Header"
          print headers
          putStrLn $ "Rows"
        cat               -- pass along all later values
  runEffect (producer >-> handleHeader >-> P.print)
  IO.hClose handle

The only difference is that if producer is empty, I cannot announce that fact, as I do with No lines! in the previous program.
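A small sketch of that difference, assuming an in-memory empty producer in place of an empty CSV file. The names noRows, announceHeader, and describe are hypothetical; announceHeader has the same shape as handleHeader above, specialised to String.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical empty input, standing in for an empty CSV file.
noRows :: Producer String IO ()
noRows = each []

-- Same shape as handleHeader: intercept the first value, then forward the rest.
announceHeader :: Pipe String String IO ()
announceHeader = do
  h <- await                           -- never returns if upstream is empty
  liftIO (putStrLn ("Header: " ++ h))
  cat

-- With next, the empty case is an ordinary value we can branch on.
describe :: Monad m => Producer a m () -> m String
describe p = do
  e <- next p
  pure $ case e of
    Left ()  -> "No lines!"
    Right _  -> "has a header"

main :: IO ()
main = do
  -- The pipeline simply ends when upstream ends, so nothing is printed here.
  runEffect (noRows >-> announceHeader >-> P.stdoutLn)
  -- next makes the empty case visible.
  describe noRows >>= putStrLn
```

When upstream terminates, the await in announceHeader ends the whole pipeline, so the code after it never runs and has no chance to report the empty input.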

Note, by the way, that showPipe can be defined as P.map show, or simply as P.show (but with the specialized type you added).
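For illustration, here is a minimal sketch of those two definitions side by side. The names showViaMap and showViaShow are hypothetical, and the types are left general rather than specialised to the CSV row type.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- showPipe written with P.map.
showViaMap :: (Monad m, Show a) => Pipe a String m r
showViaMap = P.map show

-- showPipe written with the ready-made P.show pipe.
showViaShow :: (Monad m, Show a) => Pipe a String m r
showViaShow = P.show

main :: IO ()
main = do
  let rows = [Right 1, Left "bad row"] :: [Either String Int]
  -- Both pipelines render each element with show and print one per line.
  runEffect (each rows >-> showViaMap >-> P.stdoutLn)
  runEffect (each rows >-> showViaShow >-> P.stdoutLn)
```

Specialising either definition to Either String (V.Vector Text.Text) gives a pipe with the same type as the showPipe in the question, without the explicit forever/await/yield loop.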