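Listing TAR archives in Haskell

Time: 2014-01-21 02:11:21

Tags: haskell tar

I'm currently trying to figure out how to list (gzipped) TAR archives in Haskell. Codec.Archive.Tar seems to be the right choice for the task, but I couldn't figure out how to map entryPath over the Entries monoid.

Suppose the TAR contains the entries (files only) a.txt, b.txt, c.txt and is named foo.tar.gz. Here is my code for reading the file: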

import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

foldEntryToPath :: Tar.Entry -> [String] -> [String]
foldEntryToPath entry list = list ++ [show $ Tar.entryPath entry]

-- Converts TAR errors to a string.
entryFailMapper :: String -> [String]
entryFailMapper err = [err]

main = do
        fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
        entries <- fmap Tar.read fileContent :: Tar.Entries
        -- Here I don't know how to correctly apply fmap
        entryPaths <- Tar.foldEntries foldEntryToPath [] entryFailMapper entries :: [String]
        -- This should print ["a.txt", "b.txt", "c.txt"]
        print entryPaths

Here is the error printed by runghc:

readtar.hs:14:49:
Expecting one more argument to `Tar.Entries'
In an expression type signature: Tar.Entries
In a stmt of a 'do' block:
  entries <- fmap Tar.read fileContent :: Tar.Entries
In the expression:
  do { fileContent <- fmap GZip.decompress
                      $ BS.readFile "foo.tar.gz";
       entries <- fmap Tar.read fileContent :: Tar.Entries;
       entryPaths <- Tar.foldEntries
                       foldEntryToPath [] (\ x -> [...]) entries ::
                       [String];
       print entryPaths }

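I don't know much about Haskell so far, but from reading the docs I can't see why Tar.Entries would be a type class (is that even the right term when the compiler says it is "expecting n more arguments to <type>"?) or what the correct type to use is.

Any help would be appreciated!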

2 Answers:

Answer 0: (score: 1)

I think foldEntryToPath needs to be fixed:

foldEntryToPath :: Tar.Entry -> [String] -> [String]
-- entryPath already returns a String; wrapping it in show would add literal quotes.
foldEntryToPath entry list = Tar.entryPath entry : list

And in main:

fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
let entries = Tar.read fileContent
let entryPaths = Tar.foldEntries foldEntryToPath [] entryFailMapper entries
print entryPaths
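
For reference, here is a minimal complete sketch of the foldEntries approach. It assumes tar 0.4 or later, where Tar.read returns Tar.Entries Tar.FormatError. Tar.Entries is a type constructor that takes the error type as its argument, which is what the "Expecting one more argument" message is about, so the failure handler has to accept a Tar.FormatError rather than a String. The names collectPath, onError and listPaths are made up for this illustration. Note also that Tar.read and Tar.foldEntries are pure, so their results are bound with let (or used directly) instead of with <-.

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BS

-- foldEntries folds from the right, so consing each path onto the
-- accumulated list keeps the entries in archive order.
collectPath :: Tar.Entry -> [FilePath] -> [FilePath]
collectPath entry paths = Tar.entryPath entry : paths

-- Hypothetical failure handler: end the list with a message describing
-- the format error (FormatError has a Show instance).
onError :: Tar.FormatError -> [FilePath]
onError err = ["<error: " ++ show err ++ ">"]

-- Tar.Entries is applied to the error type to form a complete type.
listPaths :: Tar.Entries Tar.FormatError -> [FilePath]
listPaths = Tar.foldEntries collectPath [] onError

main :: IO ()
main = do
  -- Reading the file is the only IO step; everything after it is pure.
  fileContent <- GZip.decompress <$> BS.readFile "foo.tar.gz"
  print (listPaths (Tar.read fileContent))
```

If the archive is damaged, the paths read before the damage are still returned, followed by the single error message.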

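Answer 1: (score: 1)

After some fiddling, I now have a complete working example.

One of the main issues was the foldr-like behaviour of Tar.foldEntries. In reality, I have a ~25GB TAR file with millions of entries. See the HaskellWiki for information on why this is a bad idea. (Note: efficiency is not the issue here, but I think a foldEntries-free solution is better for this particular use case.)

So I wrote my own recursive Tar.Entries -> [String] mapping function. Even though errors are not handled properly yet, it should provide a good starting point.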

import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

entriesToPaths :: Tar.Entries Tar.FormatError -> [String]
entriesToPaths (Tar.Next entry entries) = [Tar.entryPath entry] ++ entriesToPaths entries
entriesToPaths Tar.Done = [] :: [String]
entriesToPaths (Tar.Fail e) = ["Error"]

main = do
        fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
        let entries = Tar.read fileContent
        let entryPaths = entriesToPaths entries
        -- This should print ["a.txt", "b.txt", "c.txt"]
        print entryPaths
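
As a possible next step for the error handling, the same recursion can return the error alongside the paths instead of putting an "Error" string into the list. This is only a sketch under the same assumptions as above (tar 0.4 or later, a file named foo.tar.gz); pathsWithError is a hypothetical name, not part of the tar package.

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BS

-- Walk the entries and return every path that could be read, together
-- with the error (if any) that ended the traversal.
pathsWithError :: Tar.Entries e -> ([FilePath], Maybe e)
pathsWithError (Tar.Next entry rest) =
  -- The let-bound tuple pattern is lazy, so the head of the path list
  -- is available before the rest of the archive has been examined.
  let (paths, err) = pathsWithError rest
  in  (Tar.entryPath entry : paths, err)
pathsWithError Tar.Done       = ([], Nothing)
pathsWithError (Tar.Fail err) = ([], Just err)

main :: IO ()
main = do
  fileContent <- GZip.decompress <$> BS.readFile "foo.tar.gz"
  let (paths, err) = pathsWithError (Tar.read fileContent)
  -- Print one path per line, then report a format error if there was one.
  mapM_ putStrLn paths
  case err of
    Nothing -> pure ()
    Just e  -> putStrLn ("archive error: " ++ show e)
```

Keeping the error out of the path list means callers can tell a file that is literally named "Error" apart from a broken archive.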