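I'm currently trying to figure out how to list the contents of a (gzipped) TAR archive in Haskell. Codec.Archive.Tar seems to be the right tool for the job, but I can't work out how to map entryPath over the Entries monoid.
Suppose the TAR archive contains the entries (files only) a.txt, b.txt, c.txt and is named foo.tar.gz. Here is my code for reading the file: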
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

foldEntryToPath :: Tar.Entry -> [String] -> [String]
foldEntryToPath entry list = list ++ [show $ Tar.entryPath entry]

-- Converts TAR errors to a string.
entryFailMapper :: String -> [String]
entryFailMapper err = [err]

main = do
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    entries <- fmap Tar.read fileContent :: Tar.Entries
    -- Here I don't know how to correctly apply fmap
    entryPaths <- Tar.foldEntries foldEntryToPath [] entryFailMapper entries :: [String]
    -- This should print ["a.txt", "b.txt", "c.txt"]
    print entryPaths
Here is the error printed by runghc:
readtar.hs:14:49:
    Expecting one more argument to `Tar.Entries'
    In an expression type signature: Tar.Entries
    In a stmt of a 'do' block:
      entries <- fmap Tar.read fileContent :: Tar.Entries
    In the expression:
      do { fileContent <- fmap GZip.decompress
                          $ BS.readFile "foo.tar.gz";
           entries <- fmap Tar.read fileContent :: Tar.Entries;
           entryPaths <- Tar.foldEntries
                           foldEntryToPath [] (\ x -> [...]) entries ::
                           [String];
           print entryPaths }
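I don't know much Haskell so far, but from reading the docs I can't work out why Tar.Entries is a type class (is that the correct term when the compiler says "expecting n more arguments to <type>"?) or what the correct type to use would be.
Any help would be appreciated!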
Answer 0 (score: 1)
I think foldEntryToPath needs to be fixed:
foldEntryToPath :: Tar.Entry -> [String] -> [String]
foldEntryToPath entry list = (show $ Tar.entryPath entry) : list
In main:
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    let entries = Tar.read fileContent
    let entryPaths = Tar.foldEntries foldEntryToPath [] entryFailMapper entries
    print entryPaths
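Some context on the error message may help here. Tar.Entries is not a type class but a type constructor that takes the error type as a parameter, so a complete annotation would be something like Tar.Entries Tar.FormatError. Tar.read and Tar.foldEntries are pure functions, which is why they are bound with let rather than <-. Also note that show applied to a FilePath adds quotes around each name. The following is a minimal sketch putting this together, assuming the tar and zlib packages are installed. The names pathsOrError, listGzippedTar and sampleArchive are made up for illustration, and sampleArchive exists only so the example can be tried without a file on disk.

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Archive.Tar.Entry as TarEntry
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BS

-- Collect all entry paths, or return the first format error.
-- Note the fully applied type: Tar.Entries takes the error type as argument.
pathsOrError :: Tar.Entries Tar.FormatError -> Either Tar.FormatError [FilePath]
pathsOrError = Tar.foldEntries step (Right []) Left
  where
    -- Prepend this entry's path to the paths of the remaining entries.
    step entry rest = fmap (Tar.entryPath entry :) rest

-- Decompress, parse and list a gzipped TAR archive held in memory.
listGzippedTar :: BS.ByteString -> Either Tar.FormatError [FilePath]
listGzippedTar = pathsOrError . Tar.read . GZip.decompress

-- Hypothetical helper: build a small gzipped archive of empty files in memory.
-- Names that cannot be turned into a TarPath are silently skipped.
sampleArchive :: [FilePath] -> BS.ByteString
sampleArchive names =
  GZip.compress
    (Tar.write
       [ TarEntry.fileEntry path BS.empty
       | Right path <- map (TarEntry.toTarPath False) names
       ])
```

In main, fmap listGzippedTar (BS.readFile "foo.tar.gz") would then yield either the list of paths or the format error that stopped the parse.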
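Answer 1 (score: 1)
With a bit of fiddling, I now have a complete working example.
One of the main issues is the foldr-like behaviour of Tar.foldEntries. In practice, I have a ~25GB TAR file with millions of entries. For information on why this is a bad idea, see the HaskellWiki. (Note: efficiency is not the issue here, but I think a foldEntries-free solution is better for this particular use case.)
So I wrote my own recursive function that maps Tar.Entries to [String]. Even though errors are currently handled poorly, it should provide a good starting point.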
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

entriesToPaths :: Tar.Entries Tar.FormatError -> [String]
entriesToPaths (Tar.Next entry entries) = Tar.entryPath entry : entriesToPaths entries
entriesToPaths Tar.Done = []
entriesToPaths (Tar.Fail _) = ["Error"]

main :: IO ()
main = do
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    let entries = Tar.read fileContent
    let entryPaths = entriesToPaths entries
    -- This should print ["a.txt", "b.txt", "c.txt"]
    print entryPaths
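As a possible next step for the error handling, one option is to keep the failure as a value instead of replacing it with the string "Error". The sketch below is only one way to do that, assuming the tar package. It uses Tar.foldEntries with a cons step, which has the same shape as the hand-written recursion above and, because the step is lazy in the rest of the list, should still produce paths incrementally. The names entriesToResults and demoEntry are hypothetical, and demoEntry is only there to build entries for experimenting.

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Archive.Tar.Entry as TarEntry
import qualified Data.ByteString.Lazy as BS

-- One list element per entry; a parse failure becomes a final Left element.
-- The cons step does not inspect `rest`, so the list is produced lazily.
entriesToResults :: Tar.Entries e -> [Either e FilePath]
entriesToResults =
  Tar.foldEntries
    (\entry rest -> Right (Tar.entryPath entry) : rest)
    []
    (\err -> [Left err])

-- Hypothetical helper: an empty file entry with the given name.
-- Calls `error` if the name cannot be represented as a TarPath.
demoEntry :: FilePath -> Tar.Entry
demoEntry name =
  TarEntry.fileEntry (either error id (TarEntry.toTarPath False name)) BS.empty
```

With this shape, the caller can print each Right path as it arrives and decide what to do when a Left shows up at the end, for example mapM_ (either (putStrLn . ("error: " ++) . show) putStrLn) (entriesToResults entries).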