Question

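I have a tar file that contains several files. I need to write a Python script that reads the contents of those files and reports the total number of characters (letters, spaces, newlines, everything) without extracting the tar file.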

Answer 1

You can use getmembers():

>>> import tarfile
>>> tar = tarfile.open("test.tar")
>>> tar.getmembers()

After that, you can use extractfile() to get each member as a file object, without writing anything to disk. For example:

import tarfile

with tarfile.open("test.tar") as tar:
    for member in tar.getmembers():
        f = tar.extractfile(member)
        if f is None:  # directories, devices, etc. have no file content
            continue
        # assumes the members are UTF-8 text, so len() counts characters rather than bytes
        content = f.read().decode("utf-8")
        print("%s has %d newlines" % (member.name, content.count("\n")))
        print("%s has %d spaces" % (member.name, content.count(" ")))
        print("%s has %d characters" % (member.name, len(content)))

With the file object "f" from the example above, you can use read(), readlines(), and so on.
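Because the question asks for totals across the whole archive rather than per file, the loop above can accumulate its counts instead of printing them. Below is a minimal sketch, assuming every regular member is UTF-8 text; `count_chars_in_tar` is a hypothetical helper name, not part of tarfile.

```python
import tarfile


def count_chars_in_tar(path):
    """Return total characters, spaces and newlines over all regular files in a tar archive.

    Hypothetical helper; assumes every regular member is UTF-8 encoded text.
    """
    totals = {"characters": 0, "spaces": 0, "newlines": 0}
    with tarfile.open(path) as tar:  # default mode "r" auto-detects compression
        for member in tar.getmembers():
            if not member.isfile():  # skip directories, links, devices
                continue
            text = tar.extractfile(member).read().decode("utf-8")
            totals["characters"] += len(text)
            totals["spaces"] += text.count(" ")
            totals["newlines"] += text.count("\n")
    return totals
```

Calling `count_chars_in_tar("test.tar")` returns one dictionary for the entire archive; nothing is extracted to disk.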

Answer 2

You need the tarfile module. Specifically, you use an instance of the TarFile class to access the archive, and then get the member names with TarFile.getnames():

 |  getnames(self)
 |      Return the members of the archive as a list of their names. It has
 |      the same order as the list returned by getmembers().
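Because getnames() and getmembers() return lists in the same order, the two can be zipped together. Here is a minimal sketch that pairs each name with its size; `member_names_and_sizes` is a hypothetical helper name.

```python
import tarfile


def member_names_and_sizes(path):
    """Pair each name from getnames() with the TarInfo at the same index from getmembers().

    Relies on the documented guarantee that both lists share the same order.
    """
    with tarfile.open(path) as tar:
        names = tar.getnames()
        members = tar.getmembers()
    # member.size is the uncompressed size in bytes, taken from the tar header only
    return [(name, member.size) for name, member in zip(names, members)]
```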

If you want to read the contents, you can use this method:

 |  extractfile(self, member)
 |      Extract a member from the archive as a file object. `member' may be
 |      a filename or a TarInfo object. If `member' is a regular file, a
 |      file-like object is returned. If `member' is a link, a file-like
 |      object is constructed from the link's target. If `member' is none of
 |      the above, None is returned.
 |      The file-like object is read-only and provides the following
 |      methods: read(), readline(), readlines(), seek() and tell()
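As the docstring says, extractfile() accepts a plain member name and returns None for anything that is not a regular file or link. A minimal sketch, assuming the member is UTF-8 text; `read_member_text` is a hypothetical helper name.

```python
import tarfile


def read_member_text(path, member_name, encoding="utf-8"):
    """Read one archive member by name without extracting it to disk.

    Returns None for members such as directories, mirroring extractfile().
    Hypothetical helper; assumes the member's bytes are text in `encoding`.
    """
    with tarfile.open(path) as tar:
        f = tar.extractfile(member_name)  # a name string or a TarInfo object
        if f is None:
            return None
        # read before the with block closes the archive
        return f.read().decode(encoding)
```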

Answer 3

An implementation of the approach @stefano-borini mentioned, accessing a tar archive member by its file name:

#python3
# myArchive is an already opened tarfile.TarFile; myFile ends up holding the member's bytes
myFile = myArchive.extractfile(
    dict(zip(
        myArchive.getnames(),
        myArchive.getmembers()
    ))['path/to/file']
).read()
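For comparison, TarFile.getmember(name) already performs this name-to-TarInfo lookup, so the dict(zip(...)) step can be skipped. A minimal sketch, assuming `archive` is an open TarFile like myArchive above; `read_by_name` is a hypothetical helper name.

```python
import tarfile


def read_by_name(archive: tarfile.TarFile, name: str) -> bytes:
    """Return the raw bytes of the member called `name` (hypothetical helper)."""
    member = archive.getmember(name)  # raises KeyError if the name is not in the archive
    return archive.extractfile(member).read()
```

Like the dictionary built with zip(), getmember() uses the last occurrence when the same name appears more than once in the archive.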

Credits:

https://stackoverflow.com/a/209854/1695680

dict(zip( from https://stackoverflow.com/a/2018523/1695680
tarfile.getnames
Also, for my use case, reading from a buffer: How to construct a TarFile object in memory from byte buffer in Python 3?
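For the in-memory case, one option is to wrap the bytes in io.BytesIO and pass it through the fileobj argument of tarfile.open(). A minimal sketch, assuming `data` holds the complete archive as a bytes object; `read_all_from_bytes` is a hypothetical helper name.

```python
import io
import tarfile


def read_all_from_bytes(data):
    """Return {member name: file bytes} for a tar archive held entirely in memory.

    Hypothetical helper; `data` is assumed to be the complete archive as bytes.
    """
    contents = {}
    # mode "r" (same as "r:*") auto-detects gzip/bz2/xz compression
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories, links, etc.
                contents[member.name] = tar.extractfile(member).read()
    return contents
```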

Answer 4

You can use TarFile.list(), for example:

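import tarfile

filename = "abc.tar.bz2"
# the built-in open() does not understand "r:bz2"; tarfile.open() does
with tarfile.open(filename, mode="r:bz2") as f1:
    f1.list()  # prints the listing to stdout; list() itself returns None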

Once you have this listing, you can process that output or write it to a file, depending on what you need.
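Since list() prints rather than returning anything, one way to process or save its output is to redirect stdout while it runs. A minimal sketch; `listing_as_text` is a hypothetical helper name, and for structured data, iterating over getmembers() is usually simpler than parsing this text.

```python
import contextlib
import io
import tarfile


def listing_as_text(path, verbose=False):
    """Capture what TarFile.list() prints so it can be post-processed or saved.

    Hypothetical helper; list() writes to stdout, so stdout is redirected temporarily.
    """
    buf = io.StringIO()
    with tarfile.open(path) as tar, contextlib.redirect_stdout(buf):
        tar.list(verbose=verbose)
    return buf.getvalue()
```

For example, `open("listing.txt", "w").write(listing_as_text("abc.tar.bz2"))` would save the listing to a file.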

Read tar file contents in a Python script without extracting it
