How to compare the contents of two tarballs

Date: 2009-06-23 03:50:59

Tags: linux compare tar compression

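I want to tell whether two tarball files contain identical files, in terms of file names and file contents, ignoring metadata such as date, user, and group.

However, there are some restrictions. First, I have no control over whether metadata is included when the tar files are made; in fact, a tar file always includes metadata, so diffing the two tar files directly does not work. Second, some of the tar files are so large that I cannot afford to untar them into a temporary directory and diff the contained files one by one. (I know that if I could untar file1.tar into file1/, I could compare them by running 'tar -dvf file2.tar' inside file1/. But usually I cannot afford to untar even one of them.)

Any idea how I can compare the two tar files? It would be better if it could be done in a shell script. Alternatively, is there a way to get the checksum of each sub-file without actually untarring the tarball?

Thanks,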

13 answers:

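Answer 0 (score: 10)

You can also try pkgdiff to visualize the differences between packages (it detects added/removed/renamed files and changed content, and exits with code zero if nothing changed):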

pkgdiff PKG-0.tgz PKG-1.tgz

Answer 1 (score: 9)

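Are you controlling the creation of these tar files?
If so, the best trick is to create an MD5 checksum and store it in a file within the archive itself. Then, when you want to compare two archives, you just extract this checksum file from each and compare them.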


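If you can afford to extract just one of the tar files, you can use tar's --diff option to compare the extracted tree against the contents of the other tar file.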


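There is one more crude trick, if you are fine with comparing only the file names and their sizes.
Remember, this does not guarantee that the files are the same!

Run tar tvf to list the contents of each archive and store the output in two different files. Then cut out everything except the file name and size columns. Preferably sort the two files too. Then just diff the two lists.

Just remember that this last scheme does not really do a checksum.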

Sample tar and output (all files are zero size in this example).

$ tar tvfj pack1.tar.bz2
drwxr-xr-x user/group 0 2009-06-23 10:29:51 dir1/
-rw-r--r-- user/group 0 2009-06-23 10:29:50 dir1/file1
-rw-r--r-- user/group 0 2009-06-23 10:29:51 dir1/file2
drwxr-xr-x user/group 0 2009-06-23 10:29:59 dir2/
-rw-r--r-- user/group 0 2009-06-23 10:29:57 dir2/file1
-rw-r--r-- user/group 0 2009-06-23 10:29:59 dir2/file3
drwxr-xr-x user/group 0 2009-06-23 10:29:45 dir3/

Command to generate the sorted name/size list

$ tar tvfj pack1.tar.bz2 | awk '{printf "%10s %s\n",$3,$6}' | sort -k 2
0 dir1/
0 dir1/file1
0 dir1/file2
0 dir2/
0 dir2/file1
0 dir2/file3
0 dir3/

You can take two such sorted lists and diff them. You can also include the date and time columns if that works for you.
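The name-and-size comparison described in this answer can also be done without parsing `tar tvf` output. The following is a minimal sketch using Python's standard `tarfile` module; the function names `name_size_listing` and `listing_diff` are made up for illustration. Like the shell version, it compares only names and sizes, not contents.

```python
import tarfile


def name_size_listing(archive_path):
    """Return a sorted list of (name, size) pairs for every archive member."""
    # Mode "r:*" lets tarfile detect gzip/bzip2/xz compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        return sorted((member.name, member.size) for member in tar)


def listing_diff(path_a, path_b):
    """Return the (name, size) pairs found in only one of the two archives."""
    a = set(name_size_listing(path_a))
    b = set(name_size_listing(path_b))
    return sorted(a - b), sorted(b - a)
```

Two empty lists from `listing_diff` mean the archives agree on names and sizes; dates, owners, and permissions are never looked at.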

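Answer 2 (score: 5)

I realize this is a late reply, but I came across this question while trying to achieve the same thing. The solution I implemented writes the tar contents to stdout and pipes them to whatever hash you choose: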

tar -xOzf archive.tar.gz | sort | sha1sum

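Note that the order of the arguments is important; in particular O, which tells tar to write the extracted contents to standard output.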
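The same idea, reading file contents straight from the archive stream instead of unpacking to disk, can be sketched with Python's `tarfile` module. This is an illustration rather than a translation of the pipeline above: it hashes each regular file separately instead of sorting the concatenated output, and `member_checksums` is a hypothetical name.

```python
import hashlib
import tarfile


def member_checksums(archive_path, algorithm="sha1", chunk_size=1 << 20):
    """Map each regular file in the archive to the hex digest of its contents."""
    checksums = {}
    # "r|*" reads the archive as a forward-only stream with transparent
    # decompression, so nothing is unpacked to disk.
    with tarfile.open(archive_path, "r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links and devices are skipped
            digest = hashlib.new(algorithm)
            stream = tar.extractfile(member)
            # Read in chunks so large members are never held in memory whole.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
            checksums[member.name] = digest.hexdigest()
    return checksums
```

Two archives then hold the same regular files with the same contents when `member_checksums(a) == member_checksums(b)`; ownership, permissions, and timestamps never enter the comparison.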

Answer 3 (score: 4)

Here is my variation, which also checks the Unix permissions:

It works only if the file names are shorter than 200 characters.

diff <(tar -tvf 1.tar | awk '{printf "%10s %200s %10s\n",$3,$6,$1}'|sort -k2) <(tar -tvf 2.tar|awk '{printf "%10s %200s %10s\n",$3,$6,$1}'|sort -k2)

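Answer 4 (score: 3)

tarsum is almost what you need. Take its output, run it through sort so that each listing has the same ordering, and then compare the two with diff. That should get you a basic implementation, and it would be easy enough to pull those steps into the main program by modifying the Python code to do the whole job.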
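The sort-then-diff step amounts to comparing two mappings from path to checksum. A minimal sketch, assuming the per-file checksums have already been collected into dictionaries; the function name `compare_checksums` and the dictionary layout are assumptions for illustration, not tarsum's actual output format:

```python
def compare_checksums(old, new):
    """Compare two {path: checksum} mappings and report how they differ."""
    old_names, new_names = set(old), set(new)
    return {
        # Paths present only in the second mapping.
        "added": sorted(new_names - old_names),
        # Paths present only in the first mapping.
        "removed": sorted(old_names - new_names),
        # Paths present in both whose checksums disagree.
        "changed": sorted(
            name for name in old_names & new_names if old[name] != new[name]
        ),
    }
```

If all three lists come back empty, the two listings describe the same set of files with the same contents.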

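Answer 5 (score: 2)

Is tardiff what you are looking for? It is "a simple perl script" that "compares the contents of two tarballs and reports on any differences found between them."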

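Answer 6 (score: 1)

Just throwing this out there, since none of the above solutions worked for what I needed.

This function computes the md5 hash of the md5 hashes of all files whose paths match a given path. If the resulting hash is the same, the file hierarchy and file lists are the same.

I know it does not perform as well as the others, but it provides the certainty I need.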

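PATH_TO_CHECK="some/path"
# For each archive: md5sum every member, keep the hashes of the members
# whose path matches PATH_TO_CHECK, then hash that list of hashes.
for template in $(find build/ -name '*.tar'); do
    tar -xvf "$template" --to-command=md5sum |
        grep "$PATH_TO_CHECK" -A 1 |
        grep -v "$PATH_TO_CHECK" |
        awk '{print $1}' |
        md5sum |
        awk "{print \"$template\",\$1}"
done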

*Note: an invalid path simply returns nothing.

Answer 7 (score: 0)

If you do not want to extract the archives and do not need the detailed differences either, try diff's -q option:

diff -q 1.tar 2.tar

The quiet result will be "Files 1.tar and 2.tar differ", or nothing if there are no differences.

Answer 8 (score: 0)

There is a tool called archdiff. It is basically a perl script that can look into archives.

Takes two archives, or an archive and a directory and shows a summary of the
differences between them.

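Answer 9 (score: 0)

I had a similar problem and solved it with Python; here is the code. PS: although this code compares the contents of two zipballs, the approach is similar for tarballs. Hope this helps.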

import hashlib
import os
import shutil
import zipfile

def decompressZip(zipName, dirName):
    # Extract every member into dirName and return the list of member names.
    with zipfile.ZipFile(zipName, "r") as zipFile:
        fileNames = zipFile.namelist()
        for file in fileNames:
            zipFile.extract(file, dirName)
    return fileNames

def md5sum(filename):
    # Return the upper-case hex MD5 digest of the file's contents.
    md5obj = hashlib.md5()
    with open(filename, "rb") as f:
        md5obj.update(f.read())
    return md5obj.hexdigest().upper()

if __name__ == "__main__":
    oldFileList = decompressZip("./old.zip", "./oldDir")
    newFileList = decompressZip("./new.zip", "./newDir")

    oldDict = dict()
    newDict = dict()

    # Hash every regular file extracted from the old archive.
    for oldFile in oldFileList:
        tmpOldFile = "./oldDir/" + oldFile
        if not os.path.isdir(tmpOldFile):
            oldDict[oldFile] = md5sum(tmpOldFile)

    # Hash every regular file extracted from the new archive.
    for newFile in newFileList:
        tmpNewFile = "./newDir/" + newFile
        if not os.path.isdir(tmpNewFile):
            newDict[newFile] = md5sum(tmpNewFile)

    additionList = list()
    modifyList = list()

    # A file is new if the old archive lacks it, modified if its hash changed.
    for key in newDict:
        if key not in oldDict:
            additionList.append(key)
        elif newDict[key] != oldDict[key]:
            modifyList.append(key)

    print("new file list: %s" % additionList)
    print("modified file list: %s" % modifyList)

    # Clean up the temporary extraction directories.
    shutil.rmtree("./oldDir")
    shutil.rmtree("./newDir")

Answer 10 (score: 0)

There is also diffoscope, which is more generic and can compare things recursively (including various formats).

pip install diffoscope

Answer 11 (score: 0)

I propose gtarsum, written in Go, which means it is a standalone executable (no Python or other runtime environment needed).

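It reads a tar file and:

  • sorts the list of files alphabetically,
  • computes a SHA256 of each file's contents,
  • concatenates those hashes into one large string,
  • computes the SHA256 of that string.

The result is a "global hash" for the tar file, based on the list of files and their contents.

It can compare multiple tar files, returning 0 if they are identical and 1 otherwise.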
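The four steps above can be sketched in a few lines of Python. This is not gtarsum's code, and it will not reproduce gtarsum's hash values: exactly how the tool folds file names into the concatenated string is not stated here, so placing each name next to its digest is an assumption, as is the function name `global_hash`.

```python
import hashlib
import tarfile


def global_hash(archive_path):
    """SHA256 over the per-file SHA256 digests, taken in sorted name order."""
    per_file = {}
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # only regular files contribute to the hash
            digest = hashlib.sha256()
            stream = tar.extractfile(member)
            for chunk in iter(lambda: stream.read(1 << 20), b""):
                digest.update(chunk)
            per_file[member.name] = digest.hexdigest()
    # Sorting by name makes the result independent of member order.
    # Assumption: each name is concatenated together with its digest.
    combined = "".join(name + per_file[name] for name in sorted(per_file))
    return hashlib.sha256(combined.encode("utf-8", "surrogateescape")).hexdigest()
```

Archives that contain the same files with the same contents yield the same value regardless of member order, timestamps, or ownership.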

Answer 12 (score: -1)

A simple script can be used:

#!/usr/bin/env bash
set -eu

tar1=$1
tar2=$2
shift 2
diff_opts=("$@")  # any remaining arguments are passed on to diff

# Extract each archive into its own temporary directory; remove both on exit.
tmp1=$(mktemp -d)
tmp2=$(mktemp -d)
trap 'rm -rf "$tmp1" "$tmp2"' EXIT

tar xf "$tar1" -C "$tmp1"
tar xf "$tar2" -C "$tmp2"

diff -ur "${diff_opts[@]:+${diff_opts[@]}}" "$tmp1" "$tmp2"

Usage:

diff-tars.sh TAR1 TAR2 [DIFF_OPTS]