Finding duplicate binary files (.lib, .bin)

Time: 2013-02-06 18:08:56

Tags: algorithm gcc compiler-construction diff decompiling

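The scenario: the source code has not changed, yet I find that the compiled lib / bin files differ, even though they were built with the same compiler and the same dependencies.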

Since these are not text files, I am out of ideas: we cannot use Levenshtein distance or pattern matching on them.

Another idea: could I add some salt to the source files so that it can be detected in the compiled binary?
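As a rough illustration of the salt idea, the sketch below scans a compiled file for a known marker byte string. It assumes the marker was placed in the source as a string constant and that the compiler and linker kept it in the output (an unused constant may be optimized away). The function name `find_marker` and the marker value are hypothetical.

```python
from pathlib import Path
from typing import List


def find_marker(binary_path: str, marker: bytes) -> List[int]:
    """Return every byte offset at which `marker` occurs in the file."""
    data = Path(binary_path).read_bytes()
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the start of the previous hit.
        pos = data.find(marker, pos + 1)
    return offsets
```

An empty list means the marker is not present as a contiguous byte sequence, which can also happen if the toolchain dropped or transformed the constant.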

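Any ideas would be great. I have a huge number of libs, and I fear that a machine-learning approach would be nearly impossible to implement.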

1 answer:

Answer 0: (score: 3)

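Dump both binaries with objdump and compare the results with a text comparison tool. I don't know which sections you need to check, but I would guess that the ones that change (and thus should not be included in the comparison) are: .gnu.hash, .gnu_debuglink. The sections of a binary can be listed with objdump -h: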

$ objdump -h /bin/sh

/bin/sh:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  0000000000400238  0000000000400238  00000238  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  0000000000400254  0000000000400254  00000254  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.gnu.build-id 00000024  0000000000400274  0000000000400274  00000274  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     000036f8  0000000000400298  0000000000400298  00000298  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       0000cbd0  0000000000403990  0000000000403990  00003990  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       000083cf  0000000000410560  0000000000410560  00010560  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  000010fc  0000000000418930  0000000000418930  00018930  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 000000b0  0000000000419a30  0000000000419a30  00019a30  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.dyn     000000c0  0000000000419ae0  0000000000419ae0  00019ae0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .rela.plt     000012f0  0000000000419ba0  0000000000419ba0  00019ba0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .init         00000018  000000000041ae90  000000000041ae90  0001ae90  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .plt          00000cb0  000000000041aeb0  000000000041aeb0  0001aeb0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .text         0008f088  000000000041bb60  000000000041bb60  0001bb60  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .fini         0000000e  00000000004aabe8  00000000004aabe8  000aabe8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .rodata       0001d790  00000000004aac00  00000000004aac00  000aac00  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .eh_frame_hdr 00003cdc  00000000004c8390  00000000004c8390  000c8390  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     00013a0c  00000000004cc070  00000000004cc070  000cc070  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .ctors        00000010  00000000006dfe08  00000000006dfe08  000dfe08  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dtors        00000010  00000000006dfe18  00000000006dfe18  000dfe18  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 19 .jcr          00000008  00000000006dfe28  00000000006dfe28  000dfe28  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 20 .dynamic      000001b0  00000000006dfe30  00000000006dfe30  000dfe30  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .got          00000008  00000000006dffe0  00000000006dffe0  000dffe0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .got.plt      00000668  00000000006dffe8  00000000006dffe8  000dffe8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .data         00008430  00000000006e0660  00000000006e0660  000e0660  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 24 .bss          00005b88  00000000006e8aa0  00000000006e8aa0  000e8a90  2**5
                  ALLOC
 25 .gnu_debuglink 0000000c  0000000000000000  0000000000000000  000e8a90  2**0
                  CONTENTS, READONLY
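The comparison suggested in the answer could be sketched roughly as follows. This is a minimal sketch, not a definitive tool: it assumes GNU binutils is installed, that `objdump -s` introduces each section with a line of the form `Contents of section NAME:`, and that the two sections guessed in the answer are the ones to skip. The helper names (`split_sections`, `differing_sections`, `dump`) are hypothetical.

```python
import subprocess

# Sections the answer guesses may legitimately differ between builds.
SKIP_DEFAULT = frozenset({".gnu.hash", ".gnu_debuglink"})


def split_sections(dump_text):
    """Map section name -> list of hex-dump lines from `objdump -s` output."""
    prefix = "Contents of section "
    sections = {}
    current = None
    for line in dump_text.splitlines():
        if line.startswith(prefix):
            current = line[len(prefix):].rstrip(":").strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    # Lines before the first section header (file name, format) are ignored.
    return sections


def differing_sections(dump_a, dump_b, skip=SKIP_DEFAULT):
    """Return sorted names of non-skipped sections whose contents differ."""
    a = split_sections(dump_a)
    b = split_sections(dump_b)
    names = sorted((set(a) | set(b)) - set(skip))
    return [name for name in names if a.get(name) != b.get(name)]


def dump(path):
    """Run `objdump -s` on `path`; requires objdump on PATH."""
    result = subprocess.run(
        ["objdump", "-s", path], check=True, capture_output=True, text=True
    )
    return result.stdout
```

For two builds of the same library, something like `differing_sections(dump("old/libfoo.so"), dump("new/libfoo.so"))` (paths are placeholders) would list the sections worth inspecting with a text diff tool. An empty list suggests the builds match outside the skipped sections.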