Question

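I'm trying to write a tool that takes as input some C code containing structs. It will compile the code, then find and output the size and offset of any padding the compiler decided to add to the structs in it. For a known struct this is straightforward to do by hand using offsetof, sizeof, and some addition, but I can't figure out an easy way to do it automatically for an arbitrary input struct.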

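If I knew how to iterate over all the members of a struct, I think I could write the tool without any problems, but as far as I know there is no way to do that. I'm hoping some StackOverflow people will know a way. I'm not wedded to my approach, though, and I'm certainly open to any alternative way of finding the padding in a struct.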
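For reference, the by-hand calculation for a known struct might look like the sketch below. The struct `example`, the `PAD_AFTER` macro, and the function names are made up for illustration; the point is only that the padding after a member is the start of whatever comes next minus the end of that member.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct, used only to illustrate the calculation. */
struct example {
    char   tag;
    double value;
    int    count;
};

/* Padding after a member = start of the next member (or end of the struct)
   minus the end of this member. sizeof does not evaluate its operand, so
   the null pointer is never dereferenced. */
#define PAD_AFTER(type, member, next_start) \
    ((next_start) - (offsetof(type, member) + sizeof(((type *)0)->member)))

size_t pad_after_tag(void)
{
    return PAD_AFTER(struct example, tag, offsetof(struct example, value));
}

size_t pad_after_value(void)
{
    return PAD_AFTER(struct example, value, offsetof(struct example, count));
}

size_t pad_after_count(void)
{
    /* Trailing padding: the "next start" is the end of the whole struct. */
    return PAD_AFTER(struct example, count, sizeof(struct example));
}

/* Prints one line per member; the numbers depend on the compiler and ABI. */
void report_example_padding(void)
{
    printf("after tag:   %zu\n", pad_after_tag());
    printf("after value: %zu\n", pad_after_value());
    printf("after count: %zu\n", pad_after_count());
}
```

Calling `report_example_padding()` from `main` prints the three gaps. The hard part, as noted above, is producing the member list automatically.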

Answer 1

Isn't this what pahole does?

Answer 2

Say you have the following module.h:

typedef void (*handler)(void);

struct foo {
  char a;
  double b;
  int c;
};

struct bar {
  float y;
  short z;
};

The Perl program that generates unpack templates begins with the usual front matter:

#! /usr/bin/perl

use warnings;
use strict;

sub usage { "Usage: $0 header\n" }

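With structs, we feed the header to ctags and collect the struct members from its output. The result is a hash whose keys are the names of the structs and whose values are arrays of pairs of the form [$member_name, $type].

Note that it handles only a few C types.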

sub structs {
  my($header) = @_;

  open my $fh, "-|", "ctags", "-f", "-", $header
    or die "$0: could not start ctags: $!";

  my %struct;
  while (<$fh>) {
    chomp;
    my @f = split /\t/;
    next unless @f >= 5 &&
                $f[3] eq "m" &&
                $f[4] =~ /^struct:(.+)/;

    my $struct = $1;
    die "$0: unknown type in $f[2]"
      unless $f[2] =~ m!/\^\s*(float|char|int|double|short)\b!;

    # [ member-name => type ]
    push @{ $struct{$struct} } => [ $f[0] => $1 ];
  }

  wantarray ? %struct : \%struct;
}

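Assuming the header can be included by itself, generate_source generates a C program that prints the member offsets to standard output, fills the structs with dummy values, and then writes the raw structs to standard output, each preceded by its size in bytes.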

sub generate_source {
  my($struct,$header) = @_;

  my $path = "/tmp/my-offsets.c";
  open my $fh, ">", $path
    or die "$0: open $path: $!";

  print $fh <<EOStart;
#include <stdio.h>
#include <stddef.h>
#include <$header>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}

int main(void) {
EOStart

  my $id = "a1";
  my %id;
  foreach my $s (sort keys %$struct) {
    $id{$s} = $id++;
    print $fh "struct $s $id{$s};\n";
  }

  my $value = 0;
  foreach my $s (sort keys %$struct) {
    for (@{ $struct->{$s} }) {
      print $fh <<EOLine;
printf("%zu\\n", offsetof(struct $s,$_->[0]));
$id{$s}.$_->[0] = $value;
EOLine
      ++$value;
    }
  }

  print $fh qq{printf("----\\n");\n};

  foreach my $s (sort keys %$struct) {
    print $fh "print_buf(&$id{$s}, sizeof($id{$s}));\n";
  }
  print $fh <<EOEnd;
  return 0;
}
EOEnd

  close $fh or warn "$0: close $path: $!";
  $path;
}

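Generate a template for unpack, where the parameter $members is a value from the hash returned by structs that has been augmented with offsets (i.e., arrayrefs of the form [$member_name, $type, $offset]):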

sub template {
  my($members) = @_;

  my %type2tmpl = (
    char => "c",
    double => "d",
    float => "f",
    int => "i!",
    short => "s!",
  );

  join " " =>
  map '@![' . $_->[2] . ']' . $type2tmpl{ $_->[1] } =>
  @$members;
}

Finally, we reach the main program, whose first task is to generate and compile the C program:

die usage unless @ARGV == 1;
my $header = shift;

my $struct = structs $header;
my $src    = generate_source $struct, $header;

(my $cmd = $src) =~ s/\.c$//;
system("gcc -I`pwd` -o $cmd $src") == 0
  or die "$0: gcc failed";

Now we read the generated program's output and decode the structs:

my @todo = map @{ $struct->{$_} } => sort keys %$struct;

open my $fh, "-|", $cmd
  or die "$0: start $cmd failed: $!";
while (<$fh>) {
  last if /^-+$/;
  chomp;
  my $m = shift @todo;
  push @$m => $_;
}

if (@todo) {
  die "$0: unfilled:\n" .
      join "" => map "  - $_->[0]\n", @todo;
}

foreach my $s (sort keys %$struct) {
  chomp(my $length = <$fh> || die "$0: unexpected end of input");
  my $bytes = read $fh, my($buf), $length;
  if (defined $bytes) {
    die "$0: unexpected end of input" unless $bytes;
    print "$s: @{[unpack template($struct->{$s}), $buf]}\n";
  }
  else {
    die "$0: read: $!";
  }
}

Output:

$ ./unpack module.h 
bar: 0 1
foo: 2 3 4

For reference, the C program generated for module.h is

#include <stdio.h>
#include <stddef.h>
#include <module.h>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}

int main(void) {
struct bar a1;
struct foo a2;
printf("%zu\n", offsetof(struct bar,y));
a1.y = 0;
printf("%zu\n", offsetof(struct bar,z));
a1.z = 1;
printf("%zu\n", offsetof(struct foo,a));
a2.a = 2;
printf("%zu\n", offsetof(struct foo,b));
a2.b = 3;
printf("%zu\n", offsetof(struct foo,c));
a2.c = 4;
printf("----\n");
print_buf(&a1, sizeof(a1));
print_buf(&a2, sizeof(a2));
  return 0;
}

Answer 3

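I prefer to read and write through a buffer, then have functions load the struct members from the buffer. This is more portable than reading directly into a struct or using memcpy. It also removes any concern about compiler padding, and it can be adapted to handle endianness.

A correct and robust program is worth more than any time spent compacting binary data.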
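A minimal sketch of that buffer approach, assuming a hypothetical wire format (a 4-byte id followed by 2-byte flags, both little-endian, with no padding). The struct `record` and all function names are invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory form; the compiler is free to pad this however it likes. */
struct record {
    uint32_t id;
    uint16_t flags;
};

/* Assumed wire layout: 4 bytes of id + 2 bytes of flags, no padding. */
enum { RECORD_WIRE_SIZE = 6 };

/* Assemble integers byte by byte so the result does not depend on the
   host's endianness or alignment rules. */
static uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static uint16_t load_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fills *out from the buffer. Returns 0 on success, -1 if the buffer is
   too short to hold one record. */
int record_from_buffer(struct record *out, const unsigned char *buf, size_t len)
{
    if (len < RECORD_WIRE_SIZE)
        return -1;
    out->id    = load_u32_le(buf);
    out->flags = load_u16_le(buf + 4);
    return 0;
}
```

Because each member is loaded individually, the in-memory layout of `struct record` never reaches the wire, so its padding does not matter.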

Answer 4

Hack Convert::Binary::C.

Answer 5

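Rather than using a CPAN module or hacking something up yourself, you could use Exuberant Ctags to parse the source files. For example, for the following code: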

typedef struct _foo {
    int a;
    int b;
} foo;

ctags emits the following:

_foo    x.c     /^typedef struct _foo {$/;"     s                               file:
a       x.c     /^    int a;$/;"                m       struct:_foo             file:
b       x.c     /^    int b;$/;"                m       struct:_foo             file:
foo     x.c     /^} foo;$/;"                    t       typeref:struct:_foo     file:

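The first, fourth, and fifth columns should be enough to determine which struct types exist and what their members are. You can use that information to generate a C program that determines how much padding each struct type has.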
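As a rough illustration, the generated program for the `_foo` example might contain something like the sketch below. The `member_info` table, the `MEMBER` macro, and `report_padding` are hypothetical names, and the sketch assumes the members are listed in declaration order and that there are no bit-fields.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct _foo {
    int a;
    int b;
} foo;

/* One row per member; a generator would emit these rows from the ctags
   output (tag name in column 1, "struct:_foo" in column 5). */
struct member_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

#define MEMBER(type, m) { #m, offsetof(type, m), sizeof(((type *)0)->m) }

static const struct member_info foo_members[] = {
    MEMBER(struct _foo, a),
    MEMBER(struct _foo, b),
};

/* Prints each gap between the end of one member and the start of the next
   (or the end of the struct) and returns the total padding in bytes. */
size_t report_padding(const char *struct_name, const struct member_info *m,
                      size_t count, size_t total_size)
{
    size_t padding = 0;
    for (size_t i = 0; i < count; i++) {
        size_t end  = m[i].offset + m[i].size;
        size_t next = (i + 1 < count) ? m[i + 1].offset : total_size;
        if (next > end) {
            printf("struct %s: %zu byte(s) of padding after %s (offset %zu)\n",
                   struct_name, next - end, m[i].name, end);
            padding += next - end;
        }
    }
    return padding;
}
```

A generated `main` would then call `report_padding("_foo", foo_members, 2, sizeof(struct _foo))` once per struct type found in the tags.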

Answer 6

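You could try pstruct.

I've never used it, but I was looking for some way you could use stabs, and this sounds like it would fit the bill.

If it doesn't, I would suggest looking at other ways to parse the stabs information.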

Answer 7

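Have your tool parse the struct definition to find the names of the fields, then generate C code that prints a description of the struct's padding, and finally compile and run that C code. Sample Perl code for the second part: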

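printf "const char *const field_names[] = {%s};\n",
       join(", ", map {"\"$_\""} @field_names);
# offsets[] has one extra trailing entry, sizeof(struct), so offsets[i+1] is always valid
printf "const size_t offsets[] = {%s, %s};\n",
       join(", ", map {"offsetof(struct $struct_name, $_)"} @field_names),
       "sizeof(struct $struct_name)";
# sizes[i] is the size of field i itself; sizeof does not evaluate its operand
printf "const size_t sizes[] = {%s};\n",
       join(", ", map {"sizeof(((struct $struct_name *)0)->$_)"} @field_names);
print <<'EOF';
for (i = 0; i < sizeof(field_names)/sizeof(*field_names); i++) {
    /* padding = gap between the end of this field and the start of the next */
    size_t padding = offsets[i+1] - offsets[i] - sizes[i];
    printf("After %s: %zu bytes of padding\n", field_names[i], padding);
}
EOF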

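C is very difficult to parse in general, but you're only interested in a small part of the language, and it sounds like you have some control over your source files, so a simple parser should do the trick. A search of CPAN turns up Devel::Tokenizer::C and a few C:: modules as candidates (I know nothing about them beyond their names). If you really need an accurate C parser, there is Cil, but you have to write your analysis in OCaml.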

Answer 8

If you have access to Visual C++, you can add the following pragma to have the compiler spit out where, and how much, padding was added:

#pragma warning(default : 4820)

At that point you can probably just consume the output of cl.exe and go party.
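A small sketch of where the pragma would go, reusing the `struct foo` layout from earlier in the thread. It assumes the `default` specifier, which the MSVC documentation describes as also turning on a warning that is off by default (as C4820 is). The helper `foo_total_padding` is a hypothetical cross-check, not something the compiler provides.

```c
#include <stddef.h>

/* The pragma must appear before the struct definitions of interest.
   It is guarded so that other compilers never see it. */
#ifdef _MSC_VER
#pragma warning(default : 4820)
#endif

struct foo {
    char   a;
    double b;
    int    c;
};

/* Cross-check: the per-member amounts reported by the compiler should add
   up to the struct size minus the sizes of its members. */
size_t foo_total_padding(void)
{
    return sizeof(struct foo) - (sizeof(char) + sizeof(double) + sizeof(int));
}
```

Compiling this with cl.exe should produce one C4820 diagnostic per padded spot, which a tool could then parse from the compiler output.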

Answer 9

I don't believe any general-purpose facility for introspection/reflection exists in C. That's what Java or C# are for.

Answer 10

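There is no C++ language feature for iterating over the members of a struct, so I think you're out of luck.

You might be able to cut down on some of the boilerplate with macros, but I think you're stuck with specifying all the members explicitly.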
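One way the macro idea could look is the "X-macro" pattern sketched below, which also works in plain C. The member list is still written out explicitly, but only once; the macro and function names are invented for illustration, and the sketch does not handle arrays or bit-fields.

```c
#include <stddef.h>
#include <stdio.h>

/* The single, explicit list of members: X(type, name) for each one. */
#define FOO_MEMBERS(X) \
    X(char,   a)       \
    X(double, b)       \
    X(int,    c)

/* First expansion: the struct definition itself. */
#define DECLARE_MEMBER(type, name) type name;
struct foo { FOO_MEMBERS(DECLARE_MEMBER) };

/* Second expansion: a table describing each member. */
struct member_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

#define DESCRIBE_MEMBER(type, name) \
    { #name, offsetof(struct foo, name), sizeof(type) },
static const struct member_info foo_info[] = { FOO_MEMBERS(DESCRIBE_MEMBER) };

enum { FOO_MEMBER_COUNT = sizeof foo_info / sizeof foo_info[0] };

/* Padding after member i: start of the next member (or end of the struct)
   minus the end of member i. */
size_t foo_padding_after(size_t i)
{
    size_t end  = foo_info[i].offset + foo_info[i].size;
    size_t next = (i + 1 < (size_t)FOO_MEMBER_COUNT)
                      ? foo_info[i + 1].offset
                      : sizeof(struct foo);
    return next - end;
}

void print_foo_padding(void)
{
    for (size_t i = 0; i < (size_t)FOO_MEMBER_COUNT; i++)
        printf("after %s: %zu byte(s) of padding\n",
               foo_info[i].name, foo_padding_after(i));
}
```

The catch is that the struct has to be declared through the macro in the first place, so this does not help with arbitrary existing headers.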

A way to find the size and location of padding in a struct?

10 answers: