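libclang: missing some statements in the AST?

Date: 2013-01-10 04:17:47

Tags: clang abstract-syntax-tree libclang

I wrote a test program (parse_ast.c) that parses a C source file (tt.c) to see how libclang works. Its output is the hierarchy of the AST.

Here is the test file: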

/* tt.c */                                    // line 1
#include <unistd.h>
#include <stdio.h>

typedef ssize_t (*write_fn_t)(int, const void *, size_t);

void indirect_write(write_fn_t write_fn) {    // line 7
    (*write_fn)(1, "indirect call\n", 14);
}

void direct_write() {                         // line 11
    write(1, "direct call\n", 12);            // line 12 missing in the ast?
}

int main() {                                  // line 15
    direct_write();
    indirect_write(write);                    // line 17 missing in the ast?

    return 0;
}

The output is as follows:

 ...
 ...
 inclusion directive at tt.c (2, 1) to (2, 20)
 inclusion directive at tt.c (3, 1) to (3, 19)
 TypedefDecl at tt.c (5, 1) to (5, 57)
 TypeRef at tt.c (5, 9) to (5, 16)
 ParmDecl at tt.c (5, 31) to (5, 35)
 ParmDecl at tt.c (5, 36) to (5, 49)
 ParmDecl at tt.c (5, 50) to (5, 56)
 FunctionDecl at tt.c (7, 1) to (9, 2)
 ParmDecl at tt.c (7, 21) to (7, 40)
  TypeRef at tt.c (7, 21) to (7, 31)
 CompoundStmt at tt.c (7, 42) to (9, 2)
  CallExpr at tt.c (8, 5) to (8, 42)
   UnexposedExpr at tt.c (8, 5) to (8, 16)
    ParenExpr at tt.c (8, 5) to (8, 16)
     UnaryOperator at tt.c (8, 6) to (8, 15)
      UnexposedExpr at tt.c (8, 7) to (8, 15)
       DeclRefExpr at tt.c (8, 7) to (8, 15)
   IntegerLiteral at tt.c (8, 17) to (8, 18)
   UnexposedExpr at tt.c (8, 20) to (8, 37)
    UnexposedExpr at tt.c (8, 20) to (8, 37)
     StringLiteral at tt.c (8, 20) to (8, 37)
   IntegerLiteral at tt.c (8, 39) to (8, 41)
 FunctionDecl at tt.c (11, 1) to (13, 2)
 CompoundStmt at tt.c (11, 21) to (13, 2)        <- XXX no line 12?
 FunctionDecl at tt.c (15, 1) to (20, 2)
 CompoundStmt at tt.c (15, 12) to (20, 2)
  CallExpr at tt.c (16, 5) to (16, 19)
   UnexposedExpr at tt.c (16, 5) to (16, 17)
    DeclRefExpr at tt.c (16, 5) to (16, 17)      <- XXX no line 17?
  ReturnStmt at tt.c (19, 5) to (19, 13)
   IntegerLiteral at tt.c (19, 12) to (19, 13)

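We can see the three functions (indirect_write at line 7, direct_write at line 11, main at line 15), and most of the statements can be found in the AST, but I can't find anything that represents the statements at line 12 and line 17. Does anybody know the reason?

I'm using Debian squeeze (kernel 2.6.32), tested with clang 3.1 and 3.2 (compiled from source).

Here is the program parse_ast.c: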

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <clang-c/Index.h>

enum CXChildVisitResult visit_fn(CXCursor cr, CXCursor parent,
        CXClientData client_data) {

    unsigned depth;
    unsigned line, column, offset;
    enum CXCursorKind kind;
    CXSourceRange extent;
    CXSourceLocation start, end;
    CXString kind_spelling, filename;
    CXFile file;

    // the nesting depth travels in the client_data pointer itself;
    // convert via uintptr_t so the cast is well defined on 64-bit targets
    depth = (unsigned)(uintptr_t)client_data;

    // print cursor kind, indented by depth (%*s expects an int width)
    kind = clang_getCursorKind(cr);
    kind_spelling = clang_getCursorKindSpelling(kind);
    fprintf(stdout, "%*s%s at", (int)depth, " ", clang_getCString(kind_spelling));
    clang_disposeString(kind_spelling);

    // get extent
    extent = clang_getCursorExtent(cr);
    start = clang_getRangeStart(extent);
    end = clang_getRangeEnd(extent);

    // print start position
    clang_getExpansionLocation(start, &file, &line, &column, &offset);
    filename = clang_getFileName(file);
    fprintf(stdout, " %s (%u, %u) to", clang_getCString(filename), line,
            column);
    clang_disposeString(filename);

    // print end position
    clang_getExpansionLocation(end, &file, &line, &column, &offset);
    fprintf(stdout, " (%u, %u)\n", line, column);

    // recurse manually so each level can be given its own depth
    clang_visitChildren(cr, visit_fn, (CXClientData)(uintptr_t)(depth + 1));

    // children were already visited above, so move on to the next sibling
    return CXChildVisit_Continue;
}

int main(int argc, const char * const *argv) {
    CXIndex Index = clang_createIndex(0, 0);
    CXTranslationUnit TU = clang_parseTranslationUnit(Index, NULL,
            argv, argc, 0, 0, CXTranslationUnit_DetailedPreprocessingRecord);

    // parsing can fail outright; bail out before touching a NULL unit
    if (TU == NULL) {
        fprintf(stderr, "failed to parse translation unit\n");
        clang_disposeIndex(Index);
        return 1;
    }

    clang_visitChildren(clang_getTranslationUnitCursor(TU),
            visit_fn, 0);
    clang_disposeTranslationUnit(TU);
    clang_disposeIndex(Index);

    return 0;
}

Update

The problem was caused by a missing header file, stddef.h. It was answered on the libclang mailing list: http://clang-developers.42468.n3.nabble.com/libclang-missing-some-statements-in-the-AST-td4029641.html

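2 answers:

Answer 0 (score: 4)

Check the diagnostics produced by clang_parseTranslationUnit(). An AST is generated even when errors are encountered, but it is obviously not guaranteed to be meaningful.

I found that commenting out the #include lines causes compile errors, but yields an AST similar to yours (in particular, line 17 is missing).

Replacing the #include lines with typedefs for size_t and ssize_t (both as int) produces a compile warning about the implicit declaration of write(), but the resulting AST does include line 17.

So I assume there is a problem with your header files, which the diagnostics should reveal. The diagnostics can be retrieved like this, for example: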

for (unsigned I = 0, N = clang_getNumDiagnostics(TU); I != N; ++I) {
    CXDiagnostic Diag = clang_getDiagnostic(TU, I);
    CXString String = clang_formatDiagnostic(Diag, clang_defaultDiagnosticDisplayOptions());
    fprintf(stderr, "%s\n", clang_getCString(String));
    clang_disposeString(String);
    clang_disposeDiagnostic(Diag);  // each retrieved diagnostic must be released
}

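Answer 1 (score: 0)

I'm using libclang to parse and optimize C code, but when I parse a C source file with your code, I don't see any CXCursor_BinaryOperator inside the CompoundStmt.

For example: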

void OCTS_C_TimerMiliseconds_reset_Timers(OCTS_outC_C_TimerMiliseconds_Timers *outC)
{
    outC->init = kcg_true;
    /* 1 */ OCTS_Sign_INT_reset_Math(&outC->_1_Context_1);
    /* 2 */ OCTS_Sign_INT_reset_Math(&outC->Context_2);
    /* 1 */ OCTS_FallingEdge_reset_Edge(&outC->Context_1);
} 

The result is:

FunctionDecl at s.cpp (10, 1) to (17, 2) OCTS_C_TimerMiliseconds_reset_Timers 
ParmDecl at s.cpp (11, 3) to (11, 44) outC 
 TypeRef at s.cpp (11, 3) to (11, 38) OCTS_outC_C_TimerMiliseconds_Timers 
CompoundStmt at s.cpp (12, 1) to (17, 2)