PyCLang Notes

References

Some Examples / Playing

Here are some examples of playing around with pyclang...

I have version 6.0.0.2, installed using pip install clang (clang itself installed separately).

Poo, at the moment, I can't see a way of getting the opcode of a binary operator using these bindings. There appears to be an accepted patch for this functionality, but it's been hanging around for over 4 years at the time of writing... so err... not holding my breath.

PyBee seems to have added this functionality in their fork called Sealang, which they say is an improved set of Python bindings for libclang, but unfortunately this project is no longer maintained. I tried testing it. Although it installed, the module could not be imported due to a missing symbol - I'm guessing it's too out of date to work with the later libclang versions :'(
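In the meantime, one workaround is to recover the operator's spelling from the tokens of the BINARY_OPERATOR cursor. This is a minimal sketch, not something the bindings provide: binary_operator_spelling is a hypothetical helper name, and it assumes the operator is the first token that starts at or after the end of the left-hand child's extent. That assumption can break when macros are involved. It only uses get_children(), get_tokens(), extent and offset.

```python
def binary_operator_spelling(cursor):
    """Best-effort spelling ('*', '=', ...) of a BINARY_OPERATOR cursor.

    Hypothetical helper, not part of clang.cindex.
    """
    children = list(cursor.get_children())
    if len(children) != 2:
        # Not shaped like "lhs op rhs", so give up rather than guess.
        return None
    lhs_end = children[0].extent.end.offset
    for token in cursor.get_tokens():
        # Tokens come back in source order, so the first one that starts
        # at or after the end of the left operand should be the operator.
        if token.extent.start.offset >= lhs_end:
            return token.spelling
    return None
```

For the a * b * c expression further down these notes, the outer node should give '*' because its left operand is the whole of a * b.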

The Translation Unit - clang.cindex.TranslationUnit

The TranslationUnit seems to have the following useful properties:
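A minimal sketch that pulls out a handful of them, assuming spelling, cursor, diagnostics and get_includes() are the ones of interest. The helper names parse_file and summarise_translation_unit are hypothetical. The import sits inside parse_file so that the summary helper can be tried on its own without libclang being loadable.

```python
def parse_file(path, args=()):
    """Parse a source file and return a clang.cindex.TranslationUnit."""
    import clang.cindex  # imported lazily; needs libclang at call time
    # If libclang is not found automatically, call
    # clang.cindex.Config.set_library_file(...) before this.
    index = clang.cindex.Index.create()
    return index.parse(path, args=list(args))


def summarise_translation_unit(tu):
    """Collect a few TranslationUnit properties into a plain dict."""
    return {
        # Name of the main source file that was parsed.
        "spelling": tu.spelling,
        # Human-readable text of every warning/error raised while parsing.
        "diagnostics": [d.spelling for d in tu.diagnostics],
        # File names of everything pulled in via #include.
        "includes": [inc.include.name for inc in tu.get_includes()],
        # Names of the declarations directly under the root cursor.
        "top_level": [c.spelling for c in tu.cursor.get_children()],
    }
```

Something like summarise_translation_unit(parse_file("test_files/test1.c")) would then give a quick overview of a file before walking its AST.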

The Cursor Abstraction - clang.cindex.Cursor

The cursor abstraction unifies the different kinds of entities in a program - declaration, statements, expressions, references to declarations, etc. - under a single "cursor" abstraction with a common set of operations. Common operation for a cursor include: getting the physical location in a source file where the cursor points, getting the name associated with a cursor, and retrieving cursors for any child nodes of a particular cursor. [ref].

The cursor members that are useful for navigating the AST are get_children(), lexical_parent, semantic_parent and walk_preorder() (in the Python bindings the two parents are properties, not methods). From the clang docs:

The lexical parent of a cursor is the cursor in which the given cursor was actually written. For many declarations, the lexical and semantic parents are equivalent (the semantic parent is returned by clang_getCursorSemanticParent()). They diverge when declarations or definitions are provided out-of-line. For example:

class C {
 void f();
};
void C::f() { }

In the out-of-line definition of C::f, the semantic parent is the class C, of which this function is a member. The lexical parent is the place where the declaration actually occurs in the source code; in this case, the definition occurs in the translation unit. In general, the lexical parent for a given entity can change without affecting the semantics of the program, and the lexical parent of different declarations of the same entity may be different. Changing the semantic parent of a declaration, on the other hand, can have a major impact on semantics, and redeclarations of a particular entity should all have the same semantic context.

In the example above, both declarations of C::f have C as their semantic context, while the lexical context of the first C::f is C and the lexical context of the second C::f is the translation unit.
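To see the same distinction from Python, here is a minimal sketch that walks a tree and reports the cursors whose lexical and semantic parents differ. out_of_line_cursors is a hypothetical name. It relies only on walk_preorder(), lexical_parent and semantic_parent, and assumes that two cursors for the same entity compare equal with ==.

```python
def out_of_line_cursors(root):
    """Yield (cursor, lexical_parent, semantic_parent) where the parents differ."""
    for cursor in root.walk_preorder():
        lexical = cursor.lexical_parent
        semantic = cursor.semantic_parent
        # Cursors without one of the parents (e.g. the root itself)
        # cannot be "out of line", so skip them.
        if lexical is None or semantic is None:
            continue
        if lexical != semantic:
            yield cursor, lexical, semantic
```

Run over the C::f example above, this should report only the out-of-line definition, with the translation unit as its lexical parent and C as its semantic parent.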

The cursor abstraction has the following properties/functions of interest, some of which wrap up the C cursor manipulator functions [ref]:
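A minimal sketch that prints a few of them for every node, in roughly the shape of the trees shown in these notes. This is not the script that produced those trees: dump_tree and its output layout are my own approximation, and it only touches kind, spelling, type.spelling and get_children().

```python
def dump_tree(cursor, depth=0, out=None):
    """Return a list of lines describing cursor and all of its descendants."""
    if out is None:
        out = []
    indent = "    " * depth
    # One header line per node: its kind and (possibly empty) spelling.
    out.append("%s+-- NODE: CursorKind.%s spel = %r (len=%d)" % (
        indent, cursor.kind.name, cursor.spelling, len(cursor.spelling)))
    # Only print the type when the node actually has one.
    type_spelling = cursor.type.spelling
    if type_spelling:
        out.append("%s    |   : cur.type.spelling: %s" % (indent, type_spelling))
    # Recurse depth-first into the children.
    for child in cursor.get_children():
        dump_tree(child, depth + 1, out)
    return out
```

Calling print("\n".join(dump_tree(tu.cursor))) on a parsed translation unit tu would then dump the whole file, headers included.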

Finding Enums

I wanted to find enums, whether anonymous or named, and in both cases also when they were hidden behind a typedef. I was only interested in globally defined enums, not enums embedded in structs or local to functions, but I've included examples of those here too.

  1. An anonymous enum:
    // 1. Anonymous enum
    enum { ANON_ENUM_1, ... };
    • cursor.spelling = ""
    • cursor.type.spelling = "enum (anonymous at test_files/test1.c:1:1)"
    • cursor.is_anonymous() = True
    • The AST tree representing this is:
      +-- NODE: CursorKind.ENUM_DECL spel = '' (len=0)
          |   : cur.type.spelling: enum (anonymous at test_files/test1.c:1:1)
          |   : cur.type.kind: TypeKind.ENUM
          |   : cur.is_anonymous: True
          |   : cur.lexical_parent.spelling: test_files/test1.c
          |   : cur.semantic_parent.spelling: test_files/test1.c
          |   : cur.enum_type.spelling: int
          +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ANON_ENUM_1' (len=11)
          |       : cur.type.spelling: int
          |       : cur.enum_value: 0
          |       : cur.semantic_parent.type.spelling: (anonymous at test_files/test1.c:1:1)
          |       : cur.semantic_parent.kind.spelling: CursorKind.ENUM_DECL
          +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ANON_ENUM_2' (len=11)
          ...
          ...
          ...
  2. A named enum called Bare_Named_Enum.
    // 2. Named enum
    enum Bare_Named_Enum { BARE_NAMED_ENUM_1, ... };
    • There is only one enum decl.
    • cursor.spelling = "Bare_Named_Enum"
    • cursor.type.spelling = "enum Bare_Named_Enum"
    • cursor.is_anonymous() = False
    • The AST tree representing this:
      +-- NODE: CursorKind.ENUM_DECL spel = 'Bare_Named_Enum' (len=15)
          |   : cur.type.spelling: enum Bare_Named_Enum
          |   : cur.type.kind: TypeKind.ENUM
          |   : cur.is_anonymous: False
          |   : cur.lexical_parent.spelling: test_files/test1.c test_files/test1.c
          |   : cur.enum_type.spelling: int
          +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'BARE_NAMED_ENUM_1' (len=17)
          |       : cur.type.spelling: int
          |       : cur.enum_value: 0
          |       : cur.semantic_parent.type.spelling: enum Bare_Named_Enum 
          |       : cur.semantic_parent.kind.spelling: CursorKind.ENUM_DECL
          +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'BARE_NAMED_ENUM_2' (len=17)
          ...
          ...
          ...
  3. A typedef'ed anonymous enum.
    // 3. Typedef'd anonymous enum
    typedef enum { TYPEDEF_ANON_ENUM_1, ... } Typedef_Anonymouse_Enum_t;
    • cursor.spelling = ""
    • cursor.type.spelling = "Typedef_Anonymouse_Enum_t"
    • cursor.is_anonymous() = False. Presumably because it is referenced by the type created.
    • AST:
      +-- NODE: CursorKind.TYPEDEF_DECL spel = 'Typedef_Anonymouse_Enum_t' (len=25)
          |       : cur.type.spelling: Typedef_Anonymouse_Enum_t
          |       : cur.spelling: Typedef_Anonymouse_Enum_t
          |       : cur.underlying_typedef_type.spelling: enum Typedef_Anonymouse_Enum_t
          +-- NODE: CursorKind.ENUM_DECL spel = '' (len=0)
              |   : cur.type.spelling: Typedef_Anonymouse_Enum_t
              |   : cur.type.kind: TypeKind.ENUM
              |   : cur.is_anonymous(): False
              |   : cur.lexical_parent.type.spelling: test_files/test1.c
              |   : cur.semantic_parent.type.spelling: test_files/test1.c
              |   : cur.enum_type.spelling: int
              +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'TYPEDEF_ANON_ENUM_1' (len=19)
              |       : cur.type.spelling: int
              |       : cur.enum_value: 0
              |       : cur.lexical_parent.type.spelling: enum MySecondTestEnum
              |       : cur.semantic_parent.kind.spelling: CursorKind.ENUM_DECL
              +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'TYPEDEF_NAMED_ENUM_2' (len=20)
              ...
              ...
  4. A typedef'ed enum with a name.
    // 4. Typedef'd named enum
    typedef enum Typdef_Named_enum { TYPEDEF_NAMED_ENUM_1, ... } Typedef_Named_Enum_t;
    • There are two enum decls - one for the enum alone, and one as a child of the typedef.
    • cursor.spelling = "Typdef_Named_enum"
    • cursor.type.spelling = "enum Typdef_Named_enum"
    • cursor.is_anonymous() = False
    • AST:
      +-- NODE: CursorKind.TYPEDEF_DECL spel = 'Typedef_Named_Enum_t' (len=20)
          |   : cur.type.spelling: Typedef_Named_Enum_t
          |   : cur.spelling: Typedef_Named_Enum_t
          |   : cur.underlying_typedef_type.spelling: enum Typdef_Named_enum
          +-- NODE:  CursorKind.ENUM_DECL spel = 'Typdef_Named_enum' (len=17)
              |       : cur.type.spelling: enum Typdef_Named_enum
              |       : cur.type.kind: TypeKind.ENUM
              |       : cur.is_anonymous(): False
              |       : cur.lexical_parent.spelling: test_files/test1.c
              |       : cur.semantic_parent.spelling: test_files/test1.c
              |       : cur.enum_type.spelling: int
              +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'TYPEDEF_NAMED_ENUM_1' (len=20)
              |           : cur.type.spelling: int
              |           : cur.enum_value: 0
              |           : cur.lexical_parent.type.spelling: enum Typdef_Named_enum 
              |           : cur.semantic_parent.type.spelling: enum Typdef_Named_enum
              |           : cur.semantic_parent.kind.spelling: CursorKind.ENUM_DECL
              +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'TYPEDEF_NAMED_ENUM_2' (len=20)
              ...
              ...
           ...
  5. A named enum declared inside a structure.
    struct thestruct {
       enum enum_in_struct {
          ENUM_IN_STRUCT_1, ENUM_IN_STRUCT_2
       } val;
    };
    • The AST looks like this:
      +-- NODE:  CursorKind.STRUCT_DECL spel = 'thestruct' (len=9)
          +-- NODE:  CursorKind.ENUM_DECL spel = 'enum_in_struct' (len=14)
          |   |   : cur.type.spelling: enum enum_in_struct
          |   |   : cur.type.kind: TypeKind.ENUM
          |   |   : cur.is_anonymous(): False
          |   |   : cur.lexical_parent.spelling: thestruct test_files/test1.c
          |   |   : cur.semantic_parent.spelling: thestruct test_files/test1.c
          |   |   : cur.enum_type.spelling:  int
          |   +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ENUM_IN_STRUCT_1' (len=16)
          |   |       : cur.type.spelling: int
          |   |       : cur.enum_value: 0
          |   |       : cur.lexical_parent.type.spelling: enum enum_in_struct 
          |   |       : cur.semantic_parent.type.spelling: enum enum_in_struct
          |   |       : cur.semantic_parent.kind: CursorKind.ENUM_DECL
          |   +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ENUM_IN_STRUCT_2' (len=16)
          |           ...
          +-- NODE:  CursorKind.FIELD_DECL spel = 'val' (len=3)
              +-- NODE:  CursorKind.ENUM_DECL spel = 'enum_in_struct' (len=14)
                  |   : cur.type.spelling: enum enum_in_struct
                  |   : cur.type.kind: TypeKind.ENUM
                  |   : cur.is_anonymous(): False
                  |   : cur.lexical_parent.spelling: thestruct test_files/test1.c
                  |   : cur.semantic_parent.spelling: thestruct test_files/test1.c
                  |   : cur.enum_type.spelling:  int
                  +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ENUM_IN_STRUCT_1' (len=16)
                  |       : cur.type.spelling: int
                  |       : cur.enum_value: 0
                  |       : parents: enum enum_in_struct enum enum_in_struct CursorKind.ENUM_DECL
                  +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'ENUM_IN_STRUCT_2' (len=16)
                          : cur.type.spelling: int
                          : cur.enum_value: 1
                          : parents: enum enum_in_struct enum enum_in_struct CursorKind.ENUM_DECL
  6. A typedef'd enum declared in a function:
    +-- NODE:  CursorKind.FUNCTION_DECL spel = 'func' (len=4)
        +-- NODE:  CursorKind.COMPOUND_STMT spel = '' (len=0)
            +-- NODE:  CursorKind.DECL_STMT spel = '' (len=0)
                +-- NODE:  CursorKind.ENUM_DECL spel = 'enum_in_func' (len=12)
                |   +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'E_IN_FUNC_1' (len=11)
                |   +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'E_IN_FUNC_2' (len=11)
                +-- NODE:  CursorKind.TYPEDEF_DECL spel = 'Enum_In_Func_t' (len=14)
                    +-- NODE:  CursorKind.ENUM_DECL spel = 'enum_in_func' (len=12)
                        +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'E_IN_FUNC_1' (len=11)
                        +-- NODE:  CursorKind.ENUM_CONSTANT_DECL spel = 'E_IN_FUNC_2' (len=11)

To get the enums:
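A minimal sketch of one way to do it, based on the six cases above. find_global_enums is a hypothetical helper. It looks only at the direct children of the translation unit cursor, so enums inside structs and functions are skipped, and it looks one level down into TYPEDEF_DECL nodes to pick up the typedef name. It uses cursor.hash to avoid counting an enum twice when it shows up both on its own and under its typedef. That the two ENUM_DECL cursors hash alike is an assumption worth checking against your libclang version. Kinds are compared by name to keep the sketch free of imports.

```python
def find_global_enums(tu_cursor):
    """Return a list of (enum_cursor, typedef_name_or_None) tuples.

    Hypothetical helper; tu_cursor is the root cursor of a translation unit.
    """
    enums = {}  # cursor.hash -> [enum_cursor, typedef_name]
    for child in tu_cursor.get_children():
        kind = child.kind.name
        if kind == "ENUM_DECL":
            # Bare enum at file scope, anonymous or named.
            enums.setdefault(child.hash, [child, None])
        elif kind == "TYPEDEF_DECL":
            # typedef enum ... Name_t; the enum hangs off the typedef.
            for grandchild in child.get_children():
                if grandchild.kind.name == "ENUM_DECL":
                    entry = enums.setdefault(grandchild.hash,
                                             [grandchild, None])
                    entry[1] = child.spelling
    return [(cursor, typedef_name) for cursor, typedef_name in enums.values()]
```

Because it starts from the translation unit's children, enums from included headers are returned too. Filtering on cursor.location.file would narrow that down to a single file.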

Functions

All C functions are represented in the AST using CursorKind.FUNCTION_DECL nodes (C++ member functions show up as CursorKind.CXX_METHOD instead). To differentiate between declarations and definitions, use the cursor function is_definition().

To go from a declaration to its definition, the cursor function get_definition() can be used. It returns None when the definition is not in the translation unit being parsed.

When a function is called, it is represented in the AST using a CursorKind.CALL_EXPR node.
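A minimal sketch tying these together: note which top-level FUNCTION_DECL cursors have a definition, and list the callee names of the CALL_EXPR nodes beneath a function. function_summary and called_functions are hypothetical names. Only get_children(), walk_preorder(), is_definition(), kind and spelling are used, and kinds are compared by name to keep the sketch free of imports.

```python
def function_summary(tu_cursor):
    """Map function name -> True if a definition was seen, else False."""
    summary = {}
    for cursor in tu_cursor.get_children():
        if cursor.kind.name != "FUNCTION_DECL":
            continue
        # A function may be declared several times; remember whether any
        # one of those cursors is the definition.
        seen = summary.get(cursor.spelling, False)
        summary[cursor.spelling] = seen or bool(cursor.is_definition())
    return summary


def called_functions(function_cursor):
    """Return the spellings of all CALL_EXPR nodes under a function."""
    return [c.spelling for c in function_cursor.walk_preorder()
            if c.kind.name == "CALL_EXPR"]
```

For any name that maps to False, get_definition() on one of its declaration cursors is the next thing to try.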

typedef int NewType_t;

long func_with_params(char a, short b, NewType_t c)
{
   return a * b * c;
}
         
+-- NODE:  CursorKind.FUNCTION_DECL spel = 'func_with_params' (len=16)
    |   : cur.is_definition() True
    |   : cur.linkage: LinkageKind.EXTERNAL
    |   : cur.result_type.spelling: long
    |   : cur.get_arguments().type.spelling: ['char', 'short', 'NewType_t']
    +-- NODE:  CursorKind.PARM_DECL spel = 'a' (len=1)
    |       : cur.type.spelling: char
    +-- NODE:  CursorKind.PARM_DECL spel = 'b' (len=1)
    |       : cur.type.spelling: short
    +-- NODE:  CursorKind.PARM_DECL spel = 'c' (len=1)
    |       : cur.type.spelling: NewType_t
    +-- NODE:  CursorKind.COMPOUND_STMT spel = '' (len=0)
        +-- NODE:  CursorKind.RETURN_STMT spel = '' (len=0)
            +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = '' (len=0)
                +-- NODE:  CursorKind.BINARY_OPERATOR spel = '' (len=0)
                    |   : tokens: ['a', '*', 'b', '*', 'c']
                    +-- NODE:  CursorKind.BINARY_OPERATOR spel = '' (len=0)
                    |   |   : tokens: ['a', '*', 'b']
                    |   +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'a' (len=1)
                    |   |   +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'a' (len=1)
                    |   |       +-- NODE:  CursorKind.DECL_REF_EXPR spel = 'a' (len=1)
                    |   |               : type char
                    |   |               : referenced type char
                    |   +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'b' (len=1)
                    |       +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'b' (len=1)
                    |           +-- NODE:  CursorKind.DECL_REF_EXPR spel = 'b' (len=1)
                    |                   : type short
                    |                   : referenced type short
                    +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'c' (len=1)
                        +-- NODE:  CursorKind.DECL_REF_EXPR spel = 'c' (len=1)
                                : type NewType_t
                                : referenced type NewType_t
         

void call_func_with_params(void)
{
   long a;
   a = func_with_params('c', 10, 100);
}
         
+-- NODE:  CursorKind.FUNCTION_DECL spel = 'call_func_with_params' (len=21)
    |   : cur.is_definition() True
    |   : cur.get_definition().is_definition() True
    |   : cur.linkage: LinkageKind.EXTERNAL
    |   : cur.result_type.spelling: void
    +-- NODE:  CursorKind.COMPOUND_STMT spel = '' (len=0)
        +-- NODE:  CursorKind.DECL_STMT spel = '' (len=0)
        |   +-- NODE:  CursorKind.VAR_DECL spel = 'a' (len=1)
        +-- NODE:  CursorKind.BINARY_OPERATOR spel = '' (len=0)
            |   : tokens: ['a', '=', 'func_with_params', '(', "'c'", ',', '10', ',', '100', ')']
            +-- NODE:  CursorKind.DECL_REF_EXPR spel = 'a' (len=1)
            |       : type long
            |       : referenced type long
            +-- NODE:  CursorKind.CALL_EXPR spel = 'func_with_params' (len=16)
                |   : cur.type.spelling: long
                |   : cur.get_arguments().type.spelling: ['char', 'short', 'int']
                +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = 'func_with_params' (len=16)
                |   +-- NODE:  CursorKind.DECL_REF_EXPR spel = 'func_with_params' (len=16)
                |           : type long (char, short, int)
                |           : referenced type long (char, short, int)
                +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = '' (len=0)
                |   +-- NODE:  CursorKind.CHARACTER_LITERAL spel = '' (len=0)
                +-- NODE:  CursorKind.UNEXPOSED_EXPR spel = '' (len=0)
                |   +-- NODE:  CursorKind.INTEGER_LITERAL spel = '' (len=0)
                |           : tokens: ['10']
                +-- NODE:  CursorKind.INTEGER_LITERAL spel = '' (len=0)
                        : tokens: ['100']
         

void use_a_function_pointer(void)
{
   long (*ptr)(char a, short b, int c);

   ptr = &func_with_params;

   struct
   {
      void(*ptr)(char a, short b, int c);
   } s;

   s.ptr = &func_with_params;

   ptr(1, 2, 3);
   s.ptr(11, 12, 13);
}