srcmlcpp: C++ code parsing

litgen provides three separate Python packages; srcmlcpp is one of them:

  • codemanip: a python package to perform textual manipulations on C++ and Python code. See code_utils.py

  • srcmlcpp: a Python package that builds on top of srcML in order to interpret the XML tree produced by srcML as a tree of Python objects resembling a C++ AST.

  • litgen: a python package that generates python bindings from C++ code.

srcmlcpp will transform C++ source into a tree of Python objects (descendants of CppElement) that reflect the C++ AST.

This tree is used by litgen to generate the python bindings. It may also be used to perform automatic C++ code transformations.
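To make the idea of "interpreting an XML tree as a tree of Python objects" concrete, here is a minimal, self-contained sketch using only the standard library. It does not use srcmlcpp or srcML: the XML snippet is hand-written and heavily simplified (real srcML output uses namespaces and many more tags), and `ToyElement`, `xml_to_toy_tree` and `dump` are hypothetical names invented for this illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Simplified, hand-written XML, loosely modeled on what a tool like srcML emits.
# This is NOT actual srcML output.
XML = """
<unit>
  <struct>
    <name>Foo</name>
    <function_decl>
      <type>int</type>
      <name>answer</name>
    </function_decl>
  </struct>
</unit>
"""


@dataclass
class ToyElement:
    """Hypothetical stand-in for a CppElement-like node."""
    tag: str
    name: str = ""
    children: list[ToyElement] = field(default_factory=list)


def xml_to_toy_tree(xml_node: ET.Element) -> ToyElement:
    """Recursively wrap an XML node (and its children) in ToyElement objects."""
    name_node = xml_node.find("name")  # direct <name> child, if any
    name = (name_node.text or "") if name_node is not None else ""
    element = ToyElement(tag=xml_node.tag, name=name)
    for child in xml_node:
        if child.tag in ("name", "type"):  # leaf info, not modeled as child nodes here
            continue
        element.children.append(xml_to_toy_tree(child))
    return element


def dump(element: ToyElement, indent: int = 0) -> list[str]:
    """Return an indented, one-line-per-node description of the tree."""
    lines = ["  " * indent + f"{element.tag} {element.name}".rstrip()]
    for child in element.children:
        lines.extend(dump(child, indent + 1))
    return lines


toy_unit = xml_to_toy_tree(ET.fromstring(XML))
print("\n".join(dump(toy_unit)))
```

The real CppElement hierarchy is much richer (see the diagram in "CppElement types"), but the principle is the same: each XML node becomes a Python object that knows its children.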

Transform C++ code into a CppElement tree

Given the C++ code below:

code = """
// A Demo struct
struct Foo
{
    const int answer(int *v=nullptr); // Returns the answer
};
"""

srcmlcpp can produce a tree of CppElement with this call:

import srcmlcpp

options = srcmlcpp.SrcmlcppOptions()
cpp_unit = srcmlcpp.code_to_cpp_unit(options, code)

cpp_unit is then a tree of Python objects (descendants of CppElement) that represents the source code.

Here is what it looks like under a debugger: [image: tree]

Transform a CppElement tree into C++ code

Transformation to source code from a tree of CppElement

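CppElement provides a method str_code() that outputs the C++ code it contains. The result is close to the original source code (comments included), but it can differ slightly: in the example below, spacing is normalized and the struct's default access specifier is made explicit.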

Note

Any modification applied to the AST by modifying the CppElement objects (CppUnit, CppStruct, etc.) will be visible when using this method.

from litgen.demo import litgen_demo

litgen_demo.show_cpp_code(cpp_unit.str_code())
// A Demo struct
struct Foo
{
public: // <default_access_type/>
    const int answer(int * v = nullptr); // Returns the answer
};

“Verbatim” transformation from tree to code

You can obtain the verbatim source code (i.e. the exact same source code that generated the tree), with a call to str_code_verbatim().

Note

  • This will call the srcML executable using the srcML XML tree stored inside cpp_unit.srcml_xml, which guarantees that the same source code is returned.

  • Any modification applied to the AST by modifying the CppElement Python objects (CppUnit, CppStruct, etc.) will not be visible when using this method.

print(cpp_unit.str_code_verbatim())
// A Demo struct
struct Foo
{
    const int answer(int *v=nullptr); // Returns the answer
};
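The difference between the two methods can be illustrated with a toy model. The sketch below does not use srcmlcpp: `ToyStruct` and `ToyFunction` are hypothetical classes written for this illustration, and the "verbatim" method simply returns a stored copy of the original text, whereas srcmlcpp regenerates it from the stored srcML XML tree. The point is only to show why an edit to the tree appears in one output and not in the other.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyFunction:
    """Hypothetical stand-in for a function declaration node."""
    return_type: str
    name: str

    def str_code(self) -> str:
        # Regenerated from the node's current fields
        return f"    {self.return_type} {self.name}();"


@dataclass
class ToyStruct:
    """Hypothetical stand-in for a struct node that also keeps its original text."""
    name: str
    original_source: str  # never modified after parsing
    functions: list[ToyFunction] = field(default_factory=list)

    def str_code(self) -> str:
        # Rebuilt from the (possibly modified) object tree
        body = "\n".join(f.str_code() for f in self.functions)
        return f"struct {self.name}\n{{\n{body}\n}};"

    def str_code_verbatim(self) -> str:
        # Returned from the stored original, ignoring any edits to the tree
        return self.original_source


source = "struct Foo\n{\n    int answer();\n};"
toy = ToyStruct("Foo", source, [ToyFunction("int", "answer")])

toy.functions[0].name = "question"  # edit the tree

print(toy.str_code())           # reflects the rename
print(toy.str_code_verbatim())  # still the original text
```

In short: use str_code() to see the result of a transformation, and str_code_verbatim() to recover the untouched input.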

CppElement types

Once parsed, C++ code is represented by many Python objects, each of which represents a different kind of C++ token.

See the diagram below for more information:

[image: srcmlcpp_diagram]

litgen and srcmlcpp

For reference, when litgen transforms C++ code into Python bindings, it first transforms the CppElement tree into a tree of AdaptedElement.

See the diagram below:

[image: litgen_diagram]