srcmlcpp: C++ code parsing
litgen provides three separate Python packages; srcmlcpp is one of them:
codemanip: a Python package to perform textual manipulations on C++ and Python code. See code_utils.py.
srcmlcpp: a Python package that builds on top of srcML in order to interpret the XML tree produced by srcML as a tree of Python objects resembling a C++ AST.
litgen: a Python package that generates Python bindings from C++ code.
srcmlcpp transforms C++ source code into a tree of Python objects (descendants of CppElement) that reflect the C++ AST.
This tree is used by litgen to generate the Python bindings. It can also be used to perform automatic C++ code transformations.
Transform C++ code into a CppElement tree
Given the C++ code below:
code = """
// A Demo struct
struct Foo
{
const int answer(int *v=nullptr); // Returns the answer
};
"""
srcmlcpp can produce a tree of CppElement objects with this call:
import srcmlcpp
options = srcmlcpp.SrcmlcppOptions()
cpp_unit = srcmlcpp.code_to_cpp_unit(options, code)
cpp_unit is then a tree of Python objects (descendants of CppElement) that represents the source code.
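The following toy model makes the idea of "a tree of Python objects that mirrors the C++ AST" concrete for the demo code above. It does not use srcmlcpp. The classes ToyCppElement, ToyUnit, ToyStruct, ToyFunction and ToyParameter are hypothetical stand-ins for CppElement, CppUnit, CppStruct and related classes. They are kept minimal so that the sketch runs with the standard library only.

```python
from typing import Iterator, List, Optional, Tuple


class ToyCppElement:
    """Hypothetical base class, standing in for srcmlcpp's CppElement."""

    def children(self) -> List["ToyCppElement"]:
        return []  # leaf elements have no children

    def label(self) -> str:
        return type(self).__name__

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "ToyCppElement"]]:
        # Depth-first, pre-order traversal: this node first, then its descendants
        yield depth, self
        for child in self.children():
            yield from child.walk(depth + 1)


class ToyParameter(ToyCppElement):
    def __init__(self, type_: str, name: str, default: Optional[str] = None):
        self.type_, self.name, self.default = type_, name, default

    def label(self) -> str:
        return f"ToyParameter {self.type_} {self.name}"


class ToyFunction(ToyCppElement):
    def __init__(self, return_type: str, name: str,
                 params: List[ToyParameter], comment: str = ""):
        self.return_type, self.name = return_type, name
        self.params, self.comment = params, comment

    def children(self) -> List[ToyCppElement]:
        return list(self.params)

    def label(self) -> str:
        return f"ToyFunction {self.name}"


class ToyStruct(ToyCppElement):
    def __init__(self, name: str, members: List[ToyCppElement], comment: str = ""):
        self.name, self.members, self.comment = name, members, comment

    def children(self) -> List[ToyCppElement]:
        return list(self.members)

    def label(self) -> str:
        return f"ToyStruct {self.name}"


class ToyUnit(ToyCppElement):
    def __init__(self, items: List[ToyCppElement]):
        self.items = items

    def children(self) -> List[ToyCppElement]:
        return list(self.items)


# Tree for the demo code, built by hand here (srcmlcpp builds its tree from the source)
unit = ToyUnit([
    ToyStruct("Foo", [
        ToyFunction("const int", "answer",
                    [ToyParameter("int *", "v", "nullptr")],
                    comment="Returns the answer"),
    ], comment="A Demo struct"),
])

# Print one line per node, indented by its depth in the tree
for depth, element in unit.walk():
    print("    " * depth + element.label())
```

Running it prints an indented outline: ToyUnit at the top, then ToyStruct Foo, ToyFunction answer and ToyParameter int * v, each one level deeper.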
Here is what it looks like under a debugger:
Transform a CppElement tree into C++ code
Transformation to source code from a tree of CppElement
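CppElement provides a method str_code() that outputs the C++ code it contains. The output is close to the original source code (including comments), but it can differ slightly. In the example below, an explicit access specifier is added and the spacing around the parameter is normalized.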
Note
Any modification applied to the tree by modifying the CppElement objects (CppUnit, CppStruct, etc.) will be visible when using this method.
from litgen.demo import litgen_demo
litgen_demo.show_cpp_code(cpp_unit.str_code())
// A Demo struct
struct Foo
{
public: // <default_access_type/>
const int answer(int * v = nullptr); // Returns the answer
};
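The sketch below shows why edits show up in str_code() and why its output can differ slightly from the input, as with the normalized `int * v = nullptr` above. The code is regenerated from the objects' fields rather than copied from the source text. The class ToyFunction and its str_code method are hypothetical and only mimic the behavior described here. They are not srcmlcpp's implementation.

```python
from typing import List, Optional, Tuple


class ToyFunction:
    """Hypothetical stand-in for a function-declaration element."""

    def __init__(self, return_type: str, name: str,
                 params: List[Tuple[str, str, Optional[str]]]):
        self.return_type = return_type
        self.name = name
        self.params = params  # (type, name, default value or None)

    def str_code(self) -> str:
        # Rebuild the declaration from the current fields, with normalized spacing
        rendered = []
        for type_, pname, default in self.params:
            text = f"{type_} {pname}"
            if default is not None:
                text += f" = {default}"
            rendered.append(text)
        return f"{self.return_type} {self.name}({', '.join(rendered)});"


fn = ToyFunction("const int", "answer", [("int *", "v", "nullptr")])
print(fn.str_code())  # const int answer(int * v = nullptr);

# Modify the tree: the change appears the next time code is generated
fn.name = "get_answer"
print(fn.str_code())  # const int get_answer(int * v = nullptr);
```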
“Verbatim” transformation from tree to code
You can obtain the verbatim source code (i.e. the exact same source code that generated the tree) with a call to str_code_verbatim().
Note
This calls the srcML executable on the srcML XML tree stored inside cpp_unit.srcml_xml, which guarantees that the same source code is returned. Any modification applied to the tree by modifying the CppElement Python objects (CppUnit, CppStruct, etc.) will not be visible when using this method.
print(cpp_unit.str_code_verbatim())
// A Demo struct
struct Foo
{
const int answer(int *v=nullptr); // Returns the answer
};
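The difference between the two methods can be sketched with a hypothetical toy element. The element keeps a private copy of the text it was parsed from, playing the role of cpp_unit.srcml_xml, next to an editable field. The method names deliberately mirror srcmlcpp's, but ToyElement and its methods are illustrative assumptions, not the library's code.

```python
class ToyElement:
    """Hypothetical element that keeps a copy of the text it was parsed from."""

    def __init__(self, source_text: str, name: str):
        self._source_text = source_text  # plays the role of cpp_unit.srcml_xml
        self.name = name                 # an editable field of the tree

    def str_code(self) -> str:
        # Regenerated from the current fields: edits are visible
        return f"struct {self.name} {{}};"

    def str_code_verbatim(self) -> str:
        # Returned from the stored original: edits are not visible
        return self._source_text


element = ToyElement("struct Foo{};", name="Foo")
element.name = "Bar"                # edit the tree
print(element.str_code())           # struct Bar {};
print(element.str_code_verbatim())  # struct Foo{};
```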
CppElement types
When C++ code is parsed, it is represented by many Python objects, each corresponding to a different kind of C++ construct.
See the diagram below for more information:
litgen and srcmlcpp
For reference, when litgen transforms C++ code into Python bindings, it transforms the CppElement tree into a tree of AdaptedElement objects.
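A minimal sketch of the idea of a second, adapted tree follows. Each adapted node wraps a C++ node, keeps a link back to it, and knows how to render Python-side code, here a stub signature. The names ToyCppFunction and ToyAdaptedFunction, as well as the tiny type map, are hypothetical and do not reflect litgen's actual AdaptedElement classes or its output. The enclosing struct is ignored for brevity, so the stub is rendered as a free function.

```python
from typing import List, Optional, Tuple


class ToyCppFunction:
    """Hypothetical C++-side node (stands in for a CppElement)."""

    def __init__(self, return_type: str, name: str,
                 params: List[Tuple[str, str, Optional[str]]]):
        self.return_type = return_type
        self.name = name
        self.params = params  # (type, name, default value or None)


class ToyAdaptedFunction:
    """Hypothetical Python-side node wrapping a C++ node (stands in for an AdaptedElement)."""

    # Toy mappings from C++ to Python; a real generator handles far more cases
    TYPE_MAP = {"int": "int", "const int": "int", "int *": "Optional[int]"}
    VALUE_MAP = {"nullptr": "None"}

    def __init__(self, cpp_element: ToyCppFunction):
        self.cpp_element = cpp_element  # link back to the wrapped C++ node

    def stub_code(self) -> str:
        # Render a Python stub signature from the wrapped C++ declaration
        f = self.cpp_element
        args = []
        for type_, name, default in f.params:
            arg = f"{name}: {self.TYPE_MAP.get(type_, type_)}"
            if default is not None:
                arg += f" = {self.VALUE_MAP.get(default, default)}"
            args.append(arg)
        ret = self.TYPE_MAP.get(f.return_type, f.return_type)
        return f"def {f.name}({', '.join(args)}) -> {ret}: ..."


cpp_fn = ToyCppFunction("const int", "answer", [("int *", "v", "nullptr")])
adapted = ToyAdaptedFunction(cpp_fn)
print(adapted.stub_code())  # def answer(v: Optional[int] = None) -> int: ...
```

Keeping a reference from each adapted node back to its C++ node means the binding generator can always consult the original declaration (types, comments, defaults) while building the Python side.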
See the diagram below: