{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# srcmlcpp: C++ code parsing\n", "\n", "litgen provides three separate python packages, srcmlcpp is one of them:\n", "\n", "* [`codemanip`](https://github.com/pthom/litgen/tree/main/packages/codemanip): a python package to perform _textual_ manipulations on C++ and Python code. See [code_utils.py](https://github.com/pthom/litgen/tree/main/packages/codemanip/code_utils.py)\n", "* [`srcmlcpp`](https://github.com/pthom/litgen/tree/main/packages/srcmlcpp): a python package that build on top of srcML in order to interpret the XML tree produced by srcML as a tree of python object resembling a C++ AST.\n", "* [`litgen`](https://github.com/pthom/litgen/tree/main/packages/litgen): a python package that generates python bindings from C++ code.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "[`srcmlcpp`](https://github.com/pthom/litgen/tree/main/packages/srcmlcpp) will transform C++ source into a tree of Python objects (descendants of `CppElement`) that reflect the C++ AST.\n", "\n", "This tree is used by litgen to generate the python bindings. It may also be used to perform automatic C++ code transformations.\n", "\n", "## Transform C++ code into a CppElement tree\n", "Given the C++ code below: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "code = \"\"\"\n", "// A Demo struct\n", "struct Foo\n", "{\n", " const int answer(int *v=nullptr); // Returns the answer\n", "};\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "srcmlcpp can produce a tree of `CppElement` with this call:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import srcmlcpp\n", "\n", "options = srcmlcpp.SrcmlcppOptions()\n", "cpp_unit = srcmlcpp.code_to_cpp_unit(options, code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`cpp_unit` is then a tree of Python object (descendants of `CppElement`) that represents the source code.\n", "\n", "Here is what it looks like under a debugger:\n", "![tree](images/srcml_cpp_doc.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transform a CppElement tree into C++ code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transformation to source code from a tree of `CppElement`\n", "\n", "`CppElement` provides a method `str_code()` that can output the C++ code it contains. It is close to the original source code (including comments), but can differ a bit.\n", "\n", "```{note}\n", "Any modification applied to the AST tree by modifying the CppElements objects (CppUnit, CppStruct, etc.) will be visible using this method\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
// A Demo struct\n",
       "struct Foo\n",
       "{\n",
       "public: // <default_access_type/>\n",
       "    const int answer(int * v = nullptr); // Returns the answer\n",
       "};\n",
       "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from litgen.demo import litgen_demo\n", "\n", "litgen_demo.show_cpp_code(cpp_unit.str_code())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \"Verbatim\" transformation from tree to code\n", "\n", "You can obtain the verbatim source code (i.e. the exact same source code that generated the tree), with a call to `str_code_verbatim()`. \n", "\n", "```{note}\n", "* This will call the srcML executable using the srcml xml tree stored inside `cpp_unit.srcml_xml`, which guarantees to return the same source code\n", "* Any modification applied to the AST tree by modifying the `CppElement` python objects (CppUnit, CppStruct, etc.) will not be visible using this method\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "// A Demo struct\n", "struct Foo\n", "{\n", " const int answer(int *v=nullptr); // Returns the answer\n", "};\n", "\n" ] } ], "source": [ "print(cpp_unit.str_code_verbatim())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CppElement types\n", "\n", "When parsing C++ code, it will be represented by many python objects, that represents differents C++ tokens.\n", "\n", "See the diagram below for more information:\n", "\n", "![srcmlcpp_diagram](images/srcmlcpp_diagram.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## litgen and srcmlcpp\n", "\n", "For information, when litgen transform C++ code into python bindings, it will transform the `CppElement` tree into a tree of `AdaptedElement`. \n", "\n", "See diagram below:\n", "\n", "![litgen_diagram](images/litgen_diagram.png)" ] } ], "metadata": { "kernelspec": { "display_name": "venv311", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 2 }