## Purpose

### If you have something like this:

```
TYPE modell_CommentsBehind__function(TYPE Parameter) // this is the modell of a function
                                                     // with comments behind of the commands
{
 TYPE   ReturnValue;                             //comment ReturnValue
 TYPE_A LokalVariable_1 = Value1;                //variable-comment 1
 TYPE_A LokalVariable_2;                         //variable-comment 2
 TYPE_Z LokalVariable_N = ValueN;                //variable-comment N

 operation_1( & LokalVariable_1,Parameter);      //operation-comment 1
 LokalVariable_2 = operation_2(LokalVariable_N); //operation-comment 2

 if(LokalVariable_1 == LokalVariable_2)          //statement-comment for true
 {
  ReturnValue = DefaultValue;                    //operation-comment for true
 }
 else                                            //statement-comment for false
 {
  while(ParameterStatment)                       //loop-comment
   ReturnValue = operation_3();                  //operation-comment 3
 };

 return(ReturnValue);                            //comment return-line
}
```

### MuLanPa will generate something like this:

```
TYPE modell_CommentsBehind__function ( TYPE Parameter ) // this is the modell of a function
                                                        // with comments behind of the commands
{
 TYPE ReturnValue ;                                     //comment ReturnValue
 TYPE_A LokalVariable_1 = Value1 ;                      //variable-comment 1
 TYPE_A LokalVariable_2 ;                               //variable-comment 2
 TYPE_Z LokalVariable_N = ValueN ;                      //variable-comment N
 operation_1 ( & LokalVariable_1 , Parameter ) ;        //operation-comment 1
 LokalVariable_2 = operation_2 ( LokalVariable_N ) ;    //operation-comment 2
 if ( LokalVariable_1 == LokalVariable_2 )              //statement-comment for true
 {
  ReturnValue = DefaultValue ;                          //operation-comment for true
 }
 else                                                   //statement-comment for false
 {
  while ( ParameterStatment )                           //loop-comment
   ReturnValue = operation_3 ( ) ;                      //operation-comment 3
 } ;
 return ( ReturnValue ) ;                               //comment return-line
}
```

## Introduction

MuLanPa stands for Multi-Language-Parser and is the name of the project. The binary, however, was renamed to abc2xml to make clearer what it does: the name stands for the conversion of a text written in a language (abc) with a defined syntax into (2) xml. abc2xml is designed as a source-code analysing program that generates xml-files which represent the algorithm and data-structure of the source. It has its own source-parser system that is configured by an external grammar-description, so it may be used for several programming-languages. Additional configurations of abc2xml are placed in an xml-file. The output of abc2xml is intended as input for tools like Moritz, a structogram-generator for Doxygen, but it may also serve as a data-base for other tools such as project-browsers for code-editors or other code-structure viewers. MuLanPa comes along with a second binary, xml2abc, which is used for documentation purposes only. Both binaries are console- or terminal-applications which have to be started via the command-line. With configuration-files you may control the general style of the output.

abc2xml is designed as one tool in a chain of tools. It may also be used as a stand-alone application, but this is not the native use-case. abc2xml itself has its own source-parser and creates xml-files but no diagrams or other graphics. Thus the output of abc2xml should be post-processed by another tool.

To parse sources written in a programming-language or a special script-language, abc2xml needs a description of this language in the form of a grammar-file. abc2xml reads this grammar first to learn how to analyse the sources or scripts. The grammar itself has to be offered in a special notation, either as a text-file or as part of the base-configuration. The native destination-tool is the second binary of MuLanPa, called xml2abc, which reads the xml-files and creates several script-files describing the diagrams, as a base for graphical output via a script-interpreting tool.

The output-format of abc2xml is xml, so a common html-browser may be used to view its content. But since this is not a very comfortable solution, it is better to use an additional tool that is able to interpret the content of the abc2xml-output and/or that generates an output showing the user what he really wants to see.

To control the whole process of generating the xml-scripts with abc2xml, several files are used:

• First of all the user starts a batch-file or terminal-script which calls abc2xml and perhaps additional tools that interpret the output of abc2xml sequentially.
• abc2xml needs a configuration file in xml-format. The work of abc2xml is done in a sequence of processes which can be configured to adapt them for other languages.

At the moment the following processes are used:

• Directives — resolves compiler-switches of the preprocessor in languages like C/C++
• Context — splits the source-text into comments and active code
• Comment — analyses the comments to see if they contain special commands
• Line — analyses line-changes and cuts out signs of logical line-connections
• Source — parses the prepared source-text
• Merge — creates one xml-file that contains the result of all other processes

The grammar-texts that teach abc2xml the language of the sources are analysed by a special notation-process that knows the notation of the grammar, builds the process-parsers and is able to create a special terminal-output as information for the user. Every process has its own parser that has to be defined in the form of grammar-texts. These texts are written in a notation that can be read by the notation-process. In the future more than one notation may be used.
• It is possible to split the configurations of the tool into a user-file and some more detailed configurations.
In the distribution you will find the folder cfg where the configurations for the common user are placed. For example, you place here the information about the files you wish to analyse.
The folder LangPack contains an own sub-folder for every supported programming-language, used as location for the detailed configuration. In parallel to the detailed xml-configurations you will also find some other files: the a2x-files contain the grammar-description of the programming-language, and the x2a-files the script-snippets used to assemble the syntax-diagrams for documentation purposes.
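The chain of processes described above can be sketched conceptually like this. This is a hypothetical Python illustration of the idea only, not abc2xml's real implementation; all function names are invented:

```python
# Conceptual sketch of the process-chain: each process transforms the
# work-piece, and the merge step combines all part-results into one output.

def context(source):
    """split the source-text into comments and active code"""
    comments = [l for l in source if l.lstrip().startswith("//")]
    code     = [l for l in source if not l.lstrip().startswith("//")]
    return comments, code

def comment_process(comments):
    """analyse the comments (here: just record them)"""
    return [("comment", c) for c in comments]

def source_process(code):
    """parse the prepared source-text (here: just record it)"""
    return [("code", c) for c in code]

def merge(*results):
    """create one result that contains the output of all other processes"""
    return [part for result in results for part in result]

comments, code = context(["// a comment", "x = 1;"])
print(merge(comment_process(comments), source_process(code)))
# [('comment', '// a comment'), ('code', 'x = 1;')]
```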

## Controlling MuLanPa via Shell- or Batch-Scripts

This is a possible algorithm for the controlling terminal-script. It shows how to use MuLanPa together with the program doxygen (www.doxygen.org). In this file only command-line parameters are defined; other important adjustments are made in the configuration-files.
```
void scrHTMim ( void )

 // parameter-settings
 XMLPATH             = "./xml/"                 // path of the xml-files created by abc2xml
 DESTINATIONPATH_DOT = "./dot/"                 // path for the syntax-diagrams created by xml2abc
 CONFIGURATION_XML   = "./cfg/abc2xml_cfg.xml"  // configuration of abc2xml to transfer the source-files into xml-files
 CONFIGURATION_DOT   = "./cfg/grm2abc_cfg.xml"  // configuration of xml2abc to transfer the used grammar-rules into dot-based syntax-diagrams
 MULANPAPATH         = "./bin/"                 // location of MuLanPa

 // delete old outputs of doxygen and moritz
 removeDirectoryContent ( XMLPATH, "*.xml" )             // old output of abc2xml
 removeDirectoryContent ( DESTINATIONPATH_DOT, "*.dt" )  // old output of xml2abc

 // doxygen and moritz in action
 ( MULANPAPATH ) abc2xml CF( CONFIGURATION_XML )  // run abc2xml to generate xml-files which contain the algorithm-structure
 ( MULANPAPATH ) xml2abc CF( CONFIGURATION_DOT )  // run xml2abc to generate dot-based syntax diagrams
 doxygen ( "./doxygen/Doxyfile_html" )            // run doxygen to generate a documentation that contains the syntax-diagrams of the used programming-language
```

(Note: this diagram contains no valid script-text; it only describes the sequence of steps.)

This is a possible content of the controlling terminal-script:
```
rem ****************************************************************************
rem * Example-batch to demonstrate how to use MuLanPa
rem *
rem * This batch-file shows how to use MuLanPa together with the program doxygen
rem * (www.doxygen.org) to create dot-based syntax diagrams.
rem *
rem * In this file only command-line parameters are defined. Other important
rem * adjustments are made in the configuration-files. They may also be
rem * responsible for problems.
rem * Use the commented pause-commands if you are looking for sources of
rem * trouble.
rem ****************************************************************************

rem ****************************************************************************
rem parameter-settings
rem ****************************************************************************

rem path of the xml-sources generated by abc2xml
set DESTINATION_XML=.\xml\
rem Pause

rem path for the dot-based syntax diagrams generated by xml2abc
set DESTINATION_DOT=.\dot\
rem Pause

rem abc2xml-configuration to generate xml files
set CONFIGURATION_XML=.\cfg\abc2xml_cfg.xml
rem Pause

rem xml2abc-configuration to generate syntax diagrams
set CONFIGURATION_DOT=.\cfg\grm2abc_cfg.xml
rem Pause

rem location of MuLanPa
set MULANPAPATH=.\bin\
rem Pause

rem ****************************************************************************
rem delete old outputs of doxygen and MuLanPa
rem ****************************************************************************

rem old output-sources of abc2xml
del %DESTINATION_XML%*.xml
rem Pause

rem outputs of xml2abc
del %DESTINATION_DOT%*.dt
del %DESTINATION_DOT%*.html
rem Pause

rem ****************************************************************************
rem doxygen and MuLanPa in action
rem ****************************************************************************

rem run abc2xml to transfer the source-files into xml-files
%MULANPAPATH%abc2xml CF%CONFIGURATION_XML%
rem >>log.txt
rem Pause

rem run xml2abc to generate files which contain dot-based syntax diagrams
%MULANPAPATH%xml2abc CF%CONFIGURATION_DOT%
rem >>log.txt
rem Pause

rem run doxygen to generate a documentation that contains the syntax-diagrams
rem of the used programming-language
doxygen.exe .\cfg\Doxyfile_html
rem pause
```

## How It Works

Parsing a source written in a programming-language like C/C++ is not really trivial. Furthermore, it is one goal of MuLanPa to support different programming-languages. Thus the following section is only a rough overview. More details can be found in the documentation you can download from the Sourceforge project-page.

### Grammar

Parsing the sources or scripts is one of the basic steps for abc2xml to convert the input into the output. But this process itself depends on the language of the input-text. Therefore it is necessary to configure abc2xml by defining the grammar of the source- or script-language. This grammar has to be defined in a special file with the extension .a2x, or as part of the xml-configuration, written in a special notation that abc2xml knows. At the moment there is only one kind of notation that can be used. It is based on the Spirit parser-library that is used to implement the parsing-process. It is planned to implement other notations, like EBNF or regular expressions, as well.
This is an example to describe the construction of names:

```
/* Spirit 1.8.5 Grammar-Example */

ENDMARKER = "ENDMARKER";
INDENT    = "INDENT";
DEDENT    = "DEDENT";
NEWLINE   = "NEWLINE";

NON_NAME  = ENDMARKER | INDENT | DEDENT | NEWLINE | KEYWORD;

KEYWORD   = "and"      | "del"     | "from"   | "not"   | "while"
          | "as"       | "elif"    | "global" | "or"    | "with"
          | "assert"   | "else"    | "if"     | "pass"  | "yield"
          | "break"    | "except"  | "import" | "print"
          | "class"    | "exec"    | "in"     | "raise"
          | "continue" | "finally" | "is"     | "return"
          | "def"      | "for"     | "lambda" | "try";

NAME = ( (range_p('a','z') | range_p('A','Z') | '_')
         >> *(range_p('a','z') | range_p('A','Z') | range_p('0','9') | '_')
       ) - NON_NAME;
```

The current implementation of the grammar knows several basic parsers and operators to describe the structure of a non-context-sensitive language. Every combination of basic parsers and operators is also a parser. Such a combined parser can be used as a sub-block in a more complex combination that again describes a parser, or it can define a completely new parser as a parser-rule. Every parser defined in a parser-rule has a name or identifier, a string-literal that represents this parser as an element in other parser-rules.
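What the NAME rule of the grammar-example accepts can be sketched in Python for illustration. This is only a rough equivalent under the assumption that whole tokens are tested one at a time; it is not how the Spirit-based parser works internally:

```python
import re

# The NON_NAME alternatives of the grammar-example: the special tokens
# plus the keywords that an identifier-shaped token must NOT equal.
KEYWORDS = {
    "and", "del", "from", "not", "while", "as", "elif", "global", "or",
    "with", "assert", "else", "if", "pass", "yield", "break", "except",
    "import", "print", "class", "exec", "in", "raise", "continue",
    "finally", "is", "return", "def", "for", "lambda", "try",
}
NON_NAME = {"ENDMARKER", "INDENT", "DEDENT", "NEWLINE"} | KEYWORDS

# (letter or '_') followed by any number of (letter, digit or '_')
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_name(token: str) -> bool:
    """True if the token matches NAME: identifier-shape minus NON_NAME."""
    return bool(IDENTIFIER.match(token)) and token not in NON_NAME

print(is_name("LokalVariable_1"))  # True
print(is_name("while"))            # False: a KEYWORD
print(is_name("9lives"))           # False: must not start with a digit
```

The subtraction `- NON_NAME` in the grammar corresponds to the `and token not in NON_NAME` check here.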

### Scanner and Parser

#### 1. Scanning the Text to parse for Tokens

A token is the base-element of a language. It may be a single character or a sequence of characters. In some languages special properties of text-parts are also defined as tokens, for example the indentation or dedentation of a line. But since abc2xml uses special processes to insert special strings for these non-textual tokens, a parser of abc2xml does not have to deal with non-textual tokens.

Basic token-parsers take a look at every character and compare it with their individual search-pattern. If the current character fits the search-pattern, it will be noticed by the parser. If the current character is not permitted by a search-pattern, the corresponding parser drops its current part-result. If the current character is the last part of the token described by the search-pattern, the parser now has the complete token; this is a so-called parser-hit, and the found token becomes an input for a higher-level expression-parser. It is a little bit like playing bingo: the scanner calls out the content of the text to analyse, character by character. If a token-parser finds the character on its rule-card as allowed, it is checked off. But if the current character is forbidden, the token-parser is excluded (which is not the case if you are playing bingo). Once a token is found, the next token is searched in the same manner, and so the scanner and the token-parsers together transform a sequence of characters into a sequence of tokens.
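The character-to-token idea can be sketched minimally like this. It is a hypothetical illustration (the token patterns are invented for the example), not abc2xml's real scanner:

```python
import re

# Hypothetical token search-patterns; the first pattern that matches
# at the current position wins.
TOKEN_PATTERNS = [
    ("NAME",   re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("OP",     re.compile(r"==|[=;(){}+*/-]")),
    ("SKIP",   re.compile(r"\s+")),          # whitespace: no token
]

def scan(text: str):
    """Transform a sequence of characters into a sequence of tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m:
                if kind != "SKIP":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # every token-parser was excluded: no hit is possible here
            raise ValueError(f"no token-parser accepts {text[pos]!r}")
    return tokens

print(scan("ReturnValue = DefaultValue;"))
# [('NAME', 'ReturnValue'), ('OP', '='), ('NAME', 'DefaultValue'), ('OP', ';')]
```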

#### 2. Combine Tokens to single Expressions

An expression (in the sense of this chapter of the documentation) is a language-element that contains a token or a combination of tokens.

Every expression-parser is constructed as a combination of token-parsers, where one token-parser may be used by several expression-parsers. In the end, the parsing of expressions works similarly to the parsing of tokens. As long as the current token fits the search-pattern of the expression-parser, the parsing goes on, until the last token is reached or a forbidden token stops the work of the parser. If a parser has a hit, its result may be the input for a more complex expression searched for by another parser.

#### 3. Create the Parser-Output from the Expressions

The result of a successful search is stored as a parser-tree that reflects the structure of the used expressions, sub-expressions and tokens. Every parser uses this tree-structure to store each single result and gives it to its receiving parsers, which add it as a part-result to their own parser-tree if it fits their search-pattern.
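Steps 2 and 3 together can be sketched with one tiny, hand-written expression-parser that stores its hit as a tree-node. This is a conceptual illustration only; abc2xml's real parsers are generated from the grammar-files:

```python
# A hypothetical "assignment" expression-parser: its search-pattern is the
# token sequence NAME '=' NAME ';'. A hit is stored as a small parser-tree.
def parse_assignment(tokens):
    if len(tokens) != 4:
        return None
    (k1, target), (k2, op), (k3, value), (k4, end) = tokens
    if (k1, k2, k3, k4) != ("NAME", "OP", "NAME", "OP"):
        return None  # a forbidden token stops the work of the parser
    if op != "=" or end != ";":
        return None
    # each sub-result becomes part of the receiving parser's tree
    return ("assignment", [("target", target), ("value", value)])

tree = parse_assignment(
    [("NAME", "ReturnValue"), ("OP", "="), ("NAME", "DefaultValue"), ("OP", ";")])
print(tree)
# ('assignment', [('target', 'ReturnValue'), ('value', 'DefaultValue')])
```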

### Directive Process

Especially C and C++ sources contain not only parts written in the programming-language itself but also parts in a different language: the preprocessor-directives. Simple preprocessor-commands can be treated like normal C/C++ commands; thus the used C/C++ source-parser contains a grammar for the preprocessor-directives as well. But it is always possible that compiler-switches contain source-snippets which cannot be parsed, since starting and/or ending parts are not part of the snippet. In these cases a special process has to be used to construct, out of the original source, a special one where compiler-switches with broken source-content are resolved, so that the new source contains valid code only. Since not every programming-language knows preprocessor-directives which may contain broken code-parts, this process has to be activated by using special configuration-parts. Currently this is only possible for C and C++.

The directive-process has its own parser that describes the directives of the preprocessor and the expressions used in the switch-directives. All other details of the source are described as simple text-lines. Since the directive-process knows the core-flow, it is possible to try out whether the content of the switch-paths contains complete code that can be parsed by the core-flow. As user-output, the directive-process generates for every variant an xml-file that contains the parsing and the information about the activity and parseability of each switch-path. As indirect output, the source-variant is assembled and handed over to the core-flow, which works with it like with a normal source. The parsing-part of the directive-process works like the source-process but with its own grammar. This grammar is split into two parts: the description of the directive-syntax and a detailed description of the switch-expressions. Once the source is parsed, the result contains a detailed description-tree of the expressions.
It is possible for the user to define a set of constant-values for each variant he wants to analyse. During the evaluation, the user-given constants are used to decide which switch-path is active and which is not. Additionally, the source-snippets inside the switch-paths are tested by using the core-flow. Configured by the user, switches with parseable code may be kept in the source. While assembling the source-variant, inactive switch-paths are commented out. Thus they are still part of the source, not as active code but as comments.
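The variant-assembly idea can be sketched like this. It is a strongly simplified, hypothetical illustration that only handles plain `#if NAME` / `#else` / `#endif`, unlike the real directive-process:

```python
# Given user-defined constants, keep the active switch-path as code and
# comment out the inactive path, so it stays in the source as a comment.
def assemble_variant(source_lines, constants):
    out, stack = [], []  # stack holds True while the current path is active
    for line in source_lines:
        stripped = line.strip()
        if stripped.startswith("#if "):
            name = stripped.split()[1]
            stack.append(bool(constants.get(name, 0)))
            out.append(line)
        elif stripped == "#else":
            stack[-1] = not stack[-1]
            out.append(line)
        elif stripped == "#endif":
            stack.pop()
            out.append(line)
        elif all(stack):
            out.append(line)          # active path: keep as code
        else:
            out.append("//" + line)   # inactive path: keep as comment
    return out

source = ["#if USE_DEFAULT",
          "x = DefaultValue;",
          "#else",
          "x = operation_3();",
          "#endif"]
for line in assemble_variant(source, {"USE_DEFAULT": 1}):
    print(line)
# the #else path is printed as  //x = operation_3();
```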

### Core-Flow Processes

Before a source or a script can be analysed, every process except the merge-process needs a parser. Every process used to analyse the sources or scripts has its own parser, which is defined in an external text-file or as part of the xml-configuration. The notation-process is the only one with a built-in parser, since this process has to know how to analyse the grammar-texts. By analysing the grammar-rules for each other process, the notation-process creates their parsers. The merge-process needs no parser since it works with the output of the other processes.

After each process is configured by the config.xml and the parsers are created, the analysing starts for each source or script. Each process saves its results for each source or script in an extra xml-file in the destination-folder, if needed. The first process is used to create context-depending part-sources. This ensures that no process gets content that is invalid for its parser. For example, comments may occur everywhere in the original source. This makes it very difficult to define a parser that is able to deal with all possible combinations of active source-parts and comments. It is easier to cut out all comments beforehand and to process them in an own sequence. Here the comments are saved together with some position-information. In a parallel sequence the active part of the code is analysed, where in a first step the line-changes are analysed. This is necessary for languages like Python, where the indentation-changes are used as tokens. After that, the rest of the source is analysed.

After all processes have analysed the source or script, their results are split into different objects, since each process produces its own result-output. The merge-process builds out of this detail-data one additional result-output that includes the content of all process-outputs of the comment- and code-sequence.
The merge-process takes care of the text-position of all parts and sorts its output so that it reflects the architecture of the original source or script.
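The merge idea can be sketched as a plain position-sort. This is a hypothetical illustration (the tuples and line numbers are invented), not the real merge-process:

```python
# Each process delivers part-results tagged with their position in the
# original text; merging is then a sort by position that restores the
# architecture of the original source.
comment_results = [(1, "comment", "// this is the modell of a function"),
                   (3, "comment", "//comment ReturnValue")]
code_results    = [(2, "code", "TYPE modell_CommentsBehind__function(TYPE Parameter)"),
                   (3, "code", "TYPE ReturnValue;")]

merged = sorted(comment_results + code_results, key=lambda part: part[0])
for line_no, kind, text in merged:
    print(line_no, kind, text)
```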

If you take a look at the download-section of MuLanPa, you will find several files in the main-release of MuLanPa:
### MuLanPa_WIN32_YYYY_MM_DD.zip / MuLanPa_Linux_YYYY_MM_DD.zip

These are the distributions for Windows and Linux. (Suse-Linux was used to build the Linux-version, but since it is only a terminal-program it should hopefully work on other Linux-distributions as well.)
If you unzip the files you will get a directory that contains several sub-directories:

• bin — binaries of MuLanPa: abc2xml, the parser-tool to transfer the source-content into an xml parser-tree, and xml2abc, the generator-tool to create the syntax-diagram-describing scripts
• cfg — user-configuration of MuLanPa and doxygen
• src — example sources
• xml — output-files of abc2xml, which are the input-files for xml2abc
• dot — dot-based syntax diagrams generated by xml2abc
• AddTxt and picture — additional input-files for Doxygen to create the user documentation
• html and chm — user documentation created by Doxygen

The batch-file "xyz_create.bat" or the shell-script "xyz_create.sh" controls the generation of the files by MuLanPa for the programming-language xyz and the generation of a documentation by Doxygen.

### MuLanPa_UserProject_YYYY_MM_DD.zip

This archive contains only a subset of the folders and configuration-files available in the real distribution. But it can be used as a project-template. The idea behind it is to have the distribution only once in your system and to use several copies of the user-project folder for several source-projects.

The user-project contains templates of all necessary configuration-files, commonly used to define the files to analyse and the basic behaviour of MuLanPa. Furthermore, it contains all folders necessary to store the results.

Thus the distribution itself contains no parts associated with a special source-project, but it contains all parts which are used for all projects in the same way.

### MuLanPa_UserDoku_xyz_YYYY_MM_DD.chm / MuLanPa_UserDoku_xyz_YYYY_MM_DD.zip

This is the user-documentation where the examples are written in the programming-language xyz. On Windows you may prefer the chm-file; for all other operating-systems use the zip-file, which contains the documentation in html-format.

This text-file contains a short introduction, the latest user-information, some info on how to build it from the sources, and the change-history of MuLanPa.

### src_MuLanPa_YYYY_MM_DD.zip

This zip-file contains the source-files of xml2abc, in case you want to build MuLanPa yourself. In addition, the latest versions contain a project-file for the freeware IDE Code::Blocks. If you want to build the diagram-tool on your own as well, please download its source-files at Moritz. The download-structure of this project is similar to the one of MuLanPa.

Note! Since both binaries use the parser-library Spirit, which is part of the huge boost-package, you have to download boost separately. Once you have extracted boost, you have to correct the search-path inside the Code::Blocks project-file.

If you build abc2xml and xml2abc, you will only get these binaries. Thus you also have to download one of the zip-distributions to get the base-version of the configuration-files. Without these files MuLanPa will not work.

Some releases also contain developer-documentation. These are the results of using Doxygen and MuLanPa together to document the code of MuLanPa. Some release-steps make it necessary to redesign the code to make the sources less complex. In these cases no developer-documentation is added to the release, because it would be no good example. If you are interested, please download the developer-documentation of an older release or generate the documentation yourself.
