\documentclass[runningheads,a4paper]{llncs}
\usepackage{graphicx,url,fancyvrb}
\usepackage{multicol,multirow,booktabs,colortbl,extarrows,color,relsize}
\fvset{fontsize=\small, frame=single}
\usepackage[ruled,vlined]{algorithm2e}
\usepackage[mathletters]{ucs}
\usepackage[utf8]{inputenc}

\newcommand{\keywords}[1]{\par\addvspace\baselineskip
  \noindent \textbf{Keywords:}\enspace\ignorespaces#1}
\def\cod#1{\texttt{#1}}
\def\codm#1{\normalfont{\texttt{#1}}}

\begin{document}
\mainmatter % start of an individual contribution

% a short form should be given in case it is too long for the running head
\titlerunning{New algorithms for smart assessment of math exercises}
\title{\textsc{Lang-importer} - a DSL for definition of foreign language inline importing}
\author{J.João Almeida \and Alberto Simões}
%\authorrunning{}
\institute{Universidade do Minho, Portugal}
%\toctitle{}
%\tocauthor{}
\maketitle

\begin{abstract}
\end{abstract}

\keywords{DSL, language inline}

\section{Introduction}

When dealing with DSLs, authoring environments, markup languages, or even
programming languages, we often benefit from mechanisms for embedding a
foreign language. Although embedding mechanisms differ in many details, a few
common patterns recur under a large variety of syntactic variants. We start by
presenting some examples.

==*LaTeX + abc

In \LaTeX{} we can find several packages that support the inline inclusion of
foreign textual languages, such as:
. \textsf{abc}\cite{ctan-abc}, a music score notation;
. \textsf{gnuplot}\cite{ctan-gnuplot}, a plotting tool;
. \textsf{GraphViz}\cite{ctan-graphviz}, a graph generation tool;
. ...
#

Example of \textsf{abc}:

\begin{Verbatim}
....
\begin{abc}
X:4
T:Cronin’s Hornpipe
S:Keenan and Glackin
M:2/4
K:G
BA|GABc dBde|gage dega|bage dBGB|cABG A2BA|!
GABc dBde|gage dega|bage dBAB|G2G2 G2:|!
fg|afd^c d2ga|bged e2ga|(3bag (3agf gedB|(3cBA AG AcBA|!
GABc dBde|~g3e dega|bage dBAB|G2G2 G2:|!
\end{abc}
...
\end{Verbatim}

To produce the final PDF file, the following extra steps are performed during
the execution of the \texttt{pdflatex} command:
.[Extractor stage] ~\\ extract the abc lines to an external file (let us name it \texttt{f.abc});
.[Processor stage] ~\\ process \texttt{f.abc}, producing a PDF image (\texttt{abcm2ps f.abc -o ....});
.[Replacer stage] ~\\ replace the abc lines by \LaTeX{} instructions that include the results of the processor stage; in this case, include the \texttt{f.pdf} image:
\begin{Verbatim}
\includegraphics{f.pdf}
\end{Verbatim}
#

==*Perl + RewriteRules

In this example we use the Text::RewriteRules\cite{Text::RewriteRules} Perl
module, which is in fact an embedded DSL for defining rewrite systems based on
textual substitution\cite{cpan-Text::RewriteRules}. The following example
creates a Perl function for obfuscating email
addresses\footnote{\cod{email("aa@bb.cc")} = \cod{"aa AT bb DOT cc"}}:

\begin{Verbatim}
....
RULES email
\.==> DOT
@==> AT
ENDRULES
...
\end{Verbatim}

After the Perl loading and filtering stage\cite{cpan-Filter::Simple}, the Perl
code that is actually executed is:

\begin{Verbatim}
...
sub email {
  my $p = shift;
  ...
  while($modified) {
    ...
    s{\.}{ DOT };
    ...
    s{@}{ AT };
    ...
  }
  return $p;
}
...
\end{Verbatim}

Once more we can recognize the same steps:
.[Extractor stage] ~\\ extract the lines corresponding to the DSL rules;
.[Processor stage] ~\\ compile them to a Perl function;
.[Replacer stage] ~\\ replace the DSL lines by the generated Perl function.
#

==*Summarizing

We believe that inline foreign language embedding is very useful and widely
needed in most markup languages, programming languages, and textual languages
in general.

In this article we will try to:
. generalize this foreign language embedding process and discuss a generic algorithm for the problem; and
. propose an opinionated tool to deal with some lang-importer situations.
#

\section{Inline foreign language embedding}

From the examples of the previous section, it is clear that we can recognize
three stages in the inline language embedding process. We can therefore
associate a function to each stage:
. extractor : file → ( chunk × position)*
. processor : chunk → output-file* × subst-chunk
. replacer : file × ( position × subst-chunk)* → file
#

The general algorithm is:

\begin{algorithm}[H]
\DontPrintSemicolon
\caption{Language Importer\label{algo1}}
\KwIn{File}
\KwOut{NewFile: the file after inline processing}
\vspace{2mm}
chunks ← extractor(File) \;
\ForAll{ch=(chunk,pos) $\in$ chunks}{
  ch.subst-chunk ← processor(chunk)\;
}
NewFile ← replacer(File,chunks)\;
return(NewFile) \;
\end{algorithm}

The function \cod{extractor} is language dependent and has to be defined in a
style that is consistent with the syntax conventions of the host language.

The function \cod{processor} normally uses tools and modules connected to the
embedded language.

The function \cod{replacer} can be language independent. For example, if a
position is a pair of lines (position = (begin:line, end:line)), the replacer
can be defined as follows. Assuming the chunks are listed in file order, they
are traversed in reverse so that replacing one chunk does not shift the line
numbers of the chunks that precede it:

\begin{algorithm}[H]
\DontPrintSemicolon
\caption{Replacer}
\KwIn{file}
\KwIn{chunks: (position, subst-chunk)* }
\KwOut{file: after inline replacing}
\vspace{2mm}
\ForAll{ch=(pos,subst-chunk) $\in$ reverse(chunks) }{
  file[pos.begin .. pos.end] ← subst-chunk \;
}
return(file) \;
\end{algorithm}

\subsection{Multi-language embedding algorithm}

It is useful to support several kinds of embedded languages simultaneously.
Returning to the previous examples, we could have a \LaTeX{} file with both
inline abc chunks and inline gnuplot chunks.

One possible approach is to refine our model by adding the notion of a chunk
type. The function signatures then change to:
. extractor : file → ( chunk × type × position)*
. processor : chunk × type → output-file* × subst-chunk
. replacer : file × ( position × subst-chunk)* → file
#

One natural evolution is to provide a table of type specific processors (that
is, to curry the processor function on its type argument).

\begin{algorithm}[H]
\DontPrintSemicolon
\caption{Multi Language Importer\label{algo2}}
\KwIn{File}
\KwIn{procTable: mapping(type → processor)}
\KwOut{NewFile: the file after inline processing}
\vspace{2mm}
chunks ← extractor(File) \;
\ForAll{ch=(chunk,type,pos) $\in$ chunks}{
  ch.subst-chunk ← procTable[type](chunk)\;
}
NewFile ← replacer(File,chunks)\;
return(NewFile) \;
\end{algorithm}

In a similar way we can also create a table of type specific extractors.

\section{Lang-importer}

In this section we present a prototype that implements some of the strategies
of the previous section. As usual, we simplify the functionality and skip some
technical details.

To keep the task specification and the implementation simple, we impose some
extra constraints that reduce flexibility but increase simplicity.

The prototype \cod{lang-importer} is written in Perl and uses regular
expression based definitions when possible.

\cod{lang-importer} is composed of two parts:
. the lang-importer builder, which generates a \cod{lang-importer} from a multi-file DSL configuration;
. the lang-importer tool for final users (generated by the builder).
#

==One single extractor function

The following are the basic simplifying decisions:
. chunk types are finite and known (example: abc, gnuplot, dot)
. chunk types are valid C identifiers
. chunks consist of complete lines
. chunk extraction is based on regular expressions
#

With these conventions, the chunks in a \LaTeX{} file look like this:

\begin{Verbatim}
\begin{inline_abc}
X:1
...
\end{inline_abc}

\begin{inline_dot}
digraph{
  rankdir=LR
  ...
}
\end{inline_dot}
\end{Verbatim}

We just need to know:

\begin{Verbatim}
my $typespat = join("|", @types);   # alternation of the known chunk types

## common to all the cases
my $beginRegexp = qr{\\begin\{inline_($typespat)\}};  # group 1: chunk type
my $endRegexp   = qr{\\end\{inline_(\1)\}};           # same type as group 1
\end{Verbatim}

Please note that \cod{beginRegexp} must capture the chunk type (as group 1).

The algorithm is:

\begin{algorithm}[H]
\DontPrintSemicolon
\caption{Extractor}
\KwIn{file}
\KwIn{beginre: Regular expression of begin chunk}
\KwIn{endre: Regular expression of end chunk}
\KwOut{chunks: (chunk, type, position)* }
\tcp{Note: we are ignoring \cod{position} for simplification}
\vspace{2mm}
\ForAll{matches of \codm{/\$beginre(.*?)\$endre/s} in file}{
  push( chunks, (type:group(1), chunk:group(2))) \;
}
return(chunks) \;
\end{algorithm}

The \cod{s} modifier is needed so that \cod{.} also matches newlines, since a
chunk usually spans several lines.

==Multi-file DSL configuration

Each file is in fact a chunk type plugin.

==Lang-importer builder

\section{Conclusions}

\bibliographystyle{plain}
\bibliography{l}
\end{document}