\documentclass[runningheads,a4paper]{llncs}

\usepackage{graphicx,url,fancyvrb}
\usepackage{multicol,multirow,booktabs,colortbl,extarrows,color,relsize}
\fvset{fontsize=\small, frame=single}
\usepackage[ruled,vlined]{algorithm2e}

\usepackage[mathletters]{ucs}
\usepackage[utf8]{inputenc}

\newcommand{\keywords}[1]{\par\addvspace\baselineskip
\noindent \textbf{Keywords:}\enspace\ignorespaces#1}

\def\cod#1{\texttt{#1}}
\def\codm#1{\normalfont{\texttt{#1}}}

\begin{document}

\mainmatter  % start of an individual contribution

% a short form should be given in case it is too long for the running head
\titlerunning{Lang-importer: a DSL for foreign language inline importing}


\title{\textsc{Lang-importer} - a DSL for definition of foreign
            language inline importing}
\author{J. João Almeida \and Alberto Simões}

%\authorrunning{}
\institute{Universidade do Minho, Portugal}
%\toctitle{}
%\tocauthor{}

\maketitle

\begin{abstract}
\end{abstract}
\keywords{DSL, language inline}

\section{Introduction}

When dealing with domain-specific languages (DSLs), authoring environments,
markup languages, or even programming languages, we often benefit from
mechanisms for embedding a foreign language.

Although embedding mechanisms differ in many details, a few common patterns
recur under a large variety of syntactic variants.

We start by presenting two examples.

==*LaTeX + abc

Several \LaTeX{} packages support the inline inclusion of textual foreign
languages, such as:
  .  \textsf{abc}\cite{ctan-abc}, a music score notation;
  .  \textsf{gnuplot}\cite{ctan-gnuplot}, for plotting graphics;
  .  \textsf{GraphViz}\cite{ctan-graphviz}, a graph generation tool;
  . ...
  #

An example using \textsf{abc}:

\begin{Verbatim}
....
\begin{abc}
X:4
T:Cronin’s Hornpipe
S:Keenan and Glackin
M:2/4
K:G
BA|GABc dBde|gage dega|bage dBGB|cABG A2BA|!
GABc dBde|gage dega|bage dBAB|G2G2 G2:|!
fg|afd^c d2ga|bged e2ga|(3bag (3agf gedB|(3cBA AG AcBA|!
GABc dBde|~g3e dega|bage dBAB|G2G2 G2:|!
\end{abc}
...
\end{Verbatim}

To produce the final PDF file, the following extra steps are performed while \texttt{pdflatex} runs:
  .[Extractor stage] ~\\ extract the abc lines to an external file (let us name it \texttt{f.abc});
  .[Processor stage] ~\\ process \texttt{f.abc}, producing a PDF image (\texttt{abcm2ps f.abc -o ....});
  .[Replacer stage] ~\\ replace the abc lines with the \LaTeX{} instructions that include the
result of the processor stage, in this case the image \texttt{f.pdf}:
\begin{Verbatim}
 \includegraphics{f.pdf}
\end{Verbatim}
#

==*Perl + RewriteRules

This example uses the \cod{Text::RewriteRules}\cite{Text::RewriteRules} Perl module, which is
in fact an embedded DSL for defining rewrite systems based on textual
substitution\cite{cpan-Text::RewriteRules}.

The following example creates a Perl function that obfuscates email
addresses\footnote{\cod{email("aa@bb.cc")} = \cod{"aa AT bb DOT cc"}}:

\begin{Verbatim}
....
RULES email
\.==> DOT
@==> AT
ENDRULES
...
\end{Verbatim}

After Perl's loading and source filtering stage\cite{cpan-Filter::Simple}, the Perl
code that is actually executed is:
\begin{Verbatim}
...
sub email {
  my $p = shift;
  ...
  while($modified) {
    ...
    s{\.}{ DOT };
    ...
    s{@}{ AT };
    ...
  }
  return $p;
}
...
\end{Verbatim}

Once more, we can recognize the same stages:
  .[Extractor stage] ~\\ extract the lines corresponding to the DSL rules;
  .[Processor stage] ~\\ compile them to a Perl function;
  .[Replacer stage] ~\\ replace the DSL lines with the generated Perl function.
  #
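
The generated function keeps applying the rules until none of them changes the
text. A minimal Python sketch of that fixed-point loop follows; it is only an
illustration, not the code produced by \cod{Text::RewriteRules}, and the names
\cod{rewrite} and \cod{EMAIL\_RULES} are assumptions made for this sketch.

\begin{Verbatim}
import re

EMAIL_RULES = [(r"\.", " DOT "), (r"@", " AT ")]   # hypothetical rule table

def rewrite(text, rules):
    """Apply the first rule that matches, once; repeat until no rule matches."""
    modified = True
    while modified:
        modified = False
        for pattern, replacement in rules:
            text, count = re.subn(pattern, replacement, text, count=1)
            if count:
                modified = True
                break
    return text

print(rewrite("aa@bb.cc", EMAIL_RULES))   # prints: aa AT bb DOT cc
\end{Verbatim}

The loop terminates here because no replacement text reintroduces a character
matched by a rule; a rule set without that property could loop forever.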

==*Summarizing

We believe that inline foreign language embedding is very useful and widely
needed in most markup languages, programming languages and, in general,
textual languages.

In this article we:
  .  generalize this foreign language embedding process and discuss a generic
algorithm for the problem;
  . propose an opinionated tool to deal with some language importing situations.
 #

\section{Inline foreign language embedding}

The examples of the previous section show that the inline language embedding
process has three stages.

We can therefore associate a function with each stage:
  . extractor : file → ( chunk × position)*
  . processor : chunk → output-file* × subst-chunk
  . replacer : file × ( position × subst-chunk)* → file
#

The general algorithm is:

\begin{algorithm}[H]
\DontPrintSemicolon
  \caption{Language Importer\label{algo1}}
  \KwIn{File}
  \KwOut{NewFile: the file after inline processing} \vspace{2mm}
  chunks ← extractor(File) \;
  \ForAll{ch=(chunk,pos) $\in$ chunks}{
     ch.subst-chunk ← processor(chunk)\;
  }
  NewFile ← replacer(File,chunks)\;
  return(NewFile) \;
\end{algorithm}
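
The general algorithm above can be sketched in Python as a function that takes
the three stages as parameters. This is an illustration under simplifying
assumptions, not the prototype's code: the processor returns only the
substitution text (output files are ignored), positions are line indices, and
the three \cod{toy\_} functions are hypothetical stand-ins for real stages.

\begin{Verbatim}
def lang_importer(lines, extractor, processor, replacer):
    """Extract the chunks, process each one, then replace them in the file."""
    chunks = extractor(lines)                   # [(chunk, position), ...]
    substs = [(pos, processor(chunk)) for chunk, pos in chunks]
    return replacer(lines, substs)

def toy_extractor(lines):
    """A chunk is a line such as 'sum: 1 2 3'; its position is the line index."""
    return [(line[5:], i) for i, line in enumerate(lines)
            if line.startswith("sum: ")]

def toy_processor(chunk):
    """Toy embedded language: a list of integers, replaced by their total."""
    return str(sum(int(tok) for tok in chunk.split()))

def toy_replacer(lines, substs):
    """Put each substitution text at the position of its chunk."""
    new_lines = list(lines)
    for pos, subst in substs:
        new_lines[pos] = subst
    return new_lines

print(lang_importer(["total:", "sum: 1 2 3", "end"],
                    toy_extractor, toy_processor, toy_replacer))
\end{Verbatim}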

The function \cod{extractor} depends on the host language and has to be defined
in a style consistent with that language's syntax conventions.

The function \cod{processor} typically relies on tools and modules associated
with the embedded language.

The function \cod{replacer} can be language independent. For example, if a
position is a pair of line numbers, position = (begin:line, end:line), it can
be defined as follows.

\begin{algorithm}[H]
\DontPrintSemicolon
  \caption{Replacer}
  \KwIn{file}
  \KwIn{chunks: (position, subst-chunk)* }
  \KwOut{file: after inline replacing} \vspace{2mm}

  \ForAll{(pos,subst-chunk) $\in$ reverse(chunks) }{
     file[pos.begin .. pos.end] ← subst-chunk \;
  }
  return(file) \;
\end{algorithm}
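
Replacing from the last chunk to the first matters because a substitution may
have a different number of lines than the chunk it replaces, which would shift
every later position. A minimal Python sketch, assuming 0-based inclusive line
ranges that do not overlap (the sample data is invented for illustration):

\begin{Verbatim}
def replacer(lines, substs):
    """Replace each line range (begin, end) by its list of substitution lines.

    The ranges are applied in decreasing order of position, so the ranges
    still to be processed are never shifted by an earlier replacement.
    """
    new_lines = list(lines)
    for (begin, end), subst in sorted(substs, reverse=True):
        new_lines[begin:end + 1] = subst
    return new_lines

doc = ["a", "<1", "x", "1>", "b", "<2", "2>", "c"]
print(replacer(doc, [((1, 3), ["ONE"]), ((5, 6), ["TWO", "TWO"])]))
\end{Verbatim}

The sketch sorts the positions itself instead of relying on the order in which
the extractor returned the chunks.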

\subsection{Multi-language embedding algorithm}

It is often useful to have several kinds of embedded languages in the same
file. Returning to the previous examples, a \LaTeX{} file could contain both
inline abc chunks and inline gnuplot chunks.

One possible approach is to refine our model by adding the notion of a chunk type.

The function signatures then become:
  . extractor : file → ( chunk × type × position)*
  . processor : chunk × type → output-file* × subst-chunk
  . replacer : file × ( position × subst-chunk)* → file
#

A natural next step is to provide a table of type-specific processors (that is,
to curry the processor function on its type argument).


\begin{algorithm}[H]
\DontPrintSemicolon
  \caption{Multi Language Importer\label{algo2}}
  \KwIn{File}
  \KwIn{procTable: mapping(type → processor)}
  \KwOut{NewFile: the file after inline processing} \vspace{2mm}
  chunks ← extractor(File) \;
  \ForAll{ch=(chunk,type,pos) $\in$ chunks}{
     ch.subst-chunk ← procTable[type](chunk)\;
  }
  NewFile ← replacer(File,chunks)\;
  return(NewFile) \;
\end{algorithm}
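
The processor table can be sketched in Python as a dictionary from chunk type
to function. The two chunk types below (\cod{upper} and \cod{count}) and the
name \cod{process\_all} are hypothetical and serve only to show the dispatch
step; real entries would call tools such as the ones of the earlier examples.

\begin{Verbatim}
PROC_TABLE = {
    "upper": lambda chunk: chunk.upper(),             # hypothetical type
    "count": lambda chunk: str(len(chunk.split())),   # hypothetical type
}

def process_all(chunks, proc_table):
    """Send every (chunk, type, position) triple to the processor of its type."""
    return [(pos, proc_table[ctype](chunk)) for chunk, ctype, pos in chunks]

chunks = [("hello", "upper", 0), ("a b c", "count", 4)]
print(process_all(chunks, PROC_TABLE))   # prints: [(0, 'HELLO'), (4, '3')]
\end{Verbatim}

Supporting a new embedded language then amounts to adding one entry to the
table, without touching the importer itself.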

In a similar way, we can also create a table of type-specific extractors.

\section{Lang-importer}

This section presents a prototype that implements some of the strategies of the
previous section. As usual, we simplify the functionality and skip some
technical details.

To keep both the task specification and the implementation simple, we impose
some extra constraints that trade flexibility for simplicity.

The prototype, \cod{lang-importer}, is written in Perl and uses definitions
based on regular expressions whenever possible.

\cod{lang-importer} is composed of two parts:
  . the lang-importer builder, which generates a \cod{lang-importer} from a
multi-file DSL configuration;
  . the lang-importer tool for final users (generated by the builder).
  #

==One single extractor function

We made the following basic simplifying decisions:
  . the set of chunk types is finite and known (for example: abc, gnuplot, dot);
  . chunk types are valid C identifiers;
  . chunks consist of complete lines;
  . chunk extraction is based on regular expressions.
#

For example, chunks embedded in a \LaTeX{} file could be delimited as follows:

\begin{Verbatim}
  \begin{inline_abc}
  X:1
  ...
  \end{inline_abc}
  
  \begin{inline_dot}
  digraph{
  rankdir=LR
  ...
  }
  \end{inline_dot}
\end{Verbatim}

For this syntax, we just need to define:
\begin{Verbatim}
  my $typespat = join("|", @types);    ## common to all the cases

  my $beginRegexp = qr{\\begin\{inline_($typespat)\}};
  my $endRegexp   = qr{\\end\{inline_(\1)\}};
\end{Verbatim}
Please note that \cod{beginRegexp} must capture the chunk type (as group 1), so
that \cod{endRegexp} can refer back to it.

The extraction algorithm is:

\begin{algorithm}[H]
\DontPrintSemicolon
  \caption{Extractor}
  \KwIn{file}
  \KwIn{beginre: Regular expression of begin chunk}
  \KwIn{endre: Regular expression of end chunk}
  \KwOut{chunks: (chunk, type, position)* }
  // Note: we are ignoring \cod{position} for simplicity\vspace{2mm}

  \ForAll{matches of \codm{/\$beginre(.*?)\$endre/s} in file}{
     push( chunks, (type:group(1),  chunk:group(2))) \;
  }
  return(chunks) \;
\end{algorithm}
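
The extractor can be sketched in Python as follows. This is not the Perl
prototype: the name \cod{make\_extractor} is an assumption, positions are
returned as character offsets, and the begin and end delimiters are assumed to
sit alone on their lines. The back-reference makes an \cod{inline\_abc} chunk
end only at the matching \cod{inline\_abc} delimiter, and \cod{re.DOTALL} lets
a chunk span several lines.

\begin{Verbatim}
import re

def make_extractor(types):
    r"""Build an extractor for \begin{inline_T} ... \end{inline_T} chunks."""
    typespat = "|".join(re.escape(t) for t in types)
    chunk_re = re.compile(
        r"^\\begin\{inline_(" + typespat + r")\}\n"   # group 1: chunk type
        r"(.*?)"                                       # group 2: chunk lines
        r"^\\end\{inline_\1\}\n?",                     # same type as group 1
        re.DOTALL | re.MULTILINE,
    )

    def extractor(text):
        """Return (chunk, type, (start, end)) triples, in file order."""
        return [(m.group(2), m.group(1), m.span())
                for m in chunk_re.finditer(text)]

    return extractor

extract = make_extractor(["abc", "dot"])
sample = ("text\n\\begin{inline_abc}\nX:1\n\\end{inline_abc}\n"
          "\\begin{inline_dot}\ndigraph{}\n\\end{inline_dot}\n")
for chunk, ctype, pos in extract(sample):
    print(ctype, repr(chunk))
\end{Verbatim}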


==Multi-file DSL configuration

Each file is in fact a chunk type plugin.

==Lang-importer builder


\section{Conclusions}

\bibliographystyle{plain}
\bibliography{l}


\end{document}