# LaTeX3: Programming in LaTeX with Ease

Many people view LaTeX as a typesetting language and overlook the importance of programming in the document generation process. In fact, many large, structured documents can benefit from a programming backend, which improves layout standardization, symbol consistency, editing speed and many other aspects. Although standard LaTeX (LaTeX2e) is already Turing complete, which means it can in principle solve any programming task, many of its programming interfaces are highly inconsistent because of compatibility considerations. This makes programming in LaTeX2e challenging and tedious, even for seasoned programmers.

To make programming in LaTeX easier, the LaTeX3 interface was introduced. It aims to give LaTeX programmers the syntax and libraries of a modern programming language. Unfortunately, there is little learning material for this wonderful language. When I started learning it, I had to go through its complex technical manual, which was time-consuming. I therefore decided to write a LaTeX3 tutorial that general programmers can easily understand.

## Preface

• The preamble of all examples:
```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tikz} % load TikZ for some examples
\usepackage{expl3} % only needed for LaTeX releases older than 2020-02-02
```


Place the content of each example between \begin{document} and \end{document}. Since the 2020-02-02 LaTeX release, expl3 has been part of the LaTeX2e kernel as the “L3 programming layer”. With these newer distributions, there is no need to load the expl3 package explicitly.

• All examples have been tested with TeX Live 2020 (Windows 10).
• This article only gives a simple introduction to frequently used modules, because it is difficult for me to cover all aspects of LaTeX3 in a short amount of time. All APIs are documented in The LaTeX3 Interfaces (available via texdoc interface3).
• The Roman numerals in section titles are the corresponding chapter numbers in The LaTeX3 Interfaces.

## Why LaTeX3?

### Handle macro expansion like a boss

Fundamentally, TeX works by macro substitution. A command is replaced by its definition, which is in turn replaced by its own definition, until something that cannot be replaced (e.g. text) is reached. In the following example, \myname is replaced by \mynameb, \mynameb is then replaced by \mynamea, and \mynamea is finally replaced by John Doe, which cannot be expanded any further. This process is called expansion.

```latex
\newcommand{\mynamea}{John Doe}
\newcommand{\mynameb}{\mynamea}
\newcommand{\myname}{\mynameb}
My name is \myname.
```


Most commands we use every day have complicated definitions. During compilation, they are expanded recursively until text or primitives are reached. This process sounds straightforward until we want to change the order of macro expansion.

Why would we need to change the order of macro expansion? Consider the \uppercase primitive in TeX, which turns lowercase letters into uppercase ones. Suppose we apply \uppercase to the letters abcd followed by a command \cmda. Since \cmda expands to abcd, we expect the result to be ABCDABCD. In reality, TeX gives us ABCDabcd, which means the content of \cmda was not converted.

```latex
\newcommand{\cmda}{abcd}
\uppercase{abcd\cmda} %ABCDabcd
```


How can this happen? When \uppercase is executed, it goes through the tokens inside the following curly braces one by one. Each letter is replaced by its uppercase counterpart, and every other token is passed on unchanged. When it is \cmda’s turn, \cmda is a command rather than a letter, so it is left untouched. It is only expanded to abcd later, after the case conversion has already finished.

What if we want to capitalize everything inside the curly braces? That would require \cmda to be expanded before \uppercase runs. In other words, we would need to change the order of macro expansion. The classical way to do this in TeX is via \expandafter. Unfortunately, \expandafter is extremely complicated to use [1]. In a string of n tokens [2], expanding the tokens from last to first requires $2^{n-i}-1$ \expandafter's before the i-th token. Below is an example of how bad this can look:

```latex
\documentclass{article}
\begin{document}

\def\x#1#2#3#4{%
\def\arga{#2}%
\def\argb{#3}%
\def\argc{#4}%
\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter#1%
\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter
{\expandafter\expandafter\expandafter\arga\expandafter\expandafter\expandafter}%
\expandafter\expandafter\expandafter{\expandafter\argb\expandafter}\expandafter
{\argc}}

\def\y#1#2#3{\detokenize{#1#2#3}}

\x\y{arg1}{arg2}{arg3}

\end{document}
```


This is nowhere near readable. Such piles of \expandafter's are sometimes called “\expandafter purgatory”. Providing simple and reliable expansion control is therefore one of the main features of LaTeX3.
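For comparison, here is a minimal sketch of how the \uppercase and \expandafter examples above might be written with LaTeX3's expansion tools. It relies on \exp_args:Nx, which fully expands the following argument first, and on \cs_generate_variant:Nn, which derives a variant whose arguments are each expanded once (o-type). The names \my_show:nnn and \l_my_arga_tl to \l_my_argc_tl are hypothetical and chosen only for this illustration.

```latex
\documentclass{article}
\begin{document}

\newcommand{\cmda}{abcd}

\ExplSyntaxOn
% The \uppercase problem: the x-type argument is fully expanded first,
% so \uppercase receives "abcdabcd" instead of "abcd\cmda".
\exp_args:Nx \uppercase { abcd \cmda } \par % typesets ABCDABCD

% The \expandafter example: a function taking three n-type arguments ...
\cs_new:Npn \my_show:nnn #1#2#3 { \tl_to_str:n { #1 #2 #3 } }
% ... and a variant whose three arguments are each expanded once (o-type).
\cs_generate_variant:Nn \my_show:nnn { ooo }

\tl_new:N \l_my_arga_tl
\tl_new:N \l_my_argb_tl
\tl_new:N \l_my_argc_tl
\tl_set:Nn \l_my_arga_tl { arg1 }
\tl_set:Nn \l_my_argb_tl { arg2 }
\tl_set:Nn \l_my_argc_tl { arg3 }

% typesets arg1arg2arg3
\my_show:ooo { \l_my_arga_tl } { \l_my_argb_tl } { \l_my_argc_tl }
\ExplSyntaxOff

\end{document}
```

In this version, the argument specifier in each function name (x, o) states how every argument is expanded. The order no longer has to be encoded in a chain of \expandafter's.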

### Messy interfaces in LaTeX

Believe it or not, LaTeX can do everything that general-purpose programming languages (e.g. C++, Python, Java) can do [3]. However, calling conventions differ wildly from task to task, and similar functionality is often implemented independently by different packages. Here are some examples:

• File read
```latex
\newread\file
\openin\file=myfilename.txt
\loop\unless\ifeof\file
  \read\file to\fileline % read one line of the file into \fileline
  % Do something with \fileline
\repeat
\closein\file
```

• File write
```latex
\newwrite\file
\immediate\openout\file=myfilename.txt
\immediate\write\file{A line of text to write to the file}
\immediate\write\file{Another line of text to write to the file}
\immediate\closeout\file % without \immediate, closing is delayed until shipout
```

• Integer arithmetic
```latex
\newcount\mycount
\mycount=\numexpr(25+5)/3\relax
\multiply\mycount by 2
```

• Condition
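```latex
% command-related if statement
\ifx\mycmd\undefined
  undefed
\else
  \if\mycmd1
    defed, 1
  \else
    defed
  \fi
\fi

% number-related if statement; #1 and #2 only make sense inside a
% macro, so it is wrapped in a (hypothetical) command \comparenum
\newcommand{\comparenum}[2]{%
  \ifdim#1pt=#2pt
    Equal.\\
  \else
    Not equal.\\
  \fi%
}
```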

• Loop

```latex
% use \loop
\newcount\foo
\foo=10
\loop
  \message{\the\foo}
  \advance\foo by -1 % decrement, otherwise the loop never ends
\ifnum \foo>0
\repeat

% while loop (provided by the ifthen package: \usepackage{ifthen})
\newcounter{ct}
\setcounter{ct}{1}
\whiledo{\value{ct} < 5}{%
  \arabic{ct}
  \stepcounter{ct}%
}

% for loop (provided by the forloop package: \usepackage{forloop})
\forloop{ct}{1}{\value{ct} < 5}{%
  \arabic{ct}
}
```


These inconsistencies set a high bar for new users and make it difficult to combine multiple components, even for experienced programmers. LaTeX3 therefore aims to provide standardized programming interfaces and documentation for the language.
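To illustrate the point, the sketch below redoes the tasks above (file writing and reading, integer arithmetic, conditions and loops) with expl3. It assumes the kernel's scratch streams \g_tmpa_iow and \g_tmpa_ior and the scratch integer \l_tmpa_int. The names \l_my_count_int and myfilename.txt are illustrative only. Notice that every function follows the same module_action:argspec pattern, whatever the task.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn

% File write: scratch output stream \g_tmpa_iow
\iow_open:Nn \g_tmpa_iow { myfilename.txt }
\iow_now:Nn \g_tmpa_iow { A~line~of~text~to~write~to~the~file }
\iow_now:Nn \g_tmpa_iow { Another~line~of~text~to~write~to~the~file }
\iow_close:N \g_tmpa_iow

% File read: typeset each line of the file (read as a string)
\ior_open:Nn \g_tmpa_ior { myfilename.txt }
\ior_str_map_inline:Nn \g_tmpa_ior { #1 \par }
\ior_close:N \g_tmpa_ior

% Integer arithmetic
\int_new:N \l_my_count_int
\int_set:Nn \l_my_count_int { (25 + 5) / 3 }
\int_set:Nn \l_my_count_int { \l_my_count_int * 2 }
\int_use:N \l_my_count_int \par % typesets 20

% Conditions
\cs_if_exist:NTF \mycmd { defined } { undefined } \par
\dim_compare:nNnTF { 1pt } = { 1.0pt } { Equal. } { Not~equal. } \par

% Loops: a counting loop and a while loop, both typeset 1 2 3 4
\int_step_inline:nnn { 1 } { 4 } { #1 ~ } \par
\int_set:Nn \l_tmpa_int { 1 }
\int_while_do:nNnn { \l_tmpa_int } < { 5 }
  {
    \int_use:N \l_tmpa_int ~
    \int_incr:N \l_tmpa_int
  }

\ExplSyntaxOff
\end{document}
```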

### Goals of LaTeX3

• Modernize the syntax of LaTeX programming
• Simplify macro expansion control
• Unify the interfaces across various components
• Provide standardized libraries for packages (e.g. floating-point arithmetic, regular expressions, random numbers, high-performance arrays, etc.)
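To give a taste of these libraries, here is a short sketch that uses the floating-point (l3fp), regular-expression (l3regex) and integer-array (l3intarray) modules. The variable \g_my_squares_intarray is a hypothetical name. The results in the comments follow from the documented behaviour of \fp_eval:n, \regex_replace_all:nnN and \intarray_item:Nn.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Floating-point arithmetic (l3fp works in decimal, so 1.2 + 2.6 is exact)
\fp_eval:n { (1.2 + 2.6) * 2 } \par % typesets 7.6
\fp_eval:n { round(pi, 4) } \par    % typesets 3.1416

% Regular expressions (l3regex): wrap every run of digits in parentheses
\tl_set:Nn \l_tmpa_tl { room~101,~floor~3 }
\regex_replace_all:nnN { \d+ } { (\0) } \l_tmpa_tl
\tl_use:N \l_tmpa_tl \par % typesets room (101), floor (3)

% High-performance integer array (l3intarray): store the squares of 1 to 5
\intarray_new:Nn \g_my_squares_intarray { 5 }
\int_step_inline:nnn { 1 } { 5 }
  { \intarray_gset:Nnn \g_my_squares_intarray { #1 } { #1 * #1 } }
\intarray_item:Nn \g_my_squares_intarray { 4 } \par % typesets 16
\ExplSyntaxOff
\end{document}
```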

## LaTeX3 Naming Conventions (I-1)

In the following code snippet, we declare a variable \vara and a function \cmda. The only way to tell them apart is to check whether the command absorbs arguments. Both are called “commands” and both are created with \newcommand, which reflects that they are fundamentally the same to the TeX system.

```latex
\newcommand{\vara}{this is a variable}
\newcommand{\cmda}[1]{this is a command: #1}
```


From the user's perspective, it is important to separate variables from functions because they are used differently. Our only option is therefore to encode this information in command names, so that users can tell variables and functions apart with little effort. This is why we need a naming convention. Before elaborating on the naming style, I would like to make a small diversion and introduce category codes first.
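To give a flavour of where this is heading, here is a minimal sketch of the LaTeX3 naming convention. Variables are named \<scope>_<module>_<description>_<type>, and functions are named \<module>_<description>:<argument specification>. The module name my is hypothetical. Underscores and colons can only appear in these names because \ExplSyntaxOn changes their category codes, which is why category codes come first.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Variable: \<scope>_<module>_<description>_<type>
%   l = local scope, my = hypothetical module, tl = token list
\tl_new:N \l_my_name_tl
\tl_set:Nn \l_my_name_tl { John~Doe }

% Function: \<module>_<description>:<argument specification>
%   n = one argument in braces, passed without expansion
\cs_new:Npn \my_greet:n #1 { My~name~is~#1. }

\my_greet:n { \tl_use:N \l_my_name_tl } % typesets "My name is John Doe."
\ExplSyntaxOff
\end{document}
```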

### Category code and command names

In TeX, every character that we enter is associated with a category code. The standard category code assignment is shown in the following table:

| Category code | Description | Standard LaTeX/TeX |
|---|---|---|
| 0 | Escape character: tells TeX to start looking for a command | \ |
| 1 | Start a group | { |
| 2 | End a group | } |
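A quick way to inspect category codes is the \catcode primitive. The sketch below uses only TeX primitives and \ExplSyntaxOn. It prints the codes of _ and : in normal LaTeX and inside expl3 syntax, where both become letters so that they can be used in command names.

```latex
\documentclass{article}
\begin{document}
% In normal LaTeX, _ is the subscript character and : is "other"
Outside: \the\catcode`\_, \the\catcode`\: % typesets 8, 12

\ExplSyntaxOn
% Inside expl3 syntax, both are letters (category code 11)
Inside: ~ \the\catcode`\_ , ~ \the\catcode`\: % typesets 11, 11
\ExplSyntaxOff
\end{document}
```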
```latex
% close file
\ior_close:N \g_tmpa_ior
\ExplSyntaxOff
```


Output:

```
1.2+2.6+3.7+4.9+5.0+6.5+7.4+8.2+9.4+10.8=59.7
```


## Memo

Useful techniques:

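• Many modules provide show functions (e.g. \tl_show:N, \int_show:N), which print the content of a variable to the terminal and the log file. The corresponding log functions (e.g. \tl_log:N) write to the log file only. Both are very helpful for debugging.
• LaTeX3 also supports generating random numbers and randomly accessing items of token lists or queues.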
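Here is a minimal sketch of these debugging and randomization helpers, using the kernel's scratch variables \l_tmpa_tl and \l_tmpa_seq. The log variants are used because the show variants stop compilation and wait for input (like \show) when LaTeX runs interactively. Random output changes from run to run, so no expected values are given.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { Hello,~world! }
\seq_set_from_clist:Nn \l_tmpa_seq { apple, banana, cherry }

% Write the current values to the .log file without interrupting the run
\tl_log:N \l_tmpa_tl
\seq_log:N \l_tmpa_seq

% Random integer between 1 and 6 (inclusive)
\int_rand:nn { 1 } { 6 } \par
% Random item of a sequence and of a token list
\seq_rand_item:N \l_tmpa_seq \par
\tl_rand_item:n { abcdef } \par
\ExplSyntaxOff
\end{document}
```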

Because there are so many modules, it is very difficult to cover most of them in a limited amount of time. Here are some other libraries that are worth looking at:

• l3coffins (XXX), l3box (XXIX): measure the width/height of objects
• l3intarray (XXII), l3fparray (XXIV): high-performance numeric arrays
• l3sort: sorting queues/token lists
• l3msg: generating error, warning and information messages
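As a starting point, here is a small sketch that touches three of these modules: measuring a box with l3box, sorting a sequence with the l3sort return functions, and issuing a warning with l3msg. The module name mymodule and the message name demo are hypothetical. The box width is not given because it depends on the font.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% l3box: typeset material into a box, then measure it
\hbox_set:Nn \l_tmpa_box { Hello,~world! }
Width:~\dim_eval:n { \box_wd:N \l_tmpa_box } \par

% l3sort (through \seq_sort:Nn): sort integers in ascending order
\seq_set_from_clist:Nn \l_tmpa_seq { 3, 1, 4, 1, 5, 9, 2, 6 }
\seq_sort:Nn \l_tmpa_seq
  {
    \int_compare:nNnTF { #1 } > { #2 }
      { \sort_return_swapped: }
      { \sort_return_same: }
  }
\seq_use:Nn \l_tmpa_seq { ,~ } \par % typesets 1, 1, 2, 3, 4, 5, 6, 9

% l3msg: define and issue a warning for a hypothetical module "mymodule"
\msg_new:nnn { mymodule } { demo } { This~is~a~demo~warning. }
\msg_warning:nn { mymodule } { demo }
\ExplSyntaxOff
\end{document}
```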

## End Note

In this article, I have tried to briefly introduce the naming convention, the usage of variables and functions, and some commonly used modules of LaTeX3. I hope this organization helps readers understand the basics of LaTeX3 so that they can write simple programs quickly.

There is no doubt that many documents can benefit from the programming capabilities of LaTeX3. It is a pity that tutorials on LaTeX3 are rare, which significantly limits the adoption of this language. Hopefully, this article can help more people get familiar with this powerful tool.

1. I am not an expert at \expandafter. See more at https://www.zhihu.com/question/26916597/answer/34565213, https://www.tug.org/TUGboat/tb09-1/tb20bechtolsheim.pdf

2. Only a subset of all variants is listed here.

3. Strings are essentially token lists in which every character has category code 12 (other), except spaces, which have category code 10. \tl_rescan:nn reassigns category codes based on the provided category code table. This makes it possible to reactivate commands and special characters. See https://tex.stackexchange.com/questions/404108/convert-string-to-token-list