LaTeX Binary Search Tree

Recently, I have been exploring the essence of LaTeX. The discovery of LaTeX3 really uncovers the potential of LaTeX as a “generic” programming language. Although with LuaTeX, writing complicated data structure and algorithms in pure TeX seems to be a waste of time, I still find my effort somehow meaningful in a way: at this point, I haven’t really studied programming languages and compilers formally, but playing with TeX really allows me to feel the fine line between text and program. We are able to do amazing things with TeX (which is Turing complete), but accompanied by fact that the foundation of TeX is based on text substitution and some support for numerical evaluation, I am stunned by its simplicity and universality. Text is program, program is text, it is the perfect concord.

Storing nodes in LaTeX

Because there is no stack or heap in LaTeX, we cannot just allocate new nodes in LaTeX just like in C++ or Python. It is important to keep in mind that the foundation of TeX is text substitution. When we define a macro as follows,

\newcommand{\cmd}{value}

what the TeX compiler does is to remember \cmd and replace it with value for every occurrence in the document. We can treat this as the equivalence of variable declaration in other languages. Therefore, to store tree nodes in LaTeX, we can use unique macros to store the value of each node. In my implementation, I apply the following rules:

Basic operations

Each tree is specified by a tree name; each node is specified by tree name and integer node id.

Non-recursive insertion and traversal

One of the greatest difficulties in implementing recursive functions in LaTeX is the lack of stack (namespace). That is, the value of a variable will be shared in all depths of the algorithm, which will break the program down. Therefore, non-recursive algorithms with the help of queue structures provided by l3seq are preferred.

Insertion

% data insertion function
\cs_new:Npn \insert_data:n #1 {
    % start with root node id
    \tl_gset:Nn \g_tmpa_tl {1}
    
    % initialize loop variable
    \bool_gset_true:N \g_tmpa_bool
    
    \bool_do_while:Nn \g_tmpa_bool {
        % extract data from current node id
        \bt_get_node_data:nnN {test} {\g_tmpa_tl} \l_tmpa_tl
        
        \int_compare:nNnTF {#1} < {\l_tmpa_tl} {
            % set next node id
            \bt_get_node_left_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
            % if next node is empty, insert here
            \int_compare:nNnTF {\l_tmpb_tl} = {-1} {
                \bt_new_node_left_child:nn {test} {\g_tmpa_tl}
                % set data
                \bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
                % set loop variable
                \bool_gset_false:N \g_tmpa_bool
                \par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~left~of~node~\tl_use:N \g_tmpa_tl
            } {
                % otherwise, continue
                \tl_gset:NV \g_tmpa_tl \l_tmpb_tl
            }
        } {
            % set next node id
            \bt_get_node_right_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
            % if next node is empty, insert here
            \int_compare:nNnTF {\l_tmpb_tl} = {-1} {
                \bt_new_node_right_child:nn {test} {\g_tmpa_tl}
                % set data
                \bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
                % set loop variable
                \bool_gset_false:N \g_tmpa_bool
                \par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~right~of~node~\tl_use:N \g_tmpa_tl
            } {
                % otherwise, continue
                \tl_gset:NV \g_tmpa_tl \l_tmpb_tl
            }
        }
    }
    
}

DFS

% because TeX does not have stack, variables of the same name will be
% shared globally. therefore, we need to use the non-recursive version
% of dfs
\cs_new:Npn \bt_dfs:n #1 {
    
    \seq_clear:N \l_tmpa_seq
    \seq_push:Nn \l_tmpa_seq {#1}
    
    \bool_do_until:nn {\seq_if_empty_p:N \l_tmpa_seq} {
        \seq_pop:NN \l_tmpa_seq \l_tmpa_tl
        \int_compare:nNnTF {\l_tmpa_tl} = {-1} {} {
            \bt_get_node_data:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \tl_use:N \l_tmpb_tl \space
            \bt_get_node_right_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \seq_push:NV \l_tmpa_seq \l_tmpb_tl
            \bt_get_node_left_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \seq_push:NV \l_tmpa_seq \l_tmpb_tl
        }
    }
}

Listings

Python program

This program is replicated in LaTeX.

class Node:
    
    def __init__(self):
        self.left = None
        self.right = None
        self.data = None
    

    

root = Node()
root.data = 5

def insert_data(data):
    node = root
    
    while True:
        if data < node.data:
            attr = 'left'
        else:
            attr = 'right'
        
        if getattr(node, attr) is None:
            # create node and exit loop
            new_node = Node()
            new_node.data = data
            setattr(node, attr, new_node)
            break
        else:
            node = getattr(node, attr)

def dfs(node):
    if node is None:
        return
    print(node.data, end=' ')
    dfs(node.left)
    dfs(node.right)
    

for i in range(1, 21):
    insert_data(i)
    
dfs(root)

Main TeX document

\documentclass{article}
\usepackage[a3paper, landscape]{geometry}
\usepackage{newtxtext, newtxmath}
\usepackage{amsmath}
\usepackage{datetime2}
\usepackage{expl3}

\begin{document}

\input{binary_tree}

\ExplSyntaxOn

% new tree
\bt_new:n {test}

% set value of root node
\bt_set_node_data:nnn {test} {1} {5}

\tl_new:N \l_tmpc_tl

% data insertion function
\cs_new:Npn \insert_data:n #1 {
    % start with root node id
    \tl_gset:Nn \g_tmpa_tl {1}
    
    % initialize loop variable
    \bool_gset_true:N \g_tmpa_bool
    
    \bool_do_while:Nn \g_tmpa_bool {
        % extract data from current node id
        \bt_get_node_data:nnN {test} {\g_tmpa_tl} \l_tmpa_tl
        
        \int_compare:nNnTF {#1} < {\l_tmpa_tl} {
            % set next node id
            \bt_get_node_left_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
            % if next node is empty, insert here
            \int_compare:nNnTF {\l_tmpb_tl} = {-1} {
                \bt_new_node_left_child:nn {test} {\g_tmpa_tl}
                % set data
                \bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
                % set loop variable
                \bool_gset_false:N \g_tmpa_bool
                \par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~left~of~node~\tl_use:N \g_tmpa_tl
            } {
                % otherwise, continue
                \tl_gset:NV \g_tmpa_tl \l_tmpb_tl
            }
        } {
            % set next node id
            \bt_get_node_right_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
            % if next node is empty, insert here
            \int_compare:nNnTF {\l_tmpb_tl} = {-1} {
                \bt_new_node_right_child:nn {test} {\g_tmpa_tl}
                % set data
                \bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
                % set loop variable
                \bool_gset_false:N \g_tmpa_bool
                \par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~right~of~node~\tl_use:N \g_tmpa_tl
            } {
                % otherwise, continue
                \tl_gset:NV \g_tmpa_tl \l_tmpb_tl
            }
        }
    }
    
}


\int_new:N \g_dfs_int
\int_gset:Nn \g_dfs_int {0}


% because TeX does not have stack, variables of the same name will be
% shared globally. therefore, we need to use the non-recursive version
% of dfs
\cs_new:Npn \bt_dfs:n #1 {
    
    \seq_clear:N \l_tmpa_seq
    \seq_push:Nn \l_tmpa_seq {#1}
    
    \bool_do_until:nn {\seq_if_empty_p:N \l_tmpa_seq} {
        \seq_pop:NN \l_tmpa_seq \l_tmpa_tl
        \int_compare:nNnTF {\l_tmpa_tl} = {-1} {} {
            \bt_get_node_data:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \tl_use:N \l_tmpb_tl \space
            \bt_get_node_right_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \seq_push:NV \l_tmpa_seq \l_tmpb_tl
            \bt_get_node_left_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
            \seq_push:NV \l_tmpa_seq \l_tmpb_tl
        }
    }
}


\int_gset:Nn \g_tmpa_int {1}
\int_do_until:nNnn {\g_tmpa_int} > {20} {
    \exp_args:Nx \insert_data:n {\int_use:N \g_tmpa_int}
    \int_gincr:N \g_tmpa_int
}

\par

\bt_dfs:n {1}

\ExplSyntaxOff


\par\DTMNow

\end{document} 

binary_tree.tex

\ExplSyntaxOn

\tl_new:N \g_bt_new_node_name_tl
\tl_new:N \g_bt_new_node_id_tl

\tl_new:N \__g_bt_tmpa_tl
\tl_new:N \__g_bt_tmpb_tl

% get name of the node given tree name and node id
\cs_new:Npn \bt_node_name:nn #1#2 {
    __g_bt_#1_node_\int_to_alph:n {#2}_tl
}

% get the name of node counter
\cs_new:Npn \bt_counter_name:n #1 {
    __g_bt_#1_counter_int
}

% get counter value for a tree
\cs_new:Npn \bt_get_counter:n #1 {
    \int_use:c {\bt_counter_name:n {#1}}
}

% set value of a node
% #1: tree name
% #2: node id
% #3: left node id
% #4: right node id
% #5: data
\cs_new:Npn \bt_set_node:nnnnn #1#2#3#4#5 {
    \tl_gset:cn { \bt_node_name:nn {#1} {#2} } { {#3}{#4}{#5} }
}

\cs_generate_variant:Nn  \bt_set_node:nnnnn {nnxxx}


\msg_new:nnn {bt} {nonodeerror} {node\ #2\ of\ tree\ "#1"\ does\ not\ exist.}
\msg_new:nnn {bt} {haschilderror} {node\ #2\ of\ tree\ "#1"\ already\ has\ a \ #3\ child.}

% create a new node and put the node name in \g_bt_new_node_tl
\cs_new:Npn \bt_new_node:n #1 {
    % increment counter value
    \int_gincr:c {\bt_counter_name:n {#1}}
    \tl_gset:Nx \g_bt_new_node_id_tl {\bt_get_counter:n {#1}}
    %\par idd~ \cs_meaning:N \g_bt_new_node_id_tl
    
    % set new node name
    \tl_gset:Nx \g_bt_new_node_name_tl {
        \bt_node_name:nn {#1} {\g_bt_new_node_id_tl}
    }
    
    %\par node~name:~\cs_meaning:N \g_bt_new_node_name_tl~end
    % declare new node
    \tl_new:c {\g_bt_new_node_name_tl}

    % set initial value for the node
    \bt_set_node:nnnnn {#1} {\g_bt_new_node_id_tl} {-1} {-1} {}
    %\par \cs_meaning:c \g_bt_new_node_name_tl
}

% get node given tree name and node id
% stores the node in #3
\cs_new:Npn \bt_get_node:nnN #1#2#3 {
    \tl_gset:Nx \__g_bt_tmpa_tl {\bt_node_name:nn {#1} {#2}}
    \tl_if_exist:cTF {\__g_bt_tmpa_tl} {
        \tl_gset_eq:Nc #3 {\__g_bt_tmpa_tl}
    } {
        \msg_error:nnnn {bt} {nonodeerror} {#1} {#2}
    }
}

\cs_new:Npn \bt_set_node_data:nnn #1#2#3 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \bt_set_node:nnxxx {#1} {#2} {\tl_item:Nn \__g_bt_tmpa_tl {1}} {\tl_item:Nn \__g_bt_tmpa_tl {2}} {#3}
}

% get node id of left child, saves to #3
\cs_new:Npn \bt_get_node_left_child_id:nnN #1#2#3 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {1}}
}

% get node id of right child, saves to #3
\cs_new:Npn \bt_get_node_right_child_id:nnN #1#2#3 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {2}}
}


% get left child, saves to #3
\cs_new:Npn \bt_get_node_left_child:nnNTF #1#2#3#4#5 {
    \bt_get_node_left_child_id:nnN {#1} {#2} \__g_bt_tmpa_tl
    \int_compare:nNnTF {\__g_bt_tmpa_tl} = {-1} {
        #5
    } {
        \bt_get_node:nnN {#1} {\__g_bt_tmpa_tl} #3
        #4
    }
}

% get right child, saves to #3
\cs_new:Npn \bt_get_node_right_child:nnNTF #1#2#3#4#5 {
    \bt_get_node_right_child_id:nnN {#1} {#2} \__g_bt_tmpa_tl
    \int_compare:nNnTF {\__g_bt_tmpa_tl} = {-1} {
        #5
    } {
        \bt_get_node:nnN {#1} {\__g_bt_tmpa_tl} #3
        #4
    }
}


\cs_new:Npn \bt_get_node_data:nnN #1#2#3 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {3}}
}



\cs_new:Npn \bt_new_node_left_child:nn #1#2 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \tl_gset:Nx \__g_bt_tmpb_tl {\tl_item:Nn \__g_bt_tmpa_tl {1}}
    \int_compare:nNnTF {\__g_bt_tmpb_tl} = {-1} {
        % create new node
        \bt_new_node:n {#1}
        %\par id~ \cs_meaning:N \g_bt_new_node_id_tl
        \bt_set_node:nnxxx {#1} {#2} {\g_bt_new_node_id_tl} {\tl_item:Nn \__g_bt_tmpa_tl {2}}  {\tl_item:Nn \__g_bt_tmpa_tl {3}}
    } {
        \msg_error:nnnnn {bt} {haschilderror} {#1} {#2} {left}
    }
}

\cs_new:Npn \bt_new_node_right_child:nn #1#2 {
    \bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
    \tl_gset:Nx \__g_bt_tmpb_tl {\tl_item:Nn \__g_bt_tmpa_tl {2}}
    \int_compare:nNnTF {\__g_bt_tmpb_tl} = {-1} {
        % create new node
        \bt_new_node:n {#1}
        \bt_set_node:nnxxx {#1} {#2} {\tl_item:Nn \__g_bt_tmpa_tl {1}} {\g_bt_new_node_id_tl} 
        {\tl_item:Nn \__g_bt_tmpa_tl {3}}
    } {
        \msg_error:nnnnn {bt} {haschilderror} {#1} {#2} {right}
    }
}

% create new binary tree with a name
\cs_new:Npn \bt_new:n #1 {
    % create counter
    \int_new:c {\bt_counter_name:n {#1}}
    % set counter
    \int_gset:cn {\bt_counter_name:n {#1}} {0}
    
    % create a root node
    \bt_new_node:n {#1}
}



\ExplSyntaxOff

Python output

PS C:\Users\Alan\Desktop\test_tex> python .\bt.py
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

LaTeX output

inserted node 2 as left of node 1
inserted node 3 as right of node 2
inserted node 4 as right of node 3
inserted node 5 as right of node 4
inserted node 6 as right of node 1
inserted node 7 as right of node 6
inserted node 8 as right of node 7
inserted node 9 as right of node 8
inserted node 10 as right of node 9
inserted node 11 as right of node 10
inserted node 12 as right of node 11
inserted node 13 as right of node 12
inserted node 14 as right of node 13
inserted node 15 as right of node 14
inserted node 16 as right of node 15
inserted node 17 as right of node 16
inserted node 18 as right of node 17
inserted node 19 as right of node 18
inserted node 20 as right of node 19
inserted node 21 as right of node 20
5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2020-06-19 18:29:44-04:00