%Paradigm: Sorting. Aug 1995, revised Feb 1996
%C.G. van der laan, cgl@rc.service.rig.nl
\input blue.tex
\loadtocmacros
\loadindexmacros %necessary for sorting examples
\tolerance500\hbadness=499\hfuzz=5pt
\bluetitle Paradigms: Sorting
\blueissue \maps{96}2
\everyverbatim{\emc}
\def\on{\bgroup\afterassignment\doold\count0=}
\def\doold{\oldstyle\the\count0\egroup}
\beginscript
\bluehead BLUe's Design IX
Hi folks.
A strong and {\em unique\/} point of BLUe's format system is its
indexing on the fly.
Be it for a total document or just for a chapter. One of the requisites
for indexing on the fly is the possibility to sort within \TeX.
Sorting has always been an important topic in computer science.
In \TeX{} I needed sorting on several occasions especially for sorting
numbers such as citation lists,
words such as addresses, and
index entries.
This note is devoted to paradigms encountered while implementing and applying
sorting in \TeX.
Sorting can be characterized by
\item- the set to be sorted (numbers, word. etc.)
\item- the addressing of elements of the set
\item- the ordering for the set
\item- the comparison operation, and
\item- the exchange operation.
\smallbreak
To do some sorting of your own please load from \bluetex{} the index macros
via \cs{loadindexmacros}.
Below parts have been extracted from that collection of
macros to make this note as intelligible as possible.
\cs{ea} is my shortcut for \cs{expandafter}.
\bluehead Linear sorting
A simple sorting method is repeatedly searching for the smallest element.
In the example below the set is defined as a def with
list element tag |\\|.
\thisverbatim{\unmc}
\beginverbatim
\def\lst{\\\ia\\\ib\\\ic}
\def\ia{314} \def\ib{1} \def\ic{27}
%
\def\dblbsl#1{\ifnum#1<\min\let\min=#1\fi}
%
\loop\ifx\empty\lst\expandafter\break\fi
\def\\{\let\\=\dblbsl\let\min=} %space
\lst%find minimum
\min%typeset minimum
{\def\\#1{\ifx#1\min \else\nx\\%
\nx#1\fi}\xdef\lst{\lst}}%
\pool%Inspired upon van der Goot's
%Midnight macros.
\def\loop#1\pool{#1\loop#1\pool}
\def\break#1\pool{}
!endverbatim
The coding implements the looping of the basic steps
\item- find minimum (via \cs{lst},
and suitable definition of \DeK's list element tag |\\|)
\item- typeset minimum (via \cs{min})
\item- delete minimum from the list (via another appropriate
definition of the list element tag.
\smallbreak
Remark.
The kludge for using \cs{ifx} instead of \cs{ifnum} in the deletion part
is necessary because \TeX{} inserts a \cs{relax}.
\bluehead Sorting in an array
If we adopt array addressing in \TeX{} for the elements to be sorted
then we can implement bubble sort in \TeX{} too.\ftn
{The above example of linear sorting can be seen as sorting in a
so-called associative array.}
\bluesubhead Array addressing
When we think of associating values to (index) numbers\Dash
|1| $\rightarrow$ |\value{1}| \Dash
then we are talking about an array.
A mapping of the \IN{} on \dots for example \IN.
The \cs{value} control sequence can be implemented as follows.
\beginverbatim
\def\value#1{\csname#1\endcsname}
!endverbatim
The writing to the array elements can be done via
\beginverbatim
\def\1{} \def\2{}...
!endverbatim
In general this must be done via
\beginverbatim
\ea\def\csname\endcsname{}
!endverbatim
\bluesubsubhead To get the hang of it
The reader must be aware of the differences between
\item- the index number, $\langle k\rangle$
\item- the counter variable \cs{k}, with the value $\langle k\rangle$
as index number
\item- the control sequences |\|$, k=1, 2, \dots, n$,
with as replacement texts the items to be sorted.
\smallbreak
When we have |\def\3{4}| |\def\4{5}| |\def\5{6}| then \\
\def\3{4}\def\4{5}\def\5{6}
|\3| yields {\bf\3}, \\
|\csname\3\endcsname| yields
{\bf\csname\3\endcsname}, and \\
|\csname\csname\3\endcsname\endcsname| yields
{\bf\csname\csname\3\endcsname\endcsname}.
Similarly, when we have\\
\cs{k3} |\def\3{name}| |\def\name{action}| then \\
\def\3{name}\def\name{action}\k=3{}
|\the\k| yields {\bf\the\k}, \\
|\csname\the\k\endcsname| yields {\bf\csname\the\k\endcsname}, and\\
|\csname\csname\the\k\endcsname\endcsname| yields
{\bf\csname\csname\the\k\endcsname\endcsname}.\ftn{Confusing, but powerful.}
To exercise shortcut notation the last can be denoted by
|\value{\value{\the\k}}|.
Another \cs{csname...} will execute \cs{action}, which can be whatever
you provided as replacement text.\ftn
{My other uses of the \cs{csname} construction are:
to let \TeX{} accept an outer defined macro name in a replacement text,
to check whether a name has already been defined, and
to mimic a switch selector.}
\bluehead Bubble sort
This process looks repeatedly for the biggest element which is swapped
to the end. This is done for the complete array, the array of size $n-1$ et cetera.
The pseudo code reads as follows.
\beginpascal
for n:= upb downto 2 do
begin for k:= n-1 downto 1 do
if a[n].
%Result: \1<=\2<=...<=\.
{\loop\ifnum1<\n{\k\n
\loop\ifnum1<\k \advance\k-1
\cmp{\deref\k}{\deref\n}%
\ifnum\status=1 \xch\k\n\fi
\repeat}\advance\n-1
\repeat}}%end \bubblesort
%with auxiliaries
\def\deref#1{\csname\the#1\endcsname}
\let\cmp\cmpn %from blue.tex or provide
%\def\cmp#1#2{%Comparison. Yields
% \status=0, 1, 2 for =, >, <
%{...}
%\def\xch#1#2{%exchange
%#1, #2 counter variables
%{...}
!endverbatim
\bluehead Heap sort
We can organize the array as a heap. A heap is an ordered tree.
Loosely speaking for each node the siblings are smaller or
equal than the node.
The process consists of two main steps
\item- creation of a heap
\item- sorting the heap
\smallbreak
with a sift operation to be used in both.
In comparison with my earlier release of the code in \maps{92}2,
I adapted the notation with respect to sorting in {\em non-decreasing\/}
order.\ftn
{It is true that the reverse of the comparison operation would
do, but it seemed more consistent to me to adapt
the notation of the heap concept with
the smallest elements at the bottom.}
What is a heap?
A sequence $a_1, a_2, \dots, a_n$, is a heap if
$a_k\ge a_{2k} \wedge a_k\ge a_{2k+1}, k=1, 2, \dots, n\div2$, and
because $a_{n+1}$ is undefined, the notation is simplified by
defining $a_k>a_{n+1}, k= 1, 2, \dots , n$.
\\
A tree and one of its heap representations of $2, 6, 7, 1, 3, 4$
read
$$\thisbintree{\tophns10ex}
\beginbintree{00}2{11}6{12}7{21}1{22}3{23}4
2\endbintree
\kern-4ex\raise13ex\hbox{$\buildrel heap\over \longrightarrow$}
\thisbintree{\tophns10ex}\kern-4ex
\beginbintree{00}7{11}6{12}4{21}3{22}2{23}1
2\endbintree$$
In PASCAL-like notation the algoritm,
for sorting the array a[1:n], reads
{\parindent0pt
\beginpascal
(*heap creation*)
l := n div 2 + 1;
while l <> 1 do
begin l := l-1; sift(a, l, n) end;
(*sorting*)
r := n;
while r <> 1 do
begin swap(a[1], a[r]);
r := r-1; sift(a, 1, r)
end;
(*sift arg1 through arg2*)
j:= arg1;
while 2j >= arg2 and
(a[j] < a[2j] or a[j] < a[2j+1])
do begin mi := 2j + if a[2j] > a[2j+1]
then 0 else 1;
exchange(a[j], a[mi]); j := mi
end;
\endpascal
\smallskip}
\bluesubhead Purpose
Sorting values given in an array.
\bluesubhead Input
The values are stored in the control sequences
\cs{1}, \dots, |\|.
The counter |\n| must contain the value $\langle n\rangle$.
The parameter for comparison, \cs{cmp},
must be \cs{let}-equal to
\item- \cs{cmpn}, for numerical comparison,
\item- \cs{cmpw}, for word comparison,
\item- \cs{cmpaw}, for word comparison obeying the ASCII ordering, or
\item- a comparison macro of your own.
\smallbreak
\bluesubhead Output
The sorted array \cs{1}, \cs{2}, \dots |\|,
with \\
\cs{value1} $\le$ \cs{value2} $\le$
\dots $\le$ \cs{value}$\langle n\rangle$.
\bluesubhead Source
\thisverbatim{\unmc}
\beginverbatim
%Non-descending sorting
\def\heapsort{%data in \1 to \n
\r\n\heap\ic1
{\loop\ifnum1<\r\xch\ic\r
\advance\r-1 \sift\ic\r
\repeat}}
%
\def\heap{%Transform \1..\n into heap
\lc\n\divide\lc2{}\advance\lc1
{\loop\ifnum1<\lc\advance\lc-1
\sift\lc\n\repeat}}
%
\def\sift#1#2{%#1, #2 counter variables
\jj#1\uone#2\advance\uone1 \goontrue
{\loop\jc\jj \advance\jj\jj
\ifnum\jj<\uone
\jjone\jj \advance\jjone1
\ifnum\jj<#2 \cmpval\jj\jjone
\ifnum2=\status\jj\jjone\fi\fi
\cmpval\jc\jj\ifnum2>\status\goonfalse\fi
\else\goonfalse\fi
\ifgoon\xch\jc\jj\repeat}}
%
\def\cmpval#1#2{%#1, #2 counter variables
%Result: \status= 0, 1, 2 if
%values pointed by
% #1 =, >, < #2
\ea\let\ea\aone\csname\the#1\endcsname
\ea\let\ea\atwo\csname\the#2\endcsname
\cmp\aone\atwo}
%
\def\cmpn#1#2{%#1, #2 must expand into
%numbers
%Result: \status= 0, 1, 2 if
% \val{#1} =, >, < \val{#2}.
\ifnum#1=#2\global\status0 \else
\ifnum#1>#2\global\status1 \else
\global\status2 \fi\fi}
%
\def\xch#1#2{%#1, #2 counter variables
\edef\aux{\csname\the#1\endcsname}\ea
\xdef\csname\the#1\endcsname{\csname
\the#2\endcsname}\ea
\xdef\csname\the#2\endcsname{\aux}}.
%with auxiliaries
\newcount\n\newcount\lc\newcount\r
\newcount\ic\newcount\uone
\newcount\jc\newcount\jj\newcount\jjone
\newif\ifgoon
!endverbatim
Explanation.
\item{}\cs{heapsort}
The values given in \cs{1},\dots|\|,
are sorted in non-descending order.
\item{}\cs{heap}
The values given in \cs{1},\dots|\|,
are rearranged into a heap.
\item{}\cs{sift}
The first element denoted by the first (counter) argument
has disturbed the heap. Sift rearranges
the part of the array denoted by its two arguments, such that the
heap property holds again.
\item{}\cs{cmpval}
The values denoted by the counter values,
supplied as arguments, are compared.
\smallbreak
\blueexample Numbers, words
\cs{cmpn}, and \cs{cmpw} stand for compare numbers and words.
\cs{prtn}, and \cs{prtw} stand for print numbers and words, and
work the way you expect.
\cs{accdef} takes care that accents are properly defined.
\beginverbatim
\def\1{314}\def\2{1}\def\3{27}\n3
\let\cmp\cmpn\heapsort
\beginquote\prtn,\endquote
%
\def\1{ab}\def\2{c}\def\3{aa}\n3
\let\cmp\cmpaw\heapsort
\beginquote\prtw,\endquote
and
\def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con}
\def\4{\'el\`eve}\n4
\let\cmp\cmpw {\accdef\heapsort}
\beginquote\prtw\endquote
!endverbatim
yields
\def\1{314}\def\2{1}\def\3{27}\n=3
{\let\cmp\cmpn\heapsort
\beginquote\prtn,\endquote
%
\def\1{ab}\def\2{c}\def\3{aa}\n=3
\let\cmp\cmpaw\heapsort
\beginquote\prtw,\endquote
and
\def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con}\def\4{\'el\`eve}\n=4
\let\cmp=\cmpw{\accdef\heapsort}
\beginquote\prtw.\endquote
}
\bluehead Quick sort
The quick sort algorithm has been discussed in many places,
The following code is borrowed from Bentley.\ftn{Programming Pearls, Addison-Wesley.
It contains also diagrams which keep track of the invariants.}
\beginpascal
procedure QSort(low,up);
if low=T*)
for I:=low+1 to up do
if X[I]|, \dots, |\|.
\bluesubhead Input
The values are stored in
|\|, \dots, |\|,
with $1\le low\le up\le n$.
The parameter for comparison, \cs{cmp},
must be \cs{let}-equal to
\item- \cs{cmpn}, for number comparison,
\item- \cs{cmpw}, for word comparison,
\item- \cs{cmpaw},for word comparison obeying the ASCII ordering, or
\item- a comparison macro of your own.
\smallbreak
\bluesubhead Output
The sorted array |\|, \dots, |\|, with \\
\cs{va}$\langle low\rangle \le
\dots \le{}$ \cs{val}$\langle up\rangle$.
\bluesubhead Source
\thisverbatim{\unmc}
\beginverbatim
\def\quicksort{%Values given in
%\low,...,\up are sorted, non-descending.
%Parameters: \cmp, comparison.
\ifnum\low<\up\else\brk\fi
%\refval, a reference value selected
%at random.
\m\up\advance\m-\low%Size-1 of array part
\ifnum10<\m\rnd\multiply\m\rndval
\divide\m99 \advance\m\low \xch\low\m
\fi
\ea\let\ea\refval\csname\the\low\endcsname
\m\low\k\low\let\refvalcop\refval
{\loop\ifnum\k<\up\advance\k1
\ea\let\ea\oneqs\csname\the\k\endcsname
\cmp\refval\oneqs\ifnum1=\status
\global\advance\m1 \xch\m\k\fi
\let\refval\refvalcop
\repeat}\xch\low\m
{\up\m\advance\up-1 \quicksort}%
\low\m\advance\low1 \quicksort}
%
\def\brk#1\quicksort{\fi}
!endverbatim
Explanation.
At each level the array is partitioned into two parts.
After partitioning
the left part contains values less than the reference value and the
right part contains values greater than or equal to the reference value.
Each part is again partitioned via a recursive call of the macro.
The array is sorted when all parts are partitioned.
In the \TeX{} coding
the reference value as estimate for the mean value is determined
via a random selection of one of the elements.\ftn
{If the array is big enough. I chose rather arbitrarily \on10{}
as threshold.}
Reid's \cs{rnd} has been used.
The random number is mapped into
the range [$\,low:up\,$], via the linear transformation
$\hbox{\cs{low}}+(\hbox{\cs{up}}-\hbox{\cs{low}})*
\hbox{\cs{rndval}}/99$.\ftn
{Note that the number is guaranteed within the range.}
The termination of the recursion is coded in a \TeX{} peculiar way.
First, I coded the infinite loop.
Then I inserted the condition for termination with the \cs{fi}
on the same line, and not enclosing the main part of the macro.
On termination the invocation \cs{brk} gobbles up all the tokens
at that level to the end, to its separator \cs{quicksort},
and inserts its replacement text, a new \cs{fi},
to compensate for the gobbled \cs{fi}.
\bluesubhead Auxiliaries
Sorting is parameterized by comparison and exchanging.
Also needed is a random number generator.
The latter is not supplied here.
\thisverbatim{\unmc}
\beginverbatim
\def\cmpn#1#2{%#1, #2 must expand into
%numbers
%Result: \status= 0, 1, 2 if
% \val{#1} =, >, < \val{#2}.
\ifnum#1=#2\global\status0 \else
\ifnum#1>#2\global\status1 \else
\global\status2 \fi\fi}
%
\def\xch#1#2{%#1, #2 counter variables
\edef\aux{\csname\the#1\endcsname}\ea
\xdef\csname\the#1\endcsname{\csname
\the#2\endcsname}\ea
\xdef\csname\the#2\endcsname{\aux}}
!endverbatim
\bluesubhead Ordering
The ordering is parameterized in the ordering table.
\blueexample Numbers, words
\cs{cmpn}, and \cs{cmpw} stand for compare numbers and words.
\cs{prtn}, and \cs{prtw} stand for print numbers and words, and
work the way you expect.
\cs{accdef} takes care that accents are properly defined.
\beginverbatim
\def\1{314}\def\2{1}\def\3{27}\n3
\low1\up\n\let\cmp\cmpn
\quicksort
\beginquote\prtn,\endquote
%
\def\1{ab}\def\2{c}\def\3{aa}
\def\4{\ij}\def\5{ik}\def\6{z}\def\7{a}\n7
\low1\up\n\let\cmp\cmpw
\quicksort
\beginquote\prtw,\endquote
and
\def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con}
\def\4{\'el\`eve}\n4
\low1\up\n\let\cmp\cmpw
{\accdef\quicksort}
\beginquote\prtw.\endquote
!endverbatim
yields
\def\1{314}\def\2{1}\def\3{27}\n3
{\low1\up\n\let\cmp\cmpn
\quicksort
\beginquote\prtn,\endquote
%
\def\1{ab}\def\2{c}\def\3{aa}
\def\4{\ij}\def\5{ik}\def\6{z}\def\7{a}\n7
\low1\up\n\let\cmp\cmpw
\quicksort
\beginquote\prtw,\endquote
and
\def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con}
\def\4{\'el\`eve}\n=4
\low=1\up=\n\let\cmp=\cmpw
{\accdef\quicksort}
\beginquote\prtw.\endquote
}
\bluehead Use
I needed sorting within \TeX{} for indexing and
for sorting address labels.
\bluesubhead Sorting address labels
Suppose we wish to sort addresses on the secondary key membership number.
In order to do so the index must point to the name of the database entry
and the name must point to its membership number, that is
$$\vbox{\hbox{$1\,2\,\ldots
\rightarrow$ |\|${}_x\,$ |\|$_y\,\ldots
\rightarrow$ ||${}_x\,$ ||$_y\,\ldots$\hss}}
$$
This can be coded as follows.
\beginverbatim
\loadindexmacros
%
\def\lst#1#2{\advance\k1
\ea\def\csname\the\k\endcsname{#1}%
\ea\def\ea#1\gobbletono#2}
\def\gobbletono#1\no{}
\k0
\input toy.dat %The test database
\n\k %number of items
Membershipno unsorted: \1, \2, ...
%
\let\cmp\cmpn\sort
Sorted on membershipno: \1, \2, ...
!endverbatim
The amazing thing is that we don't have to do much extra because the name
will expand to the number, which will be used in the comparison.
I used that \cs{no} was the last element of the database entry,
but that is not essential.
Each database entry consist of a triple \cs{lst}, |\|,
and entry proper within braces.
\bluesubsubhead Typesetting
Now we have to redirect the pointer from the name away from the number
to the complete entry, that is
$$\vbox{\hbox{$1\,2\,\ldots
\rightarrow$|\|$_1\,$|\|$_2\,\dots
\rightarrow$|{entry}|$_1\,$|{entry}|$_2\,\ldots$\hss}}
$$
This is done as follows.
\beginverbatim
\def\lst#1#2{\def#1{#2}}
\input toy.dat
\1 \2 \3 \4 \5 \6
!endverbatim
\bluesubhead Sorting index entries
One of the processes in preparing an index is sorting the Index Reminders, IRs.
This is again a sorting process on secondary keys, even tertiary keys.
Given the sorting macros we just have to code
the special comparison macro in compliance with \cs{cmpw}:
compare two `values' specified by \cs{def}s.
Let us call this macro \cs{cmpir}.\ftn{Mnemonics: compare index reminders}
Each value is composed of
\item- a word (action: word comparison)
\item- a digit (action: number comparison), and
\item- a page number (action: (page) number comparison).
\smallbreak
The macros read as follows.
\thisverbatim{\unmc\catcode`!=12 \catcode`*=0 }
\beginverbatim
\def\cmpir#1#2{%#1, #2 defs
%Result: \status= 0, 1, 2 if
% \val{#1} =, >, < \val{#2}
\ea\ea\ea\decom\ea#1\ea;#2.}
%
\def\decom#1 !#2 #3;#4 !#5 #6.{%
\def\one{#1}\def\four{#4}\cmpaw\one\four
\ifnum0=\status%Compare second key
\ifnum#2<#5\global\status2 \else
\ifnum#2>#5\global\status1 \else
%Compare third key
\ifnum#3<#6\global\status2
\else\ifnum#3>#6\global\status1 \fi
\fi
\fi
\fi
\fi}
*endverbatim
Explanation.
I needed a two-level approach. The values are decomposed
into their components by providing them as arguments to \cs{decom}.\ftn
{Mnemonics: decompose. In each comparison the defs
are `dereferenced,' that is their replacement texts are
passed over. This is a standard \TeX nique: a triad of
\cs{ea}s, and the hop-overs to the second argument.}
The macro picks up the components
\item- the primary keys, the $\langle word\rangle$
\item- the secondary keys, the $\langle digit\rangle$, and
\item- the tertiary keys, the $\langle page\,number\rangle$.
\smallbreak
It compares the primary keys, and if necessary
successively the secondary and the tertiary keys.
The word comparison is done via the already available macro \cs{cmpaw}.
To let this work with \cs{sort}, we have to
\cs{let}-equal the \cs{cmp} parameter to \cs{cmpir}.
\bluehead Sorting in the mouth
Alan Jeffrey and Bernd Raichle have provided macros for this.
The following variant of the linear sorting given at the beginning of this
note is inspired upon Bernd's `Quick Sort in the Mouth,' Euro\TeX~\on94.
The idea is that a sequence is split in its smallest element and the rest
by an invoke of \cs{fifo}.
The rest is treated recursively as a similar sequence.
Another example of (multiple) nested FIFO.
\thisverbatim{\unmc}
\beginverbatim
\def\fifo#1%accumulated rest
#2%smallest
#3%next
{\ifx\ofif#3 #2\ofif{#1}\fi
\ifnum#3<#2
\p{\fifo{#1{#2}}{#3}}\else
\q{\fifo{#1{#3}}{#2}}\fi}
%repeat or terminate
\def\ofif#1\fi#2\fi{\fi
\if*#1*\endsort\fi
\fifo{}#1\ofif}
%auxiliaries
\def\p#1\else#2\fi{\fi#1}
\def\q#1\fi{\fi#1}
%terminator
\def\endsort#1\ofif{\fi}
%test
\fifo{}3{123}8{1943}\ofif
!endverbatim
To assure yourself that it is all done in the mouth \cs{write} the test.\ftn
{I don't know how to ensure correctness.
It is tricky to get the braces right.
I used \cs{tracingmacros=1}.}
However, in sorting within \TeX{} I prefer a uniform approach
not in the least parameterized over the ordering table.
Have fun, and all the best
\makesignature
\pasteuptoc
\endscript