MANDOC(3) NetBSD Library Functions Manual MANDOC(3)

NAME

mandoc, man_meta, man_node, mdoc_meta, mdoc_node, mparse_alloc, mparse_free, mparse_readfd, mparse_reset, mparse_result, mparse_strerror, mparse_strlevel - mandoc macro compiler library

SYNOPSIS

#include <man.h>
#include <mdoc.h>
#include <mandoc.h>

const struct man_meta *
man_meta(const struct man *man);

const struct man_node *
man_node(const struct man *man);

const struct mdoc_meta *
mdoc_meta(const struct mdoc *mdoc);

const struct mdoc_node *
mdoc_node(const struct mdoc *mdoc);

struct mparse *
mparse_alloc(enum mparset type, enum mandoclevel wlevel, mandocmsg msg, void *msgarg);

void
mparse_free(struct mparse *parse);

enum mandoclevel
mparse_readfd(struct mparse *parse, int fd, const char *fname);

void
mparse_reset(struct mparse *parse);

void
mparse_result(struct mparse *parse, struct mdoc **mdoc, struct man **man);

const char *
mparse_strerror(enum mandocerr);

const char *
mparse_strlevel(enum mandoclevel);

extern const char * const * man_macronames;
extern const char * const * mdoc_argnames;
extern const char * const * mdoc_macronames;

DESCRIPTION

The mandoc library parses a UNIX manual into an abstract syntax tree (AST). UNIX manuals are written in mdoc(7) or man(7), and may be mixed with roff(7), tbl(7), and eqn(7) invocations.

The following describes a general parse sequence:

  1. initiate a parsing sequence with mparse_alloc();
  2. parse files or file descriptors with mparse_readfd();
  3. retrieve a parsed syntax tree, if the parse was successful, with mparse_result();
  4. starting from the root node returned by mdoc_node() or man_node(), iterate over the parse nodes;
  5. free all allocated memory with mparse_free(), or invoke mparse_reset() and parse new files.

IMPLEMENTATION NOTES

This section consists of structural documentation for mdoc(7) and man(7) syntax trees.

Man Abstract Syntax Tree

This AST is governed by the ontological rules dictated in man(7) and derives its terminology accordingly.

The AST is composed of struct man_node nodes with element, root and text types as declared by the type field. Each node also provides its parse point (the line, sec, and pos fields), its position in the tree (the parent, child, next and prev fields) and some type-specific data.
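As a minimal sketch of how the parent, child, next and prev links fit together, the C fragment below builds and walks a tiny tree. The type struct sketch_node, the enumeration sketch_mant, and the helpers sketch_append() and sketch_count_text() are hypothetical stand-ins invented for illustration; they are not the declarations in <man.h>, which carry more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct man_node: only some of
 * the fields named above are modelled; the real declaration in <man.h>
 * has more members and may use different type names. */
enum sketch_mant { SK_ROOT, SK_ELEM, SK_TEXT };

struct sketch_node {
	enum sketch_mant	 type;
	int			 line, pos;	/* parse point */
	struct sketch_node	*parent, *child, *next, *prev;
};

/* Append c as the last child of p, maintaining the
 * parent/child/next/prev links. */
static void
sketch_append(struct sketch_node *p, struct sketch_node *c)
{
	struct sketch_node *last;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	if (p->child == NULL) {
		c->prev = NULL;
		p->child = c;
		return;
	}
	for (last = p->child; last->next != NULL; last = last->next)
		continue;
	last->next = c;
	c->prev = last;
}

/* Depth-first, pre-order walk: visit n, then its children in order,
 * counting the TEXT nodes encountered. */
static size_t
sketch_count_text(const struct sketch_node *n)
{
	const struct sketch_node *c;
	size_t count = 0;

	assert(n != NULL);
	if (n->type == SK_TEXT)
		count++;
	for (c = n->child; c != NULL; c = c->next)
		count += sketch_count_text(c);
	return count;
}
```

A consumer of a real tree would follow the same child/next pattern, beginning at the node returned by man_node().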

The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.

ROOT    ← mnode+
mnode   ← ELEMENT | TEXT | BLOCK
BLOCK   ← HEAD BODY
HEAD    ← mnode*
BODY    ← mnode*
ELEMENT ← ELEMENT | TEXT*
TEXT    ← [[:alpha:]]*

The only elements capable of nesting other elements are those with next-line scope as documented in man(7).
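The normal form above can also be read as a set of checks on a tree. The C sketch below validates a subtree against those productions. The enumeration mk, the struct mnode_sk, and the functions valid() and valid_mnode() are hypothetical names used only for illustration, and only first-child and next-sibling links are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds mirroring the non-terminals of the man(7)
 * normal form; the real enumeration in <man.h> may differ. */
enum mk { MK_ROOT, MK_BLOCK, MK_HEAD, MK_BODY, MK_ELEMENT, MK_TEXT };

struct mnode_sk {
	enum mk		 type;
	struct mnode_sk	*child;	/* first child */
	struct mnode_sk	*next;	/* next sibling */
};

static int valid(const struct mnode_sk *);

/* mnode <- ELEMENT | TEXT | BLOCK */
static int
valid_mnode(const struct mnode_sk *n)
{
	return (n->type == MK_ELEMENT || n->type == MK_TEXT ||
	    n->type == MK_BLOCK) && valid(n);
}

/* Return 1 if n and its subtree follow the productions, else 0. */
static int
valid(const struct mnode_sk *n)
{
	const struct mnode_sk *c;

	assert(n != NULL);
	c = n->child;
	switch (n->type) {
	case MK_ROOT:			/* ROOT <- mnode+ */
		if (c == NULL)
			return 0;
		/* FALLTHROUGH */
	case MK_HEAD:			/* HEAD <- mnode* */
	case MK_BODY:			/* BODY <- mnode* */
		for (; c != NULL; c = c->next)
			if (!valid_mnode(c))
				return 0;
		return 1;
	case MK_BLOCK:			/* BLOCK <- HEAD BODY */
		return c != NULL && c->type == MK_HEAD && valid(c) &&
		    c->next != NULL && c->next->type == MK_BODY &&
		    valid(c->next) && c->next->next == NULL;
	case MK_ELEMENT:		/* ELEMENT <- ELEMENT | TEXT* */
		if (c != NULL && c->type == MK_ELEMENT)
			return c->next == NULL && valid(c);
		for (; c != NULL; c = c->next)
			if (c->type != MK_TEXT || !valid(c))
				return 0;
		return 1;
	case MK_TEXT:			/* TEXT is a leaf */
		return c == NULL;
	}
	return 0;
}
```

For example, a ROOT whose only child is a BLOCK holding a HEAD followed by a BODY passes, while a BLOCK whose first child is a BODY, or a ROOT with no children, is rejected.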

Mdoc Abstract Syntax Tree

This AST is governed by the ontological rules dictated in mdoc(7) and derives its terminology accordingly. “In-line” elements described in mdoc(7) are described simply as “elements”.

The AST is composed of struct mdoc_node nodes with block, head, body, element, root and text types as declared by the type field. Each node also provides its parse point (the line, sec, and pos fields), its position in the tree (the parent, child, nchild, next and prev fields) and some type-specific data, in particular, for nodes generated from macros, the generating macro in the tok field.
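Because each mdoc node records both its sibling links and its child count, a consumer can cross-check the two. The following is a minimal C sketch assuming a trimmed, hypothetical struct dnode that keeps only tok, nchild, parent, child, next and prev; tok is a plain int here, and links_ok() and count_tok() are illustrative helpers, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for struct mdoc_node holding only the
 * tree-position fields and tok; the real declaration is in <mdoc.h>. */
struct dnode {
	int		 tok;		/* generating macro (assumed int) */
	int		 nchild;	/* number of direct children */
	struct dnode	*parent, *child, *next, *prev;
};

/* Return 1 if the links below n are mutually consistent: nchild equals
 * the length of the child/next chain, every child points back to n,
 * and each prev link mirrors the preceding next link. */
static int
links_ok(const struct dnode *n)
{
	const struct dnode *c, *before = NULL;
	int count = 0;

	assert(n != NULL);
	for (c = n->child; c != NULL; before = c, c = c->next) {
		if (c->parent != n || c->prev != before || !links_ok(c))
			return 0;
		count++;
	}
	return count == n->nchild;
}

/* Count the nodes in the subtree rooted at n whose tok equals tok. */
static int
count_tok(const struct dnode *n, int tok)
{
	const struct dnode *c;
	int count;

	assert(n != NULL);
	count = n->tok == tok;
	for (c = n->child; c != NULL; c = c->next)
		count += count_tok(c, tok);
	return count;
}
```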

The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.

ROOT    ← mnode+
mnode   ← BLOCK | ELEMENT | TEXT
BLOCK   ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
ELEMENT ← TEXT*
HEAD    ← mnode*
BODY    ← mnode* [ENDBODY mnode*]
TAIL    ← mnode*
TEXT    ← [[:printable:],0x1e]*

Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the BLOCK production: these refer to punctuation marks. Furthermore, although a TEXT node will generally have a non-zero-length string, in the specific case of ‘.Bd -literal’, an empty line will produce a zero-length string. Multiple body parts are only found in invocations of ‘Bl -column’, where a new body introduces a new phrase.

The mdoc(7) syntax tree accommodates broken block structures as well. The ENDBODY node is available to end the formatting associated with a given block before the physical end of that block. It has a non-null end field, is of the BODY type, has the same tok as the BLOCK it is ending, and has a pending field pointing to that BLOCK's BODY node. It is an indirect child of that BODY node and has no children of its own.

An ENDBODY node is generated when a block ends while one of its child blocks is still open, as in the following example:

.Ao ao 
.Bo bo ac 
.Ac bc 
.Bc end

This example results in the following block structure:

BLOCK Ao 
    HEAD Ao 
    BODY Ao 
        TEXT ao 
        BLOCK Bo, pending -> Ao 
            HEAD Bo 
            BODY Bo 
                TEXT bo 
                TEXT ac 
                ENDBODY Ao, pending -> Ao 
                TEXT bc 
TEXT end
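To make the shape of this tree concrete, the C sketch below builds it by hand and prints it in the same indented layout. The struct dn type, its field set, and the functions build_example() and dump() are hypothetical simplifications, not the <mdoc.h> declarations. In particular, the node that the Bo block's pending field refers to is assumed to be the Ao BLOCK, since the text only identifies it as Ao; the ENDBODY node follows the description above (BODY type, end set, tok Ao, pending pointing at the Ao BODY).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node type for illustration only. */
enum dtype { D_BLOCK, D_HEAD, D_BODY, D_TEXT };

struct dn {
	enum dtype	 type;
	const char	*name;		/* macro name (tok) or text string */
	int		 end;		/* non-zero: this BODY is an ENDBODY */
	struct dn	*pending;	/* badly-nested partner, or NULL */
	struct dn	*child, *next;
};

/* Append one line per node to the NUL-terminated string in buf
 * (capacity len), indenting four spaces per tree level. */
static void
dump(const struct dn *n, int depth, char *buf, size_t len)
{
	static const char *const names[] = { "BLOCK", "HEAD", "BODY", "TEXT" };
	size_t used;

	assert(buf != NULL && len > 0);
	for (; n != NULL; n = n->next) {
		used = strlen(buf);
		snprintf(buf + used, len - used, "%*s%s %s%s%s\n",
		    depth * 4, "",
		    n->type == D_BODY && n->end ? "ENDBODY" : names[n->type],
		    n->name,
		    n->pending != NULL ? ", pending -> " : "",
		    n->pending != NULL ? n->pending->name : "");
		dump(n->child, depth + 1, buf, len);
	}
}

/* Build the tree for the .Ao/.Bo/.Ac/.Bc example by hand and return
 * its first top-level node. */
static struct dn *
build_example(void)
{
	static struct dn ao_blk, ao_head, ao_body, t_ao, bo_blk, bo_head,
	    bo_body, t_bo, t_ac, ao_end, t_bc, t_end;

	ao_blk  = (struct dn){ D_BLOCK, "Ao", 0, NULL, &ao_head, &t_end };
	ao_head = (struct dn){ D_HEAD, "Ao", 0, NULL, NULL, &ao_body };
	ao_body = (struct dn){ D_BODY, "Ao", 0, NULL, &t_ao, NULL };
	t_ao    = (struct dn){ D_TEXT, "ao", 0, NULL, NULL, &bo_blk };
	/* Assumed target: the Ao BLOCK. */
	bo_blk  = (struct dn){ D_BLOCK, "Bo", 0, &ao_blk, &bo_head, NULL };
	bo_head = (struct dn){ D_HEAD, "Bo", 0, NULL, NULL, &bo_body };
	bo_body = (struct dn){ D_BODY, "Bo", 0, NULL, &t_bo, NULL };
	t_bo    = (struct dn){ D_TEXT, "bo", 0, NULL, NULL, &t_ac };
	t_ac    = (struct dn){ D_TEXT, "ac", 0, NULL, NULL, &ao_end };
	/* ENDBODY: childless BODY with end set, pending -> Ao BODY. */
	ao_end  = (struct dn){ D_BODY, "Ao", 1, &ao_body, NULL, &t_bc };
	t_bc    = (struct dn){ D_TEXT, "bc", 0, NULL, NULL, NULL };
	t_end   = (struct dn){ D_TEXT, "end", 0, NULL, NULL, NULL };
	return &ao_blk;
}
```

Calling dump(build_example(), 0, buf, sizeof(buf)) on an empty buffer yields the same twelve lines as the block structure shown above, minus trailing whitespace.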

Here, the formatting of the ‘Ao’ block extends from TEXT ao to TEXT ac, while the formatting of the ‘Bo’ block extends from TEXT bo to TEXT bc. It renders as follows in -Tascii mode:

<ao [bo ac> bc] end

Support for badly-nested blocks is only provided for backward compatibility with some older mdoc(7) implementations. Using badly-nested blocks is strongly discouraged; for example, the -Thtml and -Txhtml front-ends to mandoc(1) are unable to render them in any meaningful way. Furthermore, behaviour when encountering badly-nested blocks is not consistent across troff implementations, especially when using multiple levels of badly-nested blocks.

SEE ALSO

mandoc(1), eqn(7), man(7), mdoc(7), roff(7), tbl(7)

AUTHORS

The mandoc library was written by Kristaps Dzonsons <kristaps@bsd.lv>.
March 28, 2011 NetBSD 5.99