Generated: September 11, 2001, 10:25:23
Copyright © 2000, Kurt Nørmark

Reference Manual of the XML parser and pretty printer for LAML

Kurt Nørmark ©, normark@cs.auc.dk, Department of Computer Science, Aalborg University, Denmark

Source file: tools/xml-html-support/xml-support.scm

This is a simple, non-validating XML parser for LAML, together with XML pretty printing support. As of the current version, the parser is not complete. Nevertheless, it is a useful tool for parsing most everyday XML documents into a Lisp data structure.

Given a well-formed XML document this parser returns a Lisp tree structure that represents the parse tree of the XML document. The parser handles start tags, end tags, and empty tags (in this parser called start-end tags). Entities and their declarations are not handled at all.

The top level functions are parse-xml and parse-xml-file. The XML parser can also be loaded as a library.

Elucidative documentation of this parser is also available. See also the HTML parsing and pretty printing support, which is built on top of the XML tools, and the illustrative examples of the XML parser and pretty printer.

This tool assumes that laml.scm and the general library are loaded. The tool loads the collect-skip and the file-read libraries.

Typographical rebreaking and re-indenting of running text are not yet supported.

The LAML interactive tool procedures xml-pp and xml-parse in laml.scm are convenient top-level pretty printing and parse procedures respectively.

Please note that this is not a production-quality parser and pretty printer. It is currently used for internal purposes.

Table of Contents:
1. The format of the parse tree.
2. Top level parser functions.
3. Utility parser functions.
4. Top level XML pretty printing functions.
5. Variables that control the pretty printing.

Alphabetic index:
(aggregate-final-parse-tree kind): Aggregate the remaining stack entries as subtrees of a node of the given kind.
(collect-attributes-in-tree tree attr-key): Traverse the parse tree, tree, and return the list of all values of the attribute attr-key found in the tree.
indentation-delta: An integer that gives the amount of indentation added at each nesting level.
(is-tag-of-kind? tag-kind): Return a predicate that tests whether a subtree or node is of tag-kind (a symbol or string).
(parse-tree-to-laml tree output-file): Transform an XML or HTML parse tree to a similar LAML expression on output-file.
(parse-tree-to-laml-expression tree): Transform an XML or HTML parse tree to a LAML expression (in terms of a Scheme list structure).
(parse-xml file-path): Parse a file and return the parse tree.
(parse-xml-file in-file-path out-file-path): Top level parse function that takes an XML file name as input and writes the parse tree to out-file-path.
(parser-status): Display the parser status in case of an error during parsing.
prefered-maximum-width: An integer that expresses the preferred maximum column width.
(pretty-print-xml-parse-tree parse-tree): Pretty print an XML parse tree and return the result as a string.
(pretty-print-xml-parse-tree-file in-file-path [out-file-path]): Pretty print the XML parse tree (Lisp file) in in-file-path.
resulting-parse-tree: A global variable holding the most recently produced parse tree.
(traverse-and-collect-from-parse-tree tree node-interesting? result-transformer): Traverse the parse tree, tree, and return a list of result-transformed subtrees that satisfy the node-interesting? predicate.
use-single-lining: A boolean that controls the application of single-line pretty printing.

 

1.   THE FORMAT OF THE PARSE TREE.
A parse tree T produced by this tool is of the form

    (tree N ST1 ST2 ... STn) 
where STi, i=1..n are parse trees (recursively) and N is a node (see below).

A leaf node N may be of the form

    (tree N) 
or just N if N is a string (corresponding to textual contents) or an empty tag (a tag without contents).

An inner node of a parse tree corresponds to a tag (an element) with contents. Such a node is represented by the following 'tag structure':

    (tag kind tag-name . attr-info) 
tag is a symbol (for tagging). kind is either start or start-end (both symbols). tag-name is a string. attr-info is the attribute list in property list format.
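To make the tag structure concrete, here is a minimal sketch of accessors for it. The names tag-node?, tag-node-kind, tag-node-name, tag-node-attributes and plist-lookup are hypothetical helpers invented for this illustration; they are not claimed to be part of the LAML API. The sketch assumes, as in the start-end example in this section, that attribute names are symbols and attribute values are strings.

```scheme
;; Hypothetical accessors for the tag structure (tag kind tag-name . attr-info).
;; Illustrative only; not part of the documented parser API.

(define (tag-node? x)
  (and (pair? x) (eq? (car x) 'tag)))

(define (tag-node-kind node) (cadr node))        ; the symbol start or start-end
(define (tag-node-name node) (caddr node))       ; a string
(define (tag-node-attributes node) (cdddr node)) ; property list (name value ...)

;; Look up key in a property list (k1 v1 k2 v2 ...); return #f if absent.
(define (plist-lookup plist key)
  (cond ((or (null? plist) (null? (cdr plist))) #f)
        ((eq? (car plist) key) (cadr plist))
        (else (plist-lookup (cddr plist) key))))
```

Applied to (tag start-end "title" role "xxx" size "5"), tag-node-name yields "title", and looking up size in its attribute list yields "5".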

A terminal node may be a start-end node, a comment node or just a contents string. End tags are not represented in the parse tree.

Here is an example of a start-end node (empty node) with two properties:

    (tag start-end "title" role "xxx" size "5") 
Comments are represented as comment nodes of the form
    (comment comment-string) 

Declaration nodes of the form

    (declaration kind value) 
are also possible. They are, for instance, used for document type information in HTML. Finally, nodes of the form
    (xml-declaration attribute-property-list) 
are supported.
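The following tree shows how these node forms combine for a fragment such as <p>Hello <b>world</b><br/></p>. It is written by hand from the format described above, not captured from the parser, so whitespace handling and other details of real output may differ. node-type, tree-node, tree-subtrees and count-text-leaves are hypothetical helpers for this illustration.

```scheme
;; Hand-written parse tree following the documented format (not real parser output).
(define example-tree
  '(tree (tag start "p")
         "Hello "                           ; text leaf given as a bare string
         (tree (tag start "b") "world")     ; inner node with contents
         (tag start-end "br")))             ; empty tag as a bare leaf node

;; A subtree is either (tree N ST1 ... STn) or a bare leaf node N.
(define (tree-node t)
  (if (and (pair? t) (eq? (car t) 'tree)) (cadr t) t))

(define (tree-subtrees t)
  (if (and (pair? t) (eq? (car t) 'tree)) (cddr t) '()))

;; Classify a node according to the node forms listed in this section.
(define (node-type n)
  (cond ((string? n) 'text)
        ((and (pair? n) (eq? (car n) 'tag)) (cadr n))  ; start or start-end
        ((and (pair? n) (eq? (car n) 'comment)) 'comment)
        ((and (pair? n) (eq? (car n) 'declaration)) 'declaration)
        ((and (pair? n) (eq? (car n) 'xml-declaration)) 'xml-declaration)
        (else 'unknown)))

;; Count the textual leaves of a tree, recursively.
(define (count-text-leaves t)
  (+ (if (string? (tree-node t)) 1 0)
     (apply + (map count-text-leaves (tree-subtrees t)))))
```

Classifying the direct subtrees of example-tree gives (text start start-end), and count-text-leaves returns 2.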


 

2.   TOP LEVEL PARSER FUNCTIONS.



parse-xml-file



Form
(parse-xml-file in-file-path out-file-path)

Description
Top level parse function that takes an XML file name, in-file-path, as input and writes the resulting parse tree to the file out-file-path. in-file-path is a file path (relative or absolute) with or without an extension. The default extension is xml.
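The default-extension rule can be illustrated with a small helper. ensure-xml-extension is hypothetical; it assumes "/" as the directory separator and treats any dot in the last path component as an extension. The actual parser may make this decision differently.

```scheme
;; Hypothetical helper: append ".xml" when the last path component has no extension.
(define (ensure-xml-extension path)
  (let loop ((i (- (string-length path) 1)))
    (cond ((< i 0) (string-append path ".xml"))                       ; no dot at all
          ((char=? (string-ref path i) #\/) (string-append path ".xml")) ; dots only in directory names
          ((char=? (string-ref path i) #\.) path)                      ; already has an extension
          (else (loop (- i 1))))))
```

With this rule, "doc" and "../a.b/doc" gain the .xml extension, while "doc.xml" is left unchanged.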


parse-xml



Form
(parse-xml file-path)

Description
This function parses a file and returns the parse tree. file-path is a file path (relative or absolute) without any extension.


resulting-parse-tree



Form
resulting-parse-tree

Description
A global variable holding the most recently produced parse tree.


aggregate-final-parse-tree



Form
(aggregate-final-parse-tree kind)

Description
Aggregate the remaining stack entries as subtrees of a node of the given kind. kind is a symbol, such as html-tree or xml-tree.


 

3.   UTILITY PARSER FUNCTIONS.
The functions in this section are miscellaneous utility functions of the parser.


traverse-and-collect-from-parse-tree



Form
(traverse-and-collect-from-parse-tree tree node-interesting? result-transformer)

Description
Traverse the parse tree, tree, and return a list of result-transformed subtrees that satisfy the node-interesting? predicate. In other words, node-interesting? is applied to all subtrees of the tree during the traversal, and result-transformer is applied to each subtree that satisfies it. Both node-interesting? and result-transformer are therefore applied to trees and subtrees.

Example
 (traverse-and-collect-from-parse-tree resulting-parse-tree (is-tag-of-kind? 'a) parse-tree-to-laml-expression) 
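The traversal can be sketched as a pre-order walk over the parse tree format from section 1. collect-from-tree and anchor-tree? are hypothetical names; this is not the LAML source. The sketch assumes that the walk continues into subtrees that satisfy the predicate (the real function may stop there), and that the predicate must cope with bare leaves such as strings, since every subtree is tested.

```scheme
;; A subtree is either (tree N ST1 ... STn) or a bare leaf node N.
(define (tree-subtrees t)
  (if (and (pair? t) (eq? (car t) 'tree)) (cddr t) '()))

;; Hypothetical sketch: test every (sub)tree in pre-order and collect
;; (transform t) for each t that satisfies interesting?.
(define (collect-from-tree tree interesting? transform)
  (append (if (interesting? tree) (list (transform tree)) '())
          (apply append
                 (map (lambda (st) (collect-from-tree st interesting? transform))
                      (tree-subtrees tree)))))

;; Example predicate: an inner node whose tag name is "a".
(define (anchor-tree? t)
  (and (pair? t) (eq? (car t) 'tree)
       (pair? (cadr t)) (eq? (car (cadr t)) 'tag)
       (equal? (caddr (cadr t)) "a")))
```

For a tree containing (tree (tag start "a" href "x.html") "X") and (tree (tag start "a" href "y.html") "Y"), collecting with anchor-tree? and caddr yields ("X" "Y").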


collect-attributes-in-tree



Form
(collect-attributes-in-tree tree attr-key)

Description
Traverse the parse tree, tree, and return the list of all values of the attribute attr-key found in the tree.

Example
(collect-attributes-in-tree tree 'href) 
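A minimal sketch of the same idea over the documented tree format follows. collect-attribute-values and plist-values are hypothetical names, and the sketch assumes attribute names are symbols compared with eq?, as the 'href example suggests. Attributes of both start nodes and start-end nodes are collected, in document order.

```scheme
;; A subtree is either (tree N ST1 ... STn) or a bare leaf node N.
(define (tree-node t) (if (and (pair? t) (eq? (car t) 'tree)) (cadr t) t))
(define (tree-subtrees t) (if (and (pair? t) (eq? (car t) 'tree)) (cddr t) '()))

;; All values stored under key in a property list (k1 v1 k2 v2 ...).
(define (plist-values plist key)
  (cond ((or (null? plist) (null? (cdr plist))) '())
        ((eq? (car plist) key) (cons (cadr plist) (plist-values (cddr plist) key)))
        (else (plist-values (cddr plist) key))))

;; Hypothetical sketch: collect the values of attribute key from every tag node.
(define (collect-attribute-values tree key)
  (let* ((n (tree-node tree))
         (here (if (and (pair? n) (eq? (car n) 'tag))
                   (plist-values (cdddr n) key)   ; attr-info of (tag kind name . attr-info)
                   '())))
    (append here
            (apply append
                   (map (lambda (st) (collect-attribute-values st key))
                        (tree-subtrees tree))))))
```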


is-tag-of-kind?



Form
(is-tag-of-kind? tag-kind)

Description
Return a predicate that tests whether a subtree or node is of tag-kind (a symbol or string). This function is useful as the second parameter of traverse-and-collect-from-parse-tree.
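One way to build such a predicate is sketched below. make-tag-kind-predicate is a hypothetical name, not the library's implementation. It converts a symbol argument to a string, because tag-name is a string in the parse tree, and accepts both inner trees and bare tag nodes. It compares names with string=?, which is case sensitive as XML requires; the real function may treat case differently, for example for HTML.

```scheme
;; Hypothetical sketch of a tag-kind predicate constructor.
(define (make-tag-kind-predicate tag-kind)
  (let ((wanted (if (symbol? tag-kind) (symbol->string tag-kind) tag-kind)))
    (lambda (x)
      ;; Look at the node of an inner tree, or at x itself if it is a bare leaf.
      (let ((n (if (and (pair? x) (eq? (car x) 'tree)) (cadr x) x)))
        (and (pair? n)
             (eq? (car n) 'tag)
             (string=? (caddr n) wanted))))))
```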


parser-status



Form
(parser-status)

Description
Display the parser status in case of an error during parsing.


parse-tree-to-laml



Form
(parse-tree-to-laml tree output-file)

Description
Transform an XML or HTML parse tree to a similar LAML expression, written on output-file. When the resulting LAML file, say f.laml, is processed, it writes f.html in the same directory as f.laml.


parse-tree-to-laml-expression



Form
(parse-tree-to-laml-expression tree)

Description
Transform an XML or HTML parse tree to a LAML expression (in terms of a Scheme list structure). This function is similar to parse-tree-to-laml, which delivers a textual result (a string) on an output file.


 

4.   TOP LEVEL XML PRETTY PRINTING FUNCTIONS.



pretty-print-xml-parse-tree-file



Form
(pretty-print-xml-parse-tree-file in-file-path [out-file-path])

Description
Pretty prints the XML parse tree (Lisp file) in in-file-path. Outputs the pretty printed result in out-file-path.


pretty-print-xml-parse-tree



Form
(pretty-print-xml-parse-tree parse-tree)

Description
Pretty prints an XML parse tree and returns the result as a string.


 

5.   VARIABLES THAT CONTROL THE PRETTY PRINTING.
These variables apply to both HTML and XML.


indentation-delta



Form
indentation-delta

Description
An integer that gives the amount of indentation added at each nesting level.


use-single-lining



Form
use-single-lining

Description
A boolean that controls the application of single-line pretty printing. If true, the pretty printer prints short list forms on a single line.


prefered-maximum-width



Form
prefered-maximum-width

Description
An integer that expresses the preferred maximum column width.
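The sketch below illustrates how the three variables could interact when a parse tree is rendered as indented XML text. It is a hypothetical illustration, not the algorithm of pretty-print-xml-parse-tree: pp-delta, pp-single-lining? and pp-max-width stand in for indentation-delta, use-single-lining and prefered-maximum-width, only text, start and start-end nodes are handled, and text is never rebroken, in line with the missing text rebreaking noted in the introduction.

```scheme
;; Stand-ins for indentation-delta, use-single-lining and prefered-maximum-width
;; (hypothetical names, chosen to avoid shadowing the LAML variables).
(define pp-delta 2)
(define pp-single-lining? #t)
(define pp-max-width 40)

;; A subtree is either (tree N ST1 ... STn) or a bare leaf node N.
(define (tree-node t) (if (and (pair? t) (eq? (car t) 'tree)) (cadr t) t))
(define (tree-subtrees t) (if (and (pair? t) (eq? (car t) 'tree)) (cddr t) '()))

;; Render a property list (name1 "v1" ...) as XML attributes.
(define (attrs->string plist)
  (if (or (null? plist) (null? (cdr plist)))
      ""
      (string-append " " (symbol->string (car plist)) "=\"" (cadr plist) "\""
                     (attrs->string (cddr plist)))))

;; Render a subtree on a single line.
(define (flat t)
  (let ((n (tree-node t)))
    (cond ((string? n) n)
          ((eq? (cadr n) 'start-end)
           (string-append "<" (caddr n) (attrs->string (cdddr n)) "/>"))
          (else
           (string-append "<" (caddr n) (attrs->string (cdddr n)) ">"
                          (apply string-append (map flat (tree-subtrees t)))
                          "</" (caddr n) ">")))))

;; Return the output lines for subtree t, indented by indent columns.
(define (pp-lines t indent)
  (let* ((n (tree-node t))
         (pad (make-string indent #\space))
         (one-line (string-append pad (flat t))))
    (if (or (string? n)
            (eq? (cadr n) 'start-end)
            (and pp-single-lining? (<= (string-length one-line) pp-max-width)))
        (list one-line)                                   ; fits: keep on one line
        (append                                           ; too wide: break and indent
         (list (string-append pad "<" (caddr n) (attrs->string (cdddr n)) ">"))
         (apply append (map (lambda (st) (pp-lines st (+ indent pp-delta)))
                            (tree-subtrees t)))
         (list (string-append pad "</" (caddr n) ">"))))))

;; Join the lines with newlines into a single result string.
(define (pp-tree->string t)
  (let ((lines (pp-lines t 0)))
    (apply string-append (car lines)
           (map (lambda (l) (string-append (string #\newline) l)) (cdr lines)))))
```

With these settings, a ul element with three short li children does not fit in 40 columns on one line, so the ul tags are placed on their own lines and each li is printed on a single line indented by two spaces.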


This documentation has been extracted automatically from the Scheme source file by means of the Schemedoc tool.