XML mirrors in Scheme: XML in LAML

Kurt Nørmark ©    normark@cs.auc.dk    Department of Computer Science, Aalborg University    

Abstract.

In this chapter you will learn how to make a Scheme mirror of an XML language that is formally defined by a Document Type Definition (DTD).

The LAML DTD parser and the XML-in-LAML mirror generation software are available in the LAML distribution from version 19.00.

 

1  Introduction

The starting point of this part of the tutorial is a standard XML Document Type Definition, also known as a DTD. We will see how to parse it using the LAML DTD parser. After that we will generate a fully validating Scheme mirror of the DTD, and we will see how to use the XML language in Scheme and LAML.

1.1  Overview
 

1.1  Overview

The example we will study is written in a very simple XML language for describing a collection of bikes.

We define the grammar (DTD) of the language in section 2.1, parse it in section 2.2, and make a Scheme mirror of it in section 3.1. The major effort is to define a validation predicate for one of the elements of the language; see section 3.2.

In section 4.1 we illustrate how to make a simple transformation of a bikes document to XHTML.

Finally, in section 5.1 we study a couple of bikes documents and their transformations to XHTML.

All the example source files are available in the tutorial/xml-in-laml directory of the LAML examples.

 

 

2  The DTD

We start by defining and parsing the DTD.

2.1  The DTD
2.2  Parsing the DTD
 

2.1  The DTD

The bikes document type definition (DTD) is the first thing to construct. It is deliberately very simple; we have written it for demonstration purposes in this tutorial. The DTD is here:

<!ENTITY % Number "CDATA">
    <!-- one or more digits -->

<!ENTITY % Boolean "(true | false)">
    <!--  spaces -->



<!ELEMENT bikes (bike)*>
<!ATTLIST bikes
>

<!ELEMENT bike (frame, wheel+, brake*, lock*)>
<!ATTLIST bike
  kind   (mountain-bike | racer-bike | tourist-bike | other)  "tourist-bike"
>

<!ELEMENT frame EMPTY>

<!ATTLIST frame
  frame-number CDATA #REQUIRED
>

<!ELEMENT wheel EMPTY>

<!ATTLIST wheel
  size        %Number; #REQUIRED
  tube-kind   CDATA    #IMPLIED
>

<!ELEMENT brake EMPTY>

<!ATTLIST brake
  kind    CDATA   #IMPLIED
  brand   CDATA   #IMPLIED
>

<!ELEMENT lock EMPTY>

<!ATTLIST lock
  brand   CDATA   #IMPLIED
  insurance-approved  %Boolean; #REQUIRED
>


Let us briefly explain what the DTD means. The first two clauses are parameter entity declarations. Basically, you can think of them as textual macros named 'Number' and 'Boolean', which are referred to as %Number; and %Boolean;. DTDs offer very few predefined data types, so it is common to introduce some ad hoc types in the way we have done in bikes.dtd. In addition, entities should be used whenever you need the same fragment of text more than once.
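To make the macro view concrete, here is a minimal Scheme sketch that expands %Number; and %Boolean; by plain textual substitution. The procedures string-replace-all and expand-parameter-entities and the entity table are hypothetical illustrations, not part of the LAML DTD parser. The sketch also ignores the XML rule that pads the replacement text of a parameter entity with one space on each side when it is referenced inside a declaration.

```scheme
;; Hypothetical sketch: parameter entities as textual macros.
;; Not part of the LAML DTD parser.

;; Replace every occurrence of the non-empty string FROM in STR by TO.
(define (string-replace-all str from to)
  (let ((n (string-length str))
        (m (string-length from)))
    (let loop ((i 0) (start 0) (pieces '()))
      (cond ((> (+ i m) n)
             ;; No further match is possible: keep the remaining tail.
             (apply string-append
                    (reverse (cons (substring str start n) pieces))))
            ((string=? (substring str i (+ i m)) from)
             ;; Match at position i: keep the text before it, then TO.
             (loop (+ i m) (+ i m)
                   (cons to (cons (substring str start i) pieces))))
            (else (loop (+ i 1) start pieces))))))

;; Expand every %name; reference, given an alist of (name . replacement).
(define (expand-parameter-entities text entities)
  (if (null? entities)
      text
      (expand-parameter-entities
       (string-replace-all text
                           (string-append "%" (caar entities) ";")
                           (cdar entities))
       (cdr entities))))

;; The two entities of bikes.dtd.
(define bikes-entities
  '(("Number" . "CDATA")
    ("Boolean" . "(true | false)")))

(expand-parameter-entities "size %Number; #REQUIRED" bikes-entities)
;; => "size CDATA #REQUIRED"
```

Real DTD processing does more than this (entity nesting, the padding rule, entities in entity values), but the sketch shows why an entity can serve as an ad hoc type name.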

Next we encounter the elements and their attributes. We recommend always giving an explicit attribute list declaration for each element, even if it is empty, as for bikes. In fact, this is necessary for the mirror generator to work properly.

The element declarations define a conventional context-free grammar. A bikes element consists of zero or more bike elements. In turn, a single bike element is an aggregation of a frame, one or more wheels, zero or more brakes, and zero or more locks. The frame, wheel, brake, and lock elements are terminal concepts; in DTD parlance they are EMPTY elements. All of them have attributes, however.

 

 

2.2  Parsing the DTD

It must be stressed that the LAML DTD parser is an ad hoc parser, which does not recognize all aspects of a DTD. However, it is good enough to handle all the XHTML DTDs (strict, transitional, and frameset), the SVG DTD, and the DTDs we have made within the LAML project. Early versions of the parser (which are not part of the LAML distribution) have also been used to parse the HTML 4.01 DTD, which is a non-XML DTD.

In order to use the information in the DTD it is necessary to parse it. That is, the structure of the DTD must be revealed and represented as a hierarchical data structure.

The central parsing procedure is parse-dtd. This procedure can be called from a Scheme prompt if the appropriate software is loaded, but we find it easier to write a small script that first loads the software and then calls parse-dtd. The parsing script is parsing-script. After loading laml.scm and the DTD parser from the tools/dtd-parser directory of the LAML distribution, the script calls parse-dtd with the name of the DTD file as input.

As a result, the DTD parser writes a file bikes.lsp (parsed-bikes-dtd-raw), which contains the parsed DTD data structure. No pretty printing is performed, so at first glance the result may be difficult to grasp. To make it a little easier to understand, we provide a pretty printed version of bikes.lsp in parsed-bikes-dtd, made by applying the LAML Scheme pretty printer scheme-pp to the file. One of the main things to notice is the content models in the element clauses; the content model is the fifth element of each element form. If the content model is a string (other than "EMPTY"), it is 'rather complex' and has not been parsed. If the content model is a list, the string "EMPTY", or the symbol pcdata-checker, the mirror synthesis software is able to make a validation predicate for the element automatically.

When you run the parser, it reports the number of elements whose content models it was able to parse. It writes something like:

 ...
 Elements in total: 6
 Non-empty elements: 2
 Elements with non-parsed element content models: 1
 The elements with non-parsed element content models are: bike
 ...

together with a lot of 'progress information'. These lines provide important information for generating the mirror; see section 3.
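The classification of content models, and the counts in the report, can be sketched in Scheme as follows. Only the position of the content model (the fifth element of an element form) is taken from the description of bikes.lsp; the other fields of the element forms, the list used as a stand-in for a parsed content model, and all procedure names are hypothetical. The sketch also assumes that an element counts as non-empty when its content model is not "EMPTY"; under that assumption it reproduces the three numbers reported for bikes.dtd.

```scheme
;; Hypothetical sketch of classifying parsed element forms.
;; Only the position of the content model (fifth element) follows the
;; text; the other fields are placeholders, not the bikes.lsp format.

(define (make-element-form name model)
  (list 'element name 'placeholder 'placeholder model))

(define (element-name form) (list-ref form 1))
(define (content-model form) (list-ref form 4))

;; Can a validation predicate be made automatically?
(define (parsed-content-model? form)
  (let ((cm (content-model form)))
    (or (list? cm)                                  ; parsed content model
        (and (string? cm) (string=? cm "EMPTY"))
        (eq? cm 'pcdata-checker))))

(define (empty-element? form)
  (let ((cm (content-model form)))
    (and (string? cm) (string=? cm "EMPTY"))))

(define (count-if pred lst)
  (let loop ((lst lst) (n 0))
    (cond ((null? lst) n)
          ((pred (car lst)) (loop (cdr lst) (+ n 1)))
          (else (loop (cdr lst) n)))))

;; Returns (total non-empty non-parsed-count non-parsed-names).
(define (parse-summary forms)
  (let ((non-parsed
         (let loop ((forms forms) (names '()))
           (cond ((null? forms) (reverse names))
                 ((parsed-content-model? (car forms))
                  (loop (cdr forms) names))
                 (else (loop (cdr forms)
                             (cons (element-name (car forms)) names)))))))
    (list (length forms)
          (count-if (lambda (form) (not (empty-element? form))) forms)
          (length non-parsed)
          non-parsed)))

(define bikes-dtd-forms
  (list (make-element-form "bikes" '(stand-in-for-a-parsed-model))
        (make-element-form "bike" "(frame, wheel+, brake*, lock*)")
        (make-element-form "frame" "EMPTY")
        (make-element-form "wheel" "EMPTY")
        (make-element-form "brake" "EMPTY")
        (make-element-form "lock" "EMPTY")))

(parse-summary bikes-dtd-forms)
;; => (6 2 1 ("bike"))
```

The only element that falls through every automatic case is bike, whose content model is left as a string; this is exactly the element for which a validation predicate has to be written by hand.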

You should read section 1 of dtd-parser.scm for details about the format of the parsed DTD file.

 

 

3  Mirror synthesis

In this section we will describe how to make the mirror of the bikes language in Scheme.

3.1  Making the mirror
3.2  The bike validation predicate
3.3  The resulting mirror
 

3.1  Making the mirror

Given the parsed DTD from above, it is in principle easy to synthesize a mirror of the XML language in Scheme. Stated in simple terms, we go for a one-to-one mapping between elements of the DTD and functions in Scheme: for each element in the XML language there will be a mirror function in Scheme.
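The one-to-one mapping can be illustrated with a small, self-contained Scheme sketch in which a single higher-order procedure creates a mirror function for each element. The AST representation (element-name attributes contents) and the name make-mirror-function are hypothetical; the real generated mirror functions use LAML's own AST format and also validate their input. Passing an attribute as a quoted symbol followed by a string value is the calling style assumed for the sketch.

```scheme
;; Hypothetical sketch: one mirror function per element.
;; The AST format (element-name attributes contents) is an assumption,
;; not LAML's representation, and no validation is done here.

(define (make-mirror-function element-name)
  (lambda args
    (let loop ((args args) (attributes '()) (contents '()))
      (cond ((null? args)
             (list element-name (reverse attributes) (reverse contents)))
            ;; A symbol followed by a string is taken as an attribute.
            ((and (symbol? (car args))
                  (pair? (cdr args))
                  (string? (cadr args)))
             (loop (cddr args)
                   (cons (cons (car args) (cadr args)) attributes)
                   contents))
            ;; Anything else (sub-ASTs, text) is contents.
            (else (loop (cdr args) attributes (cons (car args) contents)))))))

;; One mirror function for each element of the bikes DTD.
(define bikes (make-mirror-function 'bikes))
(define bike  (make-mirror-function 'bike))
(define frame (make-mirror-function 'frame))
(define wheel (make-mirror-function 'wheel))
(define brake (make-mirror-function 'brake))
(define lock  (make-mirror-function 'lock))

(bike 'kind "racer-bike"
      (frame 'frame-number "IQ7W36-56")
      (wheel 'size "28"))
;; => (bike ((kind . "racer-bike"))
;;          ((frame ((frame-number . "IQ7W36-56")) ())
;;           (wheel ((size . "28")) ())))
```

Because every mirror function is built the same way, a generator only needs the list of element names to produce the definitions; the element-specific part is the validation, which this sketch leaves out.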

As we did for parsing, we also make a simple script which activates the mirror synthesizer. This is mirroring-script.

We explain it briefly. After loading laml.scm and the xml-in-laml tool software in xml-in-laml.scm, we set the tool parameters in the section of the script called tool-parameters. All these parameters are described in section 1 of xml-in-laml.scm. We give the name of the mirror (mirror-name), the full path of the parsed DTD (parsed-dtd-path), and the full path of the mirror target directory (mirror-target-dir). In addition, we give the path of a file with manually written validation predicates (manually-programmed-validation-predicates) and a list of so-called action elements (action-elements); each of these is explained below.

The main mirror generation procedure is generate-mirror from xml-in-laml.scm; its formal parameters are documented there. See also the activation of generate-mirror, with its actual parameters, in the section of the script called tool-activation.

The action element bikes is the root element of a bikes document; see section 5.1 for an example of such a document. Because bikes is announced as an action element, a procedure named bikes! is called with the purpose of initiating some kind of transformation of the bikes AST; see section 4.
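The effect of declaring an action element can be sketched as follows: after constructing its AST, the mirror function of an action element calls the corresponding action procedure with that AST. The registry action-procedures, the procedure register-action!, and the simplified (element-name contents) AST are hypothetical; they only illustrate the idea, not how the generated mirror does it.

```scheme
;; Hypothetical sketch of action elements.
;; Simplified AST: (element-name contents). Not LAML's representation.

(define action-procedures '())   ; alist: element-name -> action procedure

(define (register-action! element-name procedure)
  (set! action-procedures
        (cons (cons element-name procedure) action-procedures)))

(define (make-mirror-function element-name)
  (lambda contents
    (let ((ast (list element-name contents))
          (action (assq element-name action-procedures)))
      (if action
          ((cdr action) ast))   ; an action element hands its AST on
      ast)))

;; A stand-in for bikes!: it only records the ASTs it receives.
(define received-asts '())
(define (bikes! ast)
  (set! received-asts (cons ast received-asts)))

(register-action! 'bikes bikes!)

(define bikes (make-mirror-function 'bikes))
(define bike  (make-mirror-function 'bike))

(bikes (bike) (bike))
;; received-asts is now ((bikes ((bike ()) (bike ()))))
```

Only the root element triggers the action here, so the action procedure sees the whole document exactly once, after all nested mirror functions have returned.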

From the parsing process described in section 2.2 we know that we have to write a validation predicate for the bike element. The tool parameter manually-programmed-validation-predicates says that this predicate is found in the file 'bikes-content-validation-predicates.scm'. In section 3.2 we take a closer look at the predicate of the bike element, which is the only one the mirror generator cannot make automatically.

Notice also the default language properties, described in section 2 of xml-in-laml.scm. If the default values do not fit your needs, you can change them before generate-mirror is called; please read the introduction to section 2 of xml-in-laml.scm. The values of the variables are strings, which are inserted into the synthesized mirror library. This aspect of the mirror synthesis is a little primitive, and we may be able to improve it in a future version.

 

 

3.2  The bike validation predicate

The function bike-bike-management-checker? checks if a bike clause is valid relative to its immediate constituents.

The implementation of bike-bike-management-checker? is tedious, which clearly motivates fully automating the validation process in LAML. For now, however, a few validation predicates must be written manually. The names of the predicates must follow the naming scheme elementName-languageName-checker?, as in bike-bike-management-checker?.

You do not need to study the details of bike-bike-management-checker?, but we will briefly explain it, because many other predicates follow a similar pattern.

First recall that the content model of the bike element is

  (frame, wheel+, brake*, lock*)

The first two cases in the conditional of bike-bike-management-checker? check whether the contents of the bike instance are empty. If so, we call the procedure xml-add-problem! from the common xml-in-laml.scm library.

The third, fourth, and fifth cases catch situations where there are problems with the prefix, i.e., with the frame and the first wheel.

The last case in the cond is the most interesting. It checks that the rest of the bike clause, after the frame and the first wheel, satisfies

  (wheel*, brake*, lock*)

We have written the procedure check-star-sequence! for that purpose. It makes use of the helper procedure check-star-sequence-1!, which does the real work.
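The checks described above can be sketched as a pure function over the list of element names in a bike clause. The names star-sequence? and bike-content-problem are hypothetical, and this is not the code of bike-bike-management-checker?: the sketch returns a problem description (or #f) instead of calling xml-add-problem!, and it ignores attributes and white space.

```scheme
;; Hypothetical sketch of validating (frame, wheel+, brake*, lock*)
;; over the element names of a bike clause.

;; Does NAMES match e1* e2* ... en*, where ORDERED is (e1 e2 ... en)?
;; The greedy strategy works because the names in ORDERED are distinct.
(define (star-sequence? names ordered)
  (cond ((null? names) #t)
        ((null? ordered) #f)
        ((eq? (car names) (car ordered))
         (star-sequence? (cdr names) ordered))        ; stay on this element
        (else (star-sequence? names (cdr ordered)))))  ; move to the next one

;; Returns a problem description, or #f if NAMES is valid.
(define (bike-content-problem names)
  (cond ((null? names)
         "a bike element must not be empty")
        ((not (eq? (car names) 'frame))
         "the first element of a bike element must be a frame element")
        ((or (null? (cdr names)) (not (eq? (cadr names) 'wheel)))
         "the frame must be followed by at least one wheel")
        ((not (star-sequence? (cddr names) '(wheel brake lock)))
         "the bike element does not have wheel* brake* lock* as a suffix")
        (else #f)))

(bike-content-problem '(frame wheel wheel brake lock))   ; => #f
(bike-content-problem '(lock frame wheel))
;; => "the first element of a bike element must be a frame element"
```

Splitting wheel+ into one mandatory wheel followed by wheel* is what turns the remaining check into a plain sequence of starred elements.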

 

 

3.3  The resulting mirror

It is now time to take a look at the generated mirror. Like many other automatically generated programs, it is not really intended to be read. More importantly, you should never edit it!

Take a look at the mirror in bikes-mirror. There are three sections:

  1. The validation predicates starting with bikes-bike-management-laml-validate!.
  2. The mirror functions starting with bikes
  3. An inlining of the manually programmed validation predicates, see bike-bike-management-checker?

For convenience we usually make a LAML style for easy loading of the software that belongs to a given XML-in-LAML language. These styles are located in the styles/xml-in-laml/ directory of the LAML distribution. The bikes.scm LAML style can be seen in bikes-style. Notice that the action procedure of the bikes element, bikes!, is defined here. It must be defined before the generated mirror functions are loaded. In this file we also program the appropriate transformations of a bikes AST, which is the theme of section 4.

At this point it is possible to play with bikes documents in Scheme/LAML syntax. If you are curious, you can jump to section 5 right away.

 

 

4  Transformation of the mirror AST

In this section we will study a simple transformation of a bikes AST.

4.1  Transformation of the mirror AST
 

4.1  Transformation of the mirror AST

At this point we are able to construct a bikes AST. In order to make it useful, we must somehow transform the AST, typically to an HTML page. Here we will see a very simple example of a transformation, which illustrates some useful LAML functions for this purpose.

The bikes-1 document gives rise to an abstract syntax tree. Normally, we do not look at the internal list representation of such a tree, but it may be instructive to see what it looks like; see bikes-1-ast. The boolean #t constants are white space markers, which are of no relevance for the bikes language. (You can get rid of the white space markers once and for all via the definition of default-xml-represent-white-space.)
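As an illustration of what removing the white space markers amounts to, here is a recursive Scheme sketch over a hypothetical AST representation (element-name attributes contents), in which #t in a contents list marks white space. The representation and the procedure name remove-white-space-markers are assumptions; with LAML itself you would use default-xml-represent-white-space as described above.

```scheme
;; Hypothetical AST: (element-name attributes contents), where #t in a
;; contents list marks white space. Not LAML's actual representation.

(define (remove-white-space-markers ast)
  (if (pair? ast)
      (list (car ast)
            (cadr ast)
            (let loop ((contents (caddr ast)) (result '()))
              (cond ((null? contents) (reverse result))
                    ((eq? (car contents) #t)          ; drop the marker
                     (loop (cdr contents) result))
                    (else
                     (loop (cdr contents)
                           (cons (remove-white-space-markers (car contents))
                                 result))))))
      ast))   ; text and other leaves are kept unchanged

(remove-white-space-markers
 '(bikes ()
         (#t
          (bike ((kind . "racer-bike"))
                (#t (frame ((frame-number . "IQ7W36-56")) ()) #t))
          #t)))
;; => (bikes () ((bike ((kind . "racer-bike"))
;;                     ((frame ((frame-number . "IQ7W36-56")) ())))))
```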

The XML-in-LAML library xml-in-laml.scm, which is shared by all the XML-in-LAML languages, defines functions for constructing and accessing abstract syntax trees. They are located in section 4 of xml-in-laml.scm.

We program the transformation functions in the bikes style file bikes-style, which is located in the styles/xml-in-laml/tutorials-and-demos/ directory of the LAML distribution. The transformation is initiated in bikes!, the action procedure of the bikes element. This procedure writes an HTML file via the write-html procedure from laml.scm. In bikes! we traverse the bikes AST in order to locate the bike subclauses: the traverse-and-collect-all-from-ast function applies the bike-table function to each bike clause.
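The idea behind traverse-and-collect-all-from-ast can be sketched as a preorder traversal that applies a transformation to every node satisfying a predicate and collects the results in document order. This is a hypothetical re-implementation, named collect-from-ast, over the simplified (element-name attributes contents) representation; the real function's parameters and AST format may differ.

```scheme
;; Hypothetical sketch in the spirit of traverse-and-collect-all-from-ast,
;; over the simplified AST (element-name attributes contents).

(define (collect-from-ast ast node-pred transform)
  ;; Preorder walk; results are accumulated in reverse order.
  (define (walk node acc)
    (if (pair? node)
        (let loop ((contents (caddr node))
                   (acc (if (node-pred node) (cons (transform node) acc) acc)))
          (if (null? contents)
              acc
              (loop (cdr contents) (walk (car contents) acc))))
        acc))                           ; text leaves contribute nothing
  (reverse (walk ast '())))

(define (element-named name)
  (lambda (node) (eq? (car node) name)))

(define doc
  '(bikes ()
          ((bike ((kind . "racer-bike")) ())
           (bike ((kind . "tourist-bike")) ()))))

(collect-from-ast doc
                  (element-named 'bike)
                  (lambda (node) (cdr (assq 'kind (cadr node)))))
;; => ("racer-bike" "tourist-bike")
```

Separating the node predicate from the transformation lets the same traversal serve different reports; in the bikes style, the transformation role is played by bike-table.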

The function bike-table accesses some of the constituents of a bike clause, and in turn some of their attributes. It returns a list of tr elements, which bikes! can conveniently pass to a modified XHTML table mirror function.
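A string-based stand-in for bike-table might look like the sketch below, which renders a single bike clause as one table row. The helpers attribute-value, children-named, and bike-row and the AST format are hypothetical. The real bike-table builds tr elements with the XHTML mirror rather than strings, and the sketch does no HTML escaping. It falls back to the DTD's declared default, tourist-bike, when the kind attribute is absent.

```scheme
;; Hypothetical sketch: render one bike clause as an HTML row string.
;; AST format (element-name attributes contents) is assumed; no escaping.

(define (attribute-value node name default)
  (let ((entry (assq name (cadr node))))
    (if entry (cdr entry) default)))

(define (children-named node name)
  (let loop ((contents (caddr node)) (result '()))
    (cond ((null? contents) (reverse result))
          ((and (pair? (car contents)) (eq? (caar contents) name))
           (loop (cdr contents) (cons (car contents) result)))
          (else (loop (cdr contents) result)))))

;; Columns: kind (DTD default "tourist-bike"), frame number, number of wheels.
(define (bike-row bike)
  (let ((frame (car (children-named bike 'frame))))  ; a valid bike has one frame
    (string-append
     "<tr><td>" (attribute-value bike 'kind "tourist-bike") "</td>"
     "<td>" (attribute-value frame 'frame-number "") "</td>"
     "<td>" (number->string (length (children-named bike 'wheel)))
     "</td></tr>")))

(bike-row '(bike ()
                 ((frame ((frame-number . "IQ7W36-56")) ())
                  (wheel ((size . "28")) ())
                  (wheel ((size . "28")) ()))))
;; => "<tr><td>tourist-bike</td><td>IQ7W36-56</td><td>2</td></tr>"
```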

Notice that bikes-style loads the XHTML transitional mirror, so the transformation produces an XHTML document. If there are XHTML validation problems, you will get warnings.

 

 

5  Mirror usage

In this section we will look at an example usage of the generated mirror functions.

5.1  Using the mirror
 

5.1  Using the mirror

Let us look at two bikes documents, bikes-1 and bikes-2. Both of them first load laml.scm and then the bikes style, which we discussed in sections 3.3 and 4.

In bikes-1 we see a bikes clause with two bike clauses. The document is valid relative to the DTD, and processing it generates an HTML document.

In bikes-2 we also see a bikes clause with two bike clauses. This document is invalid, however. When we process it we get several validation errors, something like:

LAML Emacs Processing
Welcome to MzScheme version 202, Copyright (c) 1995-2002 PLT
Warning: XML Warning:  The XML attribute  tube-kind-attribute  is not valid in the wheel element.
Warning: XML Warning:  XML validation error(s) encountered in an instance of the  bike  element
  The bike element instance does not have  wheel* brake* lock*  as a suffix:   <frame frame-number = "IQ7W36-56"/> <lock brand = "ba...
Warning: XML Warning:  XML validation error(s) encountered in an instance of the  bike  element
  The first element of a  bike  element must be a frame element:   <lock brand = "basta" insurance-approved = "true"/> <frame frame-...

Process LAML finished

Therefore we cannot (and should not) transform it to an HTML document.