Generated: September 15, 2004, 10:52:45 | Copyright © 2004, Kurt Nørmark |
The top-level node is called an html-tree, which may hold top-level comment nodes and declaration nodes (doctype nodes). The parser represents HTML comments within the document as special comment nodes.
The parser will be very confused if it meets a less-than or greater-than character that is not part of a tag. Such characters must be HTML protected (use the special character entities in HTML, such as &lt; and &gt;).
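As an illustration of the HTML protection described above, here is a minimal sketch in plain Scheme. The procedure name html-protect is hypothetical and is not part of LAML; it assumes raw text that does not already contain character entities, and it also protects the ampersand, which the text above does not mention.

```scheme
;; Hypothetical helper, not part of LAML: replace characters that would
;; confuse the parser with the corresponding HTML character entities.
;; Assumes str is raw text with no entities in it already.
(define (html-protect str)
  (let loop ((chars (string->list str)) (acc '()))
    (if (null? chars)
        (apply string-append (reverse acc))
        (let ((ch (car chars)))
          (loop (cdr chars)
                (cons (cond ((char=? ch #\<) "&lt;")
                            ((char=? ch #\>) "&gt;")
                            ((char=? ch #\&) "&amp;") ; added assumption
                            (else (string ch)))
                      acc))))))
```

Applied to running text before it is placed between tags, for example (html-protect "a < b"), this yields text in which the parser should no longer see a stray tag delimiter.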
This tool assumes that laml.scm and the general library are loaded. The tool loads the xml-support library (which is the starting point of this HTML support tool), and the collect-skip and file-read libraries.
See the XML support for information about the format of parse trees and variables that control the pretty printing. See also the illustrative examples of the HTML parsing and pretty printing tools.
The typographical rebreaking and re-indenting of running text is still missing.
The LAML interactive tool procedures html-pp and html-parse in laml.scm are convenient top-level pretty printing and parsing procedures, respectively.
Known problem: The handling of spaces after the start tag and before the end tag is not correct.
Please notice that this is not a production quality parser and pretty printer! It is currently used for internal purposes.
parse-html | (parse-html file-path) | This function parses a file and returns the parse tree. |
parse-html-file | (parse-html-file in-file-path out-file-path) | Parse the file in in-file-path, and deliver the parse tree in out-file-path. |
parse-html-string | (parse-html-string str) | Parse the string str, which is supposed to contain an HTML document. |
pretty-print-html-parse-tree | (pretty-print-html-parse-tree parse-tree) | Pretty prints an HTML parse tree, and returns the result as a string. |
pretty-print-html-parse-tree-file | (pretty-print-html-parse-tree-file in-file-path [out-file-path]) | Pretty prints the HTML parse tree (lisp file) in in-file-path. |
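The procedures listed above can be combined roughly as follows. This is only a sketch: it assumes that laml.scm, the general library, and this HTML support tool are already loaded, and the sample document string is made up for illustration. It relies only on the signatures given in the table.

```scheme
;; Assumes laml.scm, the general library, and the HTML support tool
;; have been loaded beforehand, as described earlier on this page.

;; A small, made-up HTML document held in a string.
(define doc-string
  "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>")

;; Parse the string into a parse tree (an html-tree at the top level).
(define tree (parse-html-string doc-string))

;; Pretty print the parse tree; the result is returned as a string.
(display (pretty-print-html-parse-tree tree))
(newline)
```

The file-based variants, parse-html-file and pretty-print-html-parse-tree-file, should follow the same pattern with file paths in place of the string and the in-memory tree.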