Chapter with Epigraph
Representation is the essence of programming.
-- Fred Brooks
Some Text
The XML attribution tag doesn’t map to a simple HTML div tag, so the
existing SMOP language didn’t work. But first we had to update the unit test.
15.17
Unit Test Maintenance
To add the new case to the unit test, we copied the line containing the first
test case, and changed the filenames:
#!perl -w
use strict;
use Bivio::Test;
use Bivio::IO::File;
Bivio::Test->new('Bivio::XML::DocBook')->unit([
    'Bivio::XML::DocBook' => [
        to_html => [
            ['DocBook/01.xml'] => [Bivio::IO::File->read('DocBook/01.html')],
            ['DocBook/02.xml'] => [Bivio::IO::File->read('DocBook/02.html')],
        ],
    ],
]);
Copyright © 2004 Robert Nagler
All rights reserved nagler@extremeperl.org
Woops! We fell into the dreaded copy-and-paste trap. The new line is
identical to the old except for two characters out of 65. That’s too much
redundancy (97% fat and 3% meat). It’s hard to tell the difference between
the two lines, and as we add more tests it will be even harder. This makes it
easy to forget to add a test, or we might copy-and-paste a line and forget to
change it.
We factored out the common code to reduce redundancy:
#!perl -w
use strict;
use Bivio::Test;
use Bivio::IO::File;
Bivio::Test->new('Bivio::XML::DocBook')->unit([
    'Bivio::XML::DocBook' => [
        to_html => [
            map({
                my($html) = $_;
                $html =~ s/xml$/html/;
                [$_] => [Bivio::IO::File->read($html)];
            } sort(
-- ',
        suffix => '
',
    },
    chapter => ['html', 'body'],
    emphasis => ['b'],
    epigraph => [],
    para => ['p'],
    simplesect => [],
    title => ['h1'],
});
attribution maps to a hash that defines the prefix and suffix. For the
other tags, the prefix and suffix are computed from a simple name. We added
_to_html_compile, which is called once at initialization to convert the simple
tag mappings (arrays) into the more general prefix/suffix form (hashes) for
efficiency.
15.19
Second SMOP Interpreter
We extended _to_html_node to handle asymmetric prefixes and suffixes. The
relevant bits of code are:
sub _to_html_compile {
    my($config) = @_;
    while (my($xml, $html) = each(%$config)) {
        $config->{$xml} = {
            prefix => _to_html_tags($html, ''),
            suffix => _to_html_tags([reverse(@$html)], '/'),
        } if ref($html) eq 'ARRAY';
    }
    return $config;
}
sub _to_html_node {
    my($tag, $tree) = @_;
    return HTML::Entities::encode($tree)
        unless $tag;
    die($tag, ': unhandled tag')
        unless $_TO_HTML->{$tag};
    # We ignore the attributes for now.
    shift(@$tree);
    return $_TO_HTML->{$tag}->{prefix}
        . ${_to_html($tree)}
        . $_TO_HTML->{$tag}->{suffix};
}
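The listing calls _to_html_tags, which is not shown. Here is a minimal sketch of what such a helper might look like, assuming it only wraps each name in angle brackets with an optional leading slash. The helper body and the small tag table are assumptions for illustration, not the project's code, and the compile step is restated with a plain loop so the sketch runs on its own:

```perl
#!perl -w
use strict;

# Hypothetical helper: the text calls _to_html_tags but does not show it.
# Assumed behavior: turn a list of tag names into a string of HTML tags,
# placing $prefix ('' to open, '/' to close) before each name.
sub _to_html_tags {
    my($names, $prefix) = @_;
    return join('', map("<$prefix$_>", @$names));
}

# Same idea as the compile step in the text: array mappings become
# prefix/suffix hashes; hash mappings are left untouched.
sub _to_html_compile {
    my($config) = @_;
    foreach my $xml (keys(%$config)) {
        my($html) = $config->{$xml};
        next unless ref($html) eq 'ARRAY';
        $config->{$xml} = {
            prefix => _to_html_tags($html, ''),
            suffix => _to_html_tags([reverse(@$html)], '/'),
        };
    }
    return $config;
}

my($compiled) = _to_html_compile({
    chapter => ['html', 'body'],
    epigraph => [],
});
print($compiled->{chapter}->{prefix}, "\n");    # <html><body>
print($compiled->{chapter}->{suffix}, "\n");    # </body></html>
```

Reversing the name list for the suffix is what keeps the closing tags properly nested. An empty list, as for epigraph, compiles to an empty prefix and suffix.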
_to_html_compile makes _to_html_node simpler and more efficient, because
it no longer calls _to_html_tags with the ordered and reversed HTML tag
name lists. Well, I thought it was more efficient. After performance testing,
the version in Final Implementation turned out to be just as fast.13
13
Thanks to Greg Compestine for asking the questions: What are the alternatives, and
how do you know which is faster?
The unnecessary compilation step adds complexity without improving
performance. We added it at my insistence. I remember saying to Alex,
“We might as well add the compilation step now, since we’ll need it later
anyway.” Yikes! Bad programmer! Write “I’m not going to need it” one
hundred times in your PDA. Even in pairs, it’s hard to avoid the evils of
pre-optimization.
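One way to test a hunch like "the compiled version is faster" before paying for the extra step is to time both variants. The following is a hedged sketch using Perl's core Benchmark module. The two subroutines, the tag table, and the iteration count are illustrative assumptions, not the code that was actually measured:

```perl
#!perl -w
use strict;
use Benchmark qw(cmpthese);

# Illustrative tag table (assumed; a subset of the mappings in the text).
my($simple) = {
    chapter => ['html', 'body'],
    para => ['p'],
    title => ['h1'],
};

# Hypothetical helper: render tag names as HTML tags.
sub tags {
    my($names, $prefix) = @_;
    return join('', map("<$prefix$_>", @$names));
}

# Variant 1: compute the prefix and suffix on every call.
sub wrap_on_the_fly {
    my($tag, $body) = @_;
    my($names) = $simple->{$tag};
    return tags($names, '') . $body . tags([reverse(@$names)], '/');
}

# Variant 2: compute the prefix and suffix once, up front.
my($compiled) = {};
foreach my $tag (keys(%$simple)) {
    $compiled->{$tag} = {
        prefix => tags($simple->{$tag}, ''),
        suffix => tags([reverse(@{$simple->{$tag}})], '/'),
    };
}
sub wrap_compiled {
    my($tag, $body) = @_;
    my($op) = $compiled->{$tag};
    return $op->{prefix} . $body . $op->{suffix};
}

# Both variants must agree before their timings mean anything.
die('variants disagree')
    unless wrap_on_the_fly('chapter', 'x') eq wrap_compiled('chapter', 'x');

# Print a comparison chart; the iteration count is arbitrary.
cmpthese(200_000, {
    on_the_fly => sub {wrap_on_the_fly('chapter', 'x')},
    compiled => sub {wrap_compiled('chapter', 'x')},
});
```

A micro-benchmark like this only compares the tag wrapping in isolation. In the real translator, parsing and file I/O may dominate, which is one plausible reason a compile step would not show up in end-to-end timings.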
15.20
Spike Solutions
As long as I am playing true confessions, I might as well note that I implemented a spike solution to this problem before involving my programming
partners. A spike solution in XP is a prototype that you intend to throw
away. I wrote a spike to see how easy it was to translate DocBook to HTML.
Some of my partners knew about it, but none of them saw it.
The spike solution affected my judgement. It had a compilation step,
too. Programming alone led to the pre-optimization. I was too confident
that it was necessary when pairing with Alex.
Spike solutions are useful, despite my experience in this case. You use
them to shore up confidence in estimates and feasibility of a story. You write
a story card for the spike, which estimates the cost to research possibilities.
Spike solutions reduce risk through exploratory programming.
15.21
Third Task
The third task introduces contextually related XML tags. The DocBook
title tag is interpreted differently depending on its enclosing tag. The test
case input file (03.xml) is:
A quoted paragraph.
Chapter with Section Title
print(values(%{{1..8}}));
Some other tags: literal value, function_name, and command-name.
A quoted paragraph.
Statelessness Is Next to Godliness
A new section.
The chapter title translates to an HTML h1 tag. The section title translates
to an h2 tag. We extended our SMOP language to handle these two
contextually different renderings of title.
15.22
Third SMOP
We discussed a number of ways to declare the contextual relationships in
our SMOP. We could have added a parent attribute to the hashes (on the
right) or nested title within a hash pointed to by the chapter tag. The
syntax we settled on is similar to the one used by XSLT.14 The XML tag
names can be prefixed with a parent tag name, for example,
"chapter/title". The SMOP became:
my($_XML_TO_HTML_PROGRAM) = _compile_program({
    attribution => {
        prefix => '-- ',
        suffix => '
',
    },
    blockquote => ['blockquote'],
    'chapter/title' => ['h1'],
    chapter => ['html', 'body'],
    command => ['tt'],
    emphasis => ['b'],
    epigraph => [],
    function => ['tt'],
    literal => ['tt'],
    para => ['p'],
    programlisting => ['blockquote', 'pre'],
    sect1 => [],
    'sect1/title' => ['h2'],
    simplesect => [],
});
14
XSLT (Extensible Stylesheet Language Transformations) is an XML programming
language for translating XML into XML and other output formats (e.g., PDF and
HTML). For more info, see http://www.w3.org/Style/XSL/
15.23
Third SMOP Interpreter
We refactored the code a bit to encapsulate the contextual lookup in its own
subroutine:
sub to_html {
    my($self, $xml_file) = @_;
    return _to_html(
        '',
        XML::Parser->new(Style => 'Tree')->parsefile($xml_file));
}
sub _eval_child {
    my($tag, $children, $parent_tag) = @_;
    return HTML::Entities::encode($children)
        unless $tag;
    # We ignore the attributes for now.
    shift(@$children);
    return _eval_op(
        _lookup_op($tag, $parent_tag),
        _to_html($tag, $children));
}
sub _eval_op {
    my($op, $html) = @_;
    return $op->{prefix} . $$html . $op->{suffix};
}
sub _lookup_op {
    my($tag, $parent_tag) = @_;
    return $_XML_TO_HTML_PROGRAM->{"$parent_tag/$tag"}
        || $_XML_TO_HTML_PROGRAM->{$tag}
        || die("$parent_tag/$tag: unhandled tag");
}
sub _to_html {
    my($tag, $children) = @_;
    my($res) = '';
    $res .= _eval_child(splice(@$children, 0, 2), $tag)
        while @$children;
    return \$res;
}
# Renamed _compile_program and _compile_tags_to_html not shown for brevity.
The algorithmic change is centralized in _lookup_op, which wants a tag
and its parent to find the correct relation in the SMOP. Precedence is given
to contextually related tags ("$parent_tag/$tag") over simple XML tags
($tag). Note that the root tag in to_html is the empty string (''). We
defined it to avoid complexity in the lower layers. _lookup_op need not be
specially coded to handle the empty parent tag case.
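To see the precedence rule and the empty root tag at work without installing XML::Parser or HTML::Entities, here is a self-contained sketch. It assumes a hand-compiled subset of the program, a hand-built tree in the shape that XML::Parser's Tree style produces, and a small stand-in encoder; all three are illustrative assumptions:

```perl
#!perl -w
use strict;

# Hand-compiled subset of the SMOP in prefix/suffix form (assumed).
my($_XML_TO_HTML_PROGRAM) = {
    chapter => {prefix => '<html><body>', suffix => '</body></html>'},
    'chapter/title' => {prefix => '<h1>', suffix => '</h1>'},
    sect1 => {prefix => '', suffix => ''},
    'sect1/title' => {prefix => '<h2>', suffix => '</h2>'},
    para => {prefix => '<p>', suffix => '</p>'},
};

# Contextual entries ("parent/tag") win over plain entries ("tag").
sub _lookup_op {
    my($tag, $parent_tag) = @_;
    return $_XML_TO_HTML_PROGRAM->{"$parent_tag/$tag"}
        || $_XML_TO_HTML_PROGRAM->{$tag}
        || die("$parent_tag/$tag: unhandled tag");
}

# Stand-in for HTML::Entities::encode, to avoid a non-core dependency.
sub _encode {
    my($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

sub _eval_child {
    my($tag, $children, $parent_tag) = @_;
    # In the tree form, text nodes carry the pseudo-tag 0.
    return _encode($children)
        unless $tag;
    # Skip the attribute hash.
    shift(@$children);
    my($op) = _lookup_op($tag, $parent_tag);
    return $op->{prefix} . ${_to_html($tag, $children)} . $op->{suffix};
}

sub _to_html {
    my($tag, $children) = @_;
    my($res) = '';
    $res .= _eval_child(splice(@$children, 0, 2), $tag)
        while @$children;
    return \$res;
}

# Tree shape: [tag, [attributes, child_tag, child_content, ...]].
my($tree) = ['chapter', [{},
    'title', [{}, 0, 'Top'],
    'sect1', [{},
        'title', [{}, 0, 'Inner'],
        'para', [{}, 0, 'a < b'],
    ],
]];
print(${_to_html('', $tree)}, "\n");
```

This prints `<html><body><h1>Top</h1><h2>Inner</h2><p>a &lt; b</p></body></html>`. The root lookup tries the key "/chapter", finds nothing, and falls back to chapter, which is why the empty parent tag needs no special case. Note that the walk consumes the tree with splice, so a tree can only be rendered once.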
15.24
The Metaphor
This task implementation includes several name changes. Alex didn’t feel
the former names were descriptive enough, and they lacked coherency. To
help think up good names, Alex suggested that our program was similar
to a compiler, because it translates a high-level language (DocBook) to a
low-level language (HTML).
We refactored the names to reflect this new metaphor. $_TO_HTML became
$_XML_TO_HTML_PROGRAM, _to_html_compile became _compile_program, and
so on. An $op is the implementation of an operator, and _lookup_op parallels
a compiler’s symbol table lookup. _eval_child evokes a compiler’s recursive
descent algorithm.
The compiler metaphor helped guide our new name choices. In an
XP project, the metaphor substitutes for an architectural overview document. Continuous design means that the architecture evolves with each
iteration, sometimes dramatically, but a project still needs to be coherent.
The metaphor brings consistency without straitjacketing the implementation. In my opinion, you don’t need a metaphor at the start of a project.
Too little is known about the code or the problem. As the code base grows,
the metaphor may present itself naturally as it did here.
15.25
Fourth Task
The fourth and final task introduces state to generate the HTML for DocBook footnotes. The test case input file (04.xml) is:
I do declare!
Chapter with Footnotes
Needs further clarification. [1]
First item
Second item
Something about XML. [2]