Infrastructure: Markup
Parke Godfrey
21 September 2012
24 September 2012
CSE-2041
These slides are based in part on ones from the following sources.
How can content be presented so that it is rendered in a well-designed layout and nicely typeset?
The “Web” could just deliver PDF documents, or some other pre-rendered format (e.g., DOCX or Flash), over HTTP.
Would need a mechanism for hyperlinks.
Could use your favourite word processor to make “content”.
However, this would not be very flexible.
flexible rendering
Content (document, page) rendered for the client (and by the client).
Rendering may differ, depending on the client. Why?
easy editing
Straightforward to edit pages.
Not a proprietary format.
mash up
Can dynamically and automatically combine content from different sources.
interactive
Support interactive pages, such as for webapps.
Our content format should abstract away from how it is to be rendered.
Our content format specifies (ideally) the semantics of its parts.
The renderer finds a sensible way to present the content in a meaningful layout that reflects the content's semantics.
Markup languages provide this abstraction.
The source document is a mix of markup elements and content.
The source is usually a plain text file (e.g., ASCII or Unicode).
There is usually a standard render engine, or several, that display the document in pretty form.
The render engine works either like a compiler, producing an object document, or like an interpreter, rendering the presentation on the fly.
E.g., XML, HTML (and its flavours), LaTeX (and TeX), and roff (troff and nroff).
XML is a markup language that is entirely semantics-based: its tags say what the content is, not how it should look.
An XML document is a tree.
Nodes in the tree are labelled (tags).
The tags can be anything.
An example document: bibliography.xml.
A pretty-printed version of the same document, with line feeds and indentation added to the source: bibliography-pretty.xml.
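To make the "document is a tree" point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The document below is a hypothetical stand-in for bibliography.xml (its actual content is not shown here), so every tag name in it is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bibliography.xml; the tag names are invented.
SOURCE = """<bibliography>
  <book year="1999">
    <title>First Title</title>
    <author>A. Author</author>
  </book>
  <article year="2005">
    <title>Second Title</title>
    <author>B. Author</author>
    <author>C. Author</author>
  </article>
</bibliography>"""


def outline(node, depth=0):
    """Return one indented line per element: its tag plus any direct text."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (": " + text if text else "")]
    for child in node:  # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SOURCE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indentation in the printed outline mirrors the nesting of the tags: each element is a node, and the elements it encloses are its children.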
XML as a transport format
Provides a universal format for exchanging data between machines.
Avoids problems of data format incompatibilities between machine architectures. E.g., big and little endian.
XML as a data model
XML is a data model, as is the relational data model.
The XML data model is more flexible in many ways than the relational model.
XML as a document format
XML provides a simple but elegant way to structure documents via markup.
XML has become a common standard.
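The byte-order problem mentioned under "XML as a transport format" can be sketched as follows. The value and the <value> tag are arbitrary choices for illustration, not part of any real exchange format.

```python
import struct

value = 1025  # an arbitrary integer to exchange between two machines

# Binary layout depends on byte order: the same 32-bit integer, two encodings.
little = struct.pack("<I", value)  # little-endian
big = struct.pack(">I", value)     # big-endian

# Reading big-endian bytes as if they were little-endian gives a wrong number.
misread = struct.unpack("<I", big)[0]

# A textual (markup) encoding is the same byte sequence on every architecture.
as_text = "<value>{}</value>".format(value).encode("utf-8")

print(little.hex(), big.hex(), misread, as_text)
```

The two binary encodings differ and can be misread by the other side, whereas the text form needs only an agreed character encoding.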
We can think of an XML document as a database, just like the relational databases we have studied.
XML database systems now exist.
Any collection of XML documents (e.g., web pages) can be treated as a database.
There are well-developed query languages for XML (e.g., XPath and XQuery).
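As a rough illustration of querying XML like a database, the sketch below uses the limited XPath subset built into Python's xml.etree.ElementTree. The document and its tag and attribute names are hypothetical, and a full XML query language offers far more than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; element and attribute names are invented.
DOC = ET.fromstring("""<bibliography>
  <book year="1999"><title>First Title</title></book>
  <book year="2005"><title>Second Title</title></book>
  <article year="2005"><title>Third Title</title></article>
</bibliography>""")

# Roughly "SELECT title FROM book WHERE year = 2005".
book_titles = [t.text for t in DOC.findall("./book[@year='2005']/title")]

# Titles of every entry, whatever its tag: '*' matches any child element.
all_titles = [t.text for t in DOC.findall("./*/title")]

print(book_titles, all_titles)
```

The second query has no direct relational counterpart: it ranges over entries of different "types" in one path expression, which hints at the extra flexibility of the XML data model.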
model
Manages the data (the content).
Defines the interactions possible with the data.
view
Defines how to render the content into a presentation.
Defines how to interact with the content.
There can be multiple views for the same model.
controller
Manages the interactions via the view against the model.
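The three roles can be sketched in a few lines. This is only an illustration of the division of labour, under the assumption of a toy list-of-strings model; the class and function names are invented and do not come from any framework.

```python
from html import escape


class Model:
    """Holds the content and defines the interactions allowed on it."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


def text_view(model):
    """One presentation of the model: a plain-text list."""
    return "\n".join("* " + item for item in model.items)


def html_view(model):
    """A second presentation of the same model: an HTML list."""
    rows = "".join("<li>{}</li>".format(escape(item)) for item in model.items)
    return "<ul>{}</ul>".format(rows)


class Controller:
    """Turns a user action arriving through a view into a model update."""

    def __init__(self, model):
        self.model = model

    def handle(self, action, payload):
        if action == "add":
            self.model.add(payload)


model = Model()
controller = Controller(model)
controller.handle("add", "markup")
controller.handle("add", "styling")
print(text_view(model))
print(html_view(model))
```

Both views render the same model, and neither needs to change when the other does.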
The Web follows the MVC paradigm.
model: HTML (and XHTML) for markup
view: CSS for styling
controller: JavaScript for behaviour
We study each of these in Section III: Client-side.
We study the basics of markup and HTML here.
Simple markup for text and content. Structures the elements.
Embedded multimedia (e.g., images).
Hypertext links to other Web “objects”.
Fill-in form elements and buttons for interactivity.
“Derived” from XML. Instead of free tags, there is a defined list of tags.
Originally, derived from Standard Generalized Markup Language (SGML).
Why? This provided existing tools such as parsers.
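XML was developed after the first versions of HTML and, like HTML originally, derives from SGML.
Later HTML standards changed this: XHTML redefines HTML as an XML application, and HTML5 adds an optional XML serialization alongside its own HTML syntax. (HTML4 itself was still defined in terms of SGML.)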
Why? Tremendous support exists for XML. These tools apply directly to HTML too.
spaces, new lines, and tabs
normalization
non‐breaking spaces
<pre>, <code>, and <br/>
character references
entity references
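The whitespace and reference rules listed above can be approximated in a short sketch. It is a simplification of what a browser does (a browser resolves references while parsing and collapses whitespace while rendering), and the sample string is invented.

```python
import html
import re

# HTML's collapsible whitespace: space, tab, line feed, form feed, carriage return.
# U+00A0 (the non-breaking space) is deliberately not in this set.
COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def normalize(text):
    """Collapse each run of collapsible whitespace into a single space."""
    return COLLAPSIBLE.sub(" ", text).strip(" \t\n\f\r")


source = "Tom  &amp;\n\t Jerry:&nbsp;&nbsp;caf&#233; &lt;open&gt;"

decoded = html.unescape(source)  # resolve entity and character references
rendered = normalize(decoded)

print(repr(rendered))
```

Runs of spaces, tabs, and new lines shrink to one space, while the two non-breaking spaces survive. &amp;, &lt;, &gt;, and &nbsp; are entity references (by name); &#233; is a character reference (by code point).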
Tags serve to semantically structure the document. Key examples from HTML5:
<header>
<nav>
<article>
<section>
<aside>
<footer>
block versus inline elements
<h1> to <h6>
<p>
<strong>
<em>
<hr/>
<q> & <blockquote>
<address>
<time>
<figure> & <figcaption>
<img>
<audio>
<video>
Screen units: px, %, em, & pt
<a> with an HTTP URL
<a> with an absolute path
<a> with a relative path
<a> with a fragment target
<a> also anchors the other side of a link!
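How a client resolves each kind of href against the address of the current page can be sketched with Python's urllib.parse. The base address and link targets below are hypothetical.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical address of the page that contains the links.
BASE = "http://www.example.org/courses/2041/notes/markup.html"

hrefs = [
    "http://www.example.com/index.html",  # full HTTP URL
    "/index.html",                        # absolute path on the same host
    "images/tree.png",                    # path relative to the page's directory
    "../index.html",                      # relative path, one directory up
    "#lists",                             # fragment target within this page
]

resolved = [urljoin(BASE, href) for href in hrefs]
for href, url in zip(hrefs, resolved):
    print(href.ljust(36), "->", url)

# The fragment names an anchor inside the document, not a separate resource.
page, fragment = urldefrag(resolved[-1])
```

A full URL replaces the base entirely, an absolute path keeps only the scheme and host, a relative path is resolved against the page's directory, and a fragment keeps the whole page address and adds the anchor name.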
<ul>
<li>
<ol>
<li>
<dl>
<dt>
<dd>
<table>
<th>
<tr>
<td>
<html>
Should the format be loosely or strictly enforced?
loose
+
Easier to author pages.
Renderer makes best effort. (Graceful degradation.)
–
Automated tools have a harder time understanding and manipulating content.
Renderer can mess up badly. (Document is harder to parse.)
E.g., HTML4, HTML5
strict
–
Harder to author pages.
Harder to maintain valid documents.
Renderer may refuse non-well-formed or invalid documents.
+
Automated tools can understand and manipulate content.
Renderer knows how to handle the page.
E.g., XHTML
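The difference between the two policies can be seen by feeding the same sloppy fragment to a strict XML parser and to a lenient HTML parser. This sketch uses Python's standard library parsers as stand-ins; html.parser is a simple tokenizer, not a browser's HTML5 parser, and the fragment is invented.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical hand-written HTML: <br> is never closed and <p> has no end tag.
SLOPPY = "<p>one<br>two"


class TagLogger(HTMLParser):
    """Record every start tag the lenient parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def strict_ok(source):
    """True if the source is well-formed XML, which a strict parser requires."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


loose = TagLogger()
loose.feed(SLOPPY)
loose.close()

print(strict_ok(SLOPPY))                # the strict parser's verdict on the fragment
print(loose.tags)                       # what the lenient parser made of it
print(strict_ok("<p>one<br/>two</p>"))  # the XHTML-style spelling of the same content
```

The strict parser rejects the fragment outright, while the lenient one carries on and reports the tags it found. Once every element is closed, the strict parser accepts the content too.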