Whitespace and XML.

Post by KBleivik »

Whitespace can be a nightmare in XML, and more precisely in XSL(T) and related technologies. Knowing how to handle whitespace can save you hours of work and let you sleep better at night. A space, a tab, a line feed and a carriage return all introduce whitespace, which may be interpreted differently by different parsers and browsers.

Note: No whitespace is allowed before the XML declaration:

<?xml version="1.0" ?>
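
A quick way to see this rule in action is to hand a parser the same document with and without a leading space. The sketch below uses Python's standard-library xml.etree.ElementTree, which is my choice for illustration and not something the original post uses; the helper name parses_ok is made up for the example.

```python
import xml.etree.ElementTree as ET


def parses_ok(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


good = '<?xml version="1.0"?><letter/>'
bad = ' <?xml version="1.0"?><letter/>'  # one space before the declaration

print("declaration first:", parses_ok(good))
print("space before declaration:", parses_ok(bad))
```

The same should hold for a blank line or a tab before the declaration, so it is worth checking how your editor or template engine starts the file.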


<elemenTag>Some text here</elemenTag>

is different from:

<elemenTag>
Some text here
</elemenTag>

which in turn is different from:

<elemenTag>

Some text here

</elemenTag>
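
To make the difference visible, the following sketch parses the three variants and prints the text content exactly as the parser reports it. It assumes Python's standard-library xml.etree.ElementTree; other parsers may expose the text differently, but the characters they receive are the same.

```python
import xml.etree.ElementTree as ET

# The three variants from above, written out with explicit line breaks.
variants = [
    "<elemenTag>Some text here</elemenTag>",
    "<elemenTag>\nSome text here\n</elemenTag>",
    "<elemenTag>\n\nSome text here\n\n</elemenTag>",
]

texts = [ET.fromstring(variant).text for variant in variants]
for text in texts:
    print(repr(text))  # repr() makes the line breaks visible

# The three values only become equal once the application trims them itself.
normalized = {text.strip() for text in texts}
print(normalized)
```

The parser does not trim anything on its own, so any normalization has to be an explicit decision in the application or the style sheet.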

Some useful hints:

Example 1: Use the xml:space attribute to tell applications how whitespace in an element's content should be treated. Valid values are preserve and default.
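
As a rough illustration, the sketch below reads xml:space from a small document. The element names poem and note are hypothetical, and Python's standard-library ElementTree is assumed. Note that the parser hands over the whitespace unchanged in both cases; the attribute is only a signal that the receiving application may act on.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = (
    '<poem xml:space="preserve">\n'
    "  Roses are red,\n"
    "    violets are blue.\n"
    '<note xml:space="default">  a note  </note></poem>'
)

root = ET.fromstring(doc)
note = root.find("note")

space_poem = root.get(f"{{{XML_NS}}}space")
space_note = note.get(f"{{{XML_NS}}}space")

print("poem:", space_poem, repr(root.text))
print("note:", space_note, repr(note.text))
```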

Example 2: Write your own XML Schema to handle white space.

Code:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="myElement">
        <xsd:simpleType>
            <xsd:restriction base="xsd:string">
                <xsd:whiteSpace value="collapse" />
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>
</xsd:schema>

Valid values are preserve, replace and collapse.
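
The three values can be read as three levels of normalization. The Python sketch below mimics what a schema processor is expected to do with each value; it is not a schema validator, and the function name apply_whitespace_facet is invented for this example.

```python
import re


def apply_whitespace_facet(value: str, mode: str) -> str:
    """Mimic the XML Schema whiteSpace facet on a single string value."""
    if mode == "preserve":
        return value  # nothing is changed
    if mode not in ("replace", "collapse"):
        raise ValueError(f"unknown whiteSpace value: {mode}")
    # replace: every tab, line feed and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    # collapse: additionally squeeze runs of spaces and trim both ends
    return re.sub(r" +", " ", replaced).strip(" ")


sample = "  Some\ttext\n  here  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_whitespace_facet(sample, mode)))
```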

Example 3: Line break in XSLT.

Code:

<xsl:text>
</xsl:text>
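
An alternative spelling that is often easier to read is the character reference &#10; inside xsl:text, because the line break then no longer depends on how the style sheet happens to be indented. The Python sketch below is not an XSLT processor; it only checks, with the standard-library ElementTree, that an XML parser reports the same single line feed for both spellings.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# A literal line break inside xsl:text ...
literal = f'<xsl:text xmlns:xsl="{XSL}">\n</xsl:text>'
# ... and the same line break written as a character reference.
reference = f'<xsl:text xmlns:xsl="{XSL}">&#10;</xsl:text>'

literal_text = ET.fromstring(literal).text
reference_text = ET.fromstring(reference).text

print(repr(literal_text), repr(reference_text))
print("same content:", literal_text == reference_text)
```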

The examples below are taken from the SitePoint book No Nonsense XML Web Development With PHP by Thomas Myer (April 2006). They show how different style sheets can be used to transform the same XML document. View them in several browsers, both older and current versions.

Note the following quotation from page 48 of that book:

"Well by default, the XSLT standard mandates that whenever there is only whitespace (including line breaks) between two tags, the whitespace should be ignored. But when there is text between two tags, (e.g. TO:), then the whitespace in and around that text should be passed along to the output".
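
To see which pieces of text the quoted rule is talking about, the sketch below splits a small template-like fragment into its text pieces and classifies each one. The wrapper element name template and the helper is_whitespace_only are hypothetical, and the code only inspects the parsed fragment with Python's ElementTree; it does not run a transformation.

```python
import xml.etree.ElementTree as ET

# Indentation between tags, and real text ("TO: ") inside a tag.
fragment = "<template>\n    <b>TO: </b>\n</template>"
root = ET.fromstring(fragment)
b = root.find("b")


def is_whitespace_only(text) -> bool:
    """True for text made up solely of spaces, tabs and line breaks."""
    return text is not None and text.strip(" \t\r\n") == ""


pieces = {
    "before <b>": root.text,  # line break plus indentation
    "inside <b>": b.text,     # "TO: " including the trailing space
    "after <b>": b.tail,      # line break before the closing tag
}
for label, text in pieces.items():
    kind = "whitespace-only" if is_whitespace_only(text) else "contains text"
    print(f"{label}: {text!r} -> {kind}")
```

Only the piece inside the b element contains text, which is why the space after "TO:" matters while the indentation around the tags does not.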


1. http://www.kjellbleivik.com/WPW/letter.xml

Code:

<?xml version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

2. http://www.kjellbleivik.com/WPW/letter-css.xml

Code:

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="letter.css" version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

Style sheet code, letter.css

Code:

letter {
    display: block;
    margin: 10px;
    padding: 5px;
    width: 300px;
    height: 100px;
    border: 1px solid #000000;
    overflow: auto;
    background-color: #cccccc;
    font: 12px Arial;
}
to, from {
    display: block;
    font-weight: bold;
}
message {
    display: block;
    font: 11px Arial;
}

3. http://www.kjellbleivik.com/WPW/letter-text.xml

Code:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="letter2text.xsl" version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

Style sheet code, letter2text.xsl

Code:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/letter">
        <xsl:apply-templates select="*"/>
    </xsl:template>
    <xsl:template match="to">
        <xsl:text>TO: </xsl:text>
        <xsl:apply-templates/>
        <xsl:text>
</xsl:text>
    </xsl:template>
    <xsl:template match="from">
        <xsl:text>FROM: </xsl:text>
        <xsl:apply-templates/>
        <xsl:text>
</xsl:text>
    </xsl:template>
    <xsl:template match="message">
        <xsl:text>MESSAGE: </xsl:text>
        <xsl:apply-templates/>
        <xsl:text>
</xsl:text>
    </xsl:template>
</xsl:stylesheet>
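
For readers without an XSLT processor at hand, the following sketch imitates in plain Python what the text style sheet above is meant to produce for letter.xml. It is an emulation under my own assumptions, not a run of the style sheet: the LABELS table and the function letter_to_text are invented names, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

LETTER = """<?xml version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>"""

# One label per template in the style sheet.
LABELS = {"to": "TO: ", "from": "FROM: ", "message": "MESSAGE: "}


def letter_to_text(xml_source: str) -> str:
    """Imitate the text output: label, element text, then a line break."""
    root = ET.fromstring(xml_source)
    lines = []
    # Iterating over root visits element children only, similar to
    # select="*", so the indentation between the elements is skipped.
    for child in root:
        label = LABELS.get(child.tag, "")
        lines.append(f"{label}{child.text or ''}\n")
    return "".join(lines)


print(letter_to_text(LETTER), end="")
```

The trailing line break after each entry corresponds to the xsl:text elements that contain nothing but a newline.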

4. http://www.kjellbleivik.com/WPW/letter-html.xml

Code:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="letter2html.xsl" version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

Style sheet code, letter2html.xsl

Code:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/letter">
        <html>
            <head><title>Letter</title></head>
            <body><xsl:apply-templates/></body>
        </html>
    </xsl:template>
    <xsl:template match="to">
        <b>TO: </b><xsl:apply-templates/><br/>
    </xsl:template>
    <xsl:template match="from">
        <b>FROM: </b><xsl:apply-templates/><br/>
    </xsl:template>
    <xsl:template match="message">
        <b>MESSAGE: </b><xsl:apply-templates/><br/>
    </xsl:template>
</xsl:stylesheet>

5. http://www.kjellbleivik.com/WPW/letter-xml.xml

Code:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="letter2xml.xsl" version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

Style sheet code, letter2xml.xsl

Code:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/letter">
        <letter><xsl:apply-templates/></letter>
    </xsl:template>
    <xsl:template match="to">
        <recipient><xsl:apply-templates/></recipient>
    </xsl:template>
    <xsl:template match="from">
        <sender><xsl:apply-templates/></sender>
    </xsl:template>
    <xsl:template match="message">
        <body><xsl:apply-templates/></body>
    </xsl:template>
</xsl:stylesheet>

6. http://www.kjellbleivik.com/WPW/letter-xhtml.xml

Code:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="letter2xhtml.xsl" version="1.0"?>
<letter>
    <to>WPW members</to>
    <from>kgun</from>
    <message>The importance of well-formed and valid tagging.</message>
</letter>

Style sheet code, letter2xhtml.xsl

Code:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"
        media-type="application/xhtml+xml" encoding="iso-8859-1"
        doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
    <xsl:template match="/letter">
        <html>
            <head><title>Letter</title></head>
            <body><xsl:apply-templates/></body>
        </html>
    </xsl:template>
    <xsl:template match="to">
        <b>TO: </b><xsl:apply-templates/><br/>
    </xsl:template>
    <xsl:template match="from">
        <b>FROM: </b><xsl:apply-templates/><br/>
    </xsl:template>
    <xsl:template match="message">
        <b>MESSAGE: </b><xsl:apply-templates/><br/>
    </xsl:template>
</xsl:stylesheet>

Pay careful attention to the last two examples. Besides demonstrating the value of one source serving many applications (formats), they show how the message is presented in different browsers and how different search engine (SE) bots may interpret the markup.

Exercise:

1. Explain the differences between different web browsers (and versions). View the source.

2. Explain how today's SE bots see the markup.

3. Explain how tomorrow's SE bots may see the code and how important well-formed (and valid) code may be.

4. Try the code on your own test / web server: "illegally" nest tags, leave tags open, change one letter in some tags (e.g. from lowercase to uppercase), and so on.
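
As a starting point for the last exercise, the sketch below feeds a well-formed document and three deliberately broken ones to a strict XML parser. It assumes Python's standard-library ElementTree; the sample documents and the helper check are made up for illustration, and browsers may react differently from this parser.

```python
import xml.etree.ElementTree as ET

samples = {
    "well-formed": "<letter><to>WPW members</to></letter>",
    "illegally nested": "<letter><to><b>WPW members</to></b></letter>",
    "tag left open": "<letter><to>WPW members</letter>",
    "case mismatch": "<letter><to>WPW members</To></letter>",
}


def check(document: str) -> str:
    """Return 'ok' or the parser's complaint for one document."""
    try:
        ET.fromstring(document)
        return "ok"
    except ET.ParseError as err:
        return f"rejected ({err})"


results = {name: check(doc) for name, doc in samples.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

A strict parser stops at the first error, so each broken variant is best tested in its own file.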

Kjell Gunnar Bleivik
Make it simple, as simple as possible but no simpler: | DigitalPunkt.no |
