XML Minimalism.

Post by KBleivik »

The XML family of technologies is very rich, and there are parsers for it in many languages, PHP among them. If you are used to HTML tagging, it is fairly easy to start tagging in XML. Other members of the family, such as XML Schema and XSLT, are themselves written in XML and rely on namespaces, so it is very important to understand namespaces. If you are used to PHP, it is tempting to reach for the PHP XML parsers and the DOM to get the job done. But remember that you can do a lot with XSLT alone, and that you can define your grammar with XML Schema. So before you use a scripting solution, check whether the job can be done in, for example, XSLT. That has some advantages:
  1. Robustness. A solution that stays within the XML family should be the most robust.
  2. Extensibility. There are extensions like EXSLT.
  3. Simplicity. Make it simple, as simple as possible, but no simpler.
  4. Last but not least, minimalism.
Some useful hints:

XML and the importance of proper encoding and document declaration
When working with documents that contain language-specific data, or when working with internationalization and XML, you must deal with encoding properly.

For example, if you are in France and your documents contain French accented characters, libxml2 may report an error like this when the encoding is not declared properly:

"Input is not proper UTF-8, ..."

If the document really is saved as ISO-8859-1, the following declaration at the start of the document may fix the error:

<?xml version="1.0" encoding="ISO-8859-1"?>

There is another aspect to keep in mind when using libxml2: the internal storage of an XML document. Regardless of the encoding declared for a document, libxml2 stores its content internally as UTF-8. Knowing this may save you hours of tedious work trying to fix an error.

Fortunately, PHP has two extensions, iconv and mbstring, that you should use when performing encoding conversion.
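
The effect of the declaration can be reproduced outside PHP as well. The following is only a minimal sketch in Python, using the standard library parser (expat) rather than libxml2, so the exact error message differs; the element name mot and the sample word are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "caf\u00e9" (cafe with an acute accent) saved as ISO-8859-1: the accented
# letter becomes the single byte 0xE9, which is not valid UTF-8 on its own.
body = "<mot>caf\u00e9</mot>".encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8 and rejects the byte.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# With a matching declaration the same bytes parse cleanly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
text = ET.fromstring(declared).text

# Alternatively, convert the bytes to UTF-8 up front
# (the role iconv or mbstring would play in PHP).
converted = body.decode("iso-8859-1").encode("utf-8")
text_converted = ET.fromstring(converted).text

print(undeclared_ok, text == "caf\u00e9", text_converted == "caf\u00e9")
```

Either approach works; the important point is that the declared encoding and the actual bytes must agree.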

The XSLT document() function.

<xsl:for-each select="document('products.xml')/catalog/product">
First of all, note that XPath is the foundation of XSLT, so all XPath functionality and functions are available within XSLT. In addition, XSLT has its own functions and extensions. Since you use the XPath function position(), you should be able to use the XPath operators <= and >= plus the boolean operator "and" directly in your test, as long as the result is a valid XPath expression. Inside an XSLT attribute such as test or select, write < as &lt; so that the stylesheet stays well-formed XML. If the condition is a predicate, it must be enclosed in [].

Generally you use the structure

axis::node-test[predicate]

when filtering node sets. So you need to know the following concepts to perform tests:
  1. Axes
  2. Node tests (name test / node type test)
  3. Predicates that filter the node set.
Example of filtering from the root node (the document node in XPath 2.0):

/*/NodeName1/NodeName2[position() >= 1 and position() <= 3]

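Example of more advanced filtering that uses an XPath function. Note that local-name() tests the name of the element itself, regardless of namespace; to test for an attribute you would write a predicate like [@MyAttribute] instead: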

/*/*[local-name()="MyAttribute"]/*/*[position() >= 1 and position() <= 3]
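
To see what the position() range selects, here is a small sketch in Python. It is an emulation, not real XPath: the standard library ElementTree module supports only a subset of XPath (a single index such as [2], but no position() comparisons), so the range is applied with a slice. The sample document and its text values are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the element names from the example above.
doc = ET.fromstring(
    "<root>"
    "<NodeName1>"
    "<NodeName2>a</NodeName2><NodeName2>b</NodeName2>"
    "<NodeName2>c</NodeName2><NodeName2>d</NodeName2>"
    "</NodeName1>"
    "<NodeName1><NodeName2>e</NodeName2></NodeName1>"
    "</root>"
)

# Emulates /*/NodeName1/NodeName2[position() >= 1 and position() <= 3].
# position() is counted per parent, so each NodeName1 is sliced separately.
first_three = [
    child.text
    for parent in doc.findall("NodeName1")
    for child in parent.findall("NodeName2")[:3]
]

# ElementTree does understand a single 1-based index predicate.
second = [e.text for e in doc.findall("NodeName1/NodeName2[2]")]

print(first_three)
print(second)
```

The fourth NodeName2 of the first parent is dropped, while the only NodeName2 of the second parent is kept, because the position is relative to each parent and not to the whole document.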

By combining XPath, XPointer and XInclude, it should also be possible to filter fragments of external documents that can be embedded into your own documents without using any external script or programming language.

Example: You have an external file external.xml with the following structure:

<myelement1 xmlns:xi="http://www.w3.org/2001/XInclude">
<myelement2 xml:id="myID">...</myelement2>
</myelement1>

You can then access myelement2 with the myID id in external.xml like this:

<myelement1 xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="external.xml" parse="xml" xpointer="xpointer(id('myID'))">
<xi:fallback>Broken link or external server down ...</xi:fallback>
</xi:include>
</myelement1>
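
An XInclude-aware processor (for example xmllint with its --xinclude option) resolves such an include for you. Purely to illustrate what happens behind the scenes, here is a rough Python sketch that looks up the target by xml:id and falls back to the fallback text when the file is missing. The resolve() helper, the in-memory DOCUMENTS table, the file name missing.xml and the text "shared text" are all hypothetical, and only the xpointer(id('...')) form is handled.

```python
import re
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stand-in for the file system: href -> document text.
DOCUMENTS = {
    "external.xml": '<myelement1><myelement2 xml:id="myID">shared text</myelement2></myelement1>'
}

HOST = """<myelement1 xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="external.xml" parse="xml" xpointer="xpointer(id('myID'))">
<xi:fallback>Broken link or external server down ...</xi:fallback>
</xi:include>
<xi:include href="missing.xml" parse="xml" xpointer="xpointer(id('myID'))">
<xi:fallback>Broken link or external server down ...</xi:fallback>
</xi:include>
</myelement1>"""


def resolve(include, documents):
    """Return the element an xi:include points at, or its fallback text."""
    source = documents.get(include.get("href"))
    match = re.fullmatch(r"xpointer\(id\('([^']+)'\)\)", include.get("xpointer", ""))
    if source is not None and match:
        # Search the external document for an element with the requested xml:id.
        for elem in ET.fromstring(source).iter():
            if elem.get(XML_ID) == match.group(1):
                return elem
    # Target document or id not found: use the xi:fallback content, if any.
    fallback = include.find(XI + "fallback")
    return fallback.text if fallback is not None else None


host = ET.fromstring(HOST)
results = [resolve(inc, DOCUMENTS) for inc in host.iter(XI + "include")]

print(results[0].tag, results[0].text)
print(results[1])
```

The first include finds myelement2 in external.xml, the second one points at a file that does not exist and therefore yields the fallback text.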

My personal priority order for learning XML.
  1. XML tagging.
  2. XPath
  3. XPointer and XInclude.
  4. Namespaces in XML.
  5. XSL(T) and XSL-FO
  6. XML Schema
  7. XLink
  8. Other XML technologies
XML tutorials.

Conclusion and advice:
If you find a simple solution within the XML family, use that before you turn to a scripting / programming solution.

One advantage of streaming parsers is that they use much less memory than tree-based parsers, which hold the whole node tree in memory. So if memory usage is critical, that is an argument for using a stream-based parser such as SAX or XMLReader.
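
As a small illustration of the streaming style, here is a sketch using the SAX interface from the Python standard library. The handler sees one event at a time and never builds a tree, so memory use stays flat however large the input is. The ProductCounter class and the tiny catalog document are made up for the example.

```python
import io
import xml.sax


class ProductCounter(xml.sax.ContentHandler):
    """Counts product elements as they stream past, without building a tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters.
        if name == "product":
            self.count += 1


# Hypothetical input; in practice this would be a large file opened in binary mode.
data = b"<catalog><product/><product/><product/></catalog>"

handler = ProductCounter()
xml.sax.parse(io.BytesIO(data), handler)

print(handler.count)
```

The price of the low memory use is that you have to keep track of any state you need yourself, which is why a tree-based approach is often more convenient for small documents.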
Kjell Gunnar Bleivik
Make it simple, as simple as possible but no simpler: | DigitalPunkt.no |