Everything you need to know about parsing XML, without being baffled by long-winded ramblings and flag-waving for the latest and greatest frameworks.
There are two basic strategies for parsing XML:
1. DOM – document/tree-based strategy.
2. SAX – event-based strategy.
DOM
- Nested tree structure.
- Process and query with XPath or other parser-supported functions.
- Implements a specific interface defined by the W3C.
- You have to load the whole document into memory, so watch out if the XML is big.
- Neither very simple nor very large XML documents suit this style: the tree is more machinery than a simple document needs, and a large one may not fit in memory.
- Random access to the document.
- Complex to use.
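To make the DOM points concrete, here is a minimal Java sketch, not a definitive implementation. It assumes a modern JDK with the standard javax.xml.parsers and org.w3c.dom packages; the sample document and the names DomSketch, titles and bookId are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomSketch {
    // Hypothetical sample document, made up for this illustration.
    static final String SAMPLE =
        "<library>"
        + "<book id=\"1\"><title>First</title></book>"
        + "<book id=\"2\"><title>Second</title></book>"
        + "</library>";

    // Load the whole document into memory and return the resulting tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collect the text of every <title> element, in document order.
    static List<String> titles(Document doc) {
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Random access: go straight to the <book> at the given position and read its id.
    static String bookId(Document doc, int index) {
        Element book = (Element) doc.getElementsByTagName("book").item(index);
        return book.getAttribute("id");
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(titles(doc));
        System.out.println(bookId(doc, 1));
        // The tree is still in memory, so going back to an earlier node costs no re-read.
        System.out.println(bookId(doc, 0));
    }
}
```

The trade-off is visible in parse: nothing can be queried until the entire document has been read and turned into objects.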
SAX
- Event-based.
- Starting/ending tags, XML comments and entity declarations all raise events.
- Flexible: you can handle events when needed, and take action on those events as they are raised.
- Once events have fired they are gone; if you need to get back to the same bit of XML, you have to re-read the document.
- Malformed XML causes the parser to throw an exception, so make sure your document is well formed before passing it to the parser.
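The event-based style can be sketched in Java roughly as follows. This is an illustration under assumptions, not the only way to do it: it uses the JDK's javax.xml.parsers.SAXParser with a DefaultHandler subclass, and the names SaxSketch, EventLog, events and isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Records each tag event as it fires; the parser itself keeps nothing.
    static class EventLog extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    // Stream the document once and return the events in the order they were raised.
    static List<String> events(String xml) {
        EventLog log = new EventLog();
        try {
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), log);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return log.events;
    }

    // Run the parser with a do-nothing handler, purely to see whether it rejects the input.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser gave up partway through the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(events("<a><b/><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

Note that checking well-formedness this way is itself a full pass over the document, which is the cost of not keeping a tree around.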
C# and Java XML Implementations
C#
- Look at System.Xml.XmlReader for forward-only, streaming reads.
- DOM trees can be used via XmlDocument, and XPathDocument lets you interrogate the tree nodes via XPath.
Java
- If you are using Java 1.5 you are in luck, as this version includes the Apache Xerces project.
- Look at the javax.xml.* packages, which also include XPath functionality.
- DOM functionality can be found in org.w3c.dom.*.
- If you are using Java 1.4 or below, Xerces is not bundled, so you will have to install it yourself.
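For the XPath functionality in javax.xml.*, a small hedged sketch is shown below. It assumes Java 1.5 or later (where the javax.xml.xpath package is available); the sample document and the names XPathSketch, text and number are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical sample document, made up for this illustration.
    static final String SAMPLE =
        "<library>"
        + "<book id=\"1\"><title>First</title></book>"
        + "<book id=\"2\"><title>Second</title></book>"
        + "</library>";

    // Evaluate an XPath expression and return the string value of the result.
    static String text(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate: " + expression, e);
        }
    }

    // Evaluate an XPath expression that yields a number, such as count(...).
    static double number(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("/library/book[@id='2']/title", SAMPLE));
        System.out.println(number("count(//book)", SAMPLE));
    }
}
```

Each call here parses the document afresh; when running many queries against the same XML, it is usually better to build the org.w3c.dom tree once and evaluate expressions against that.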