
3.5.4 XMLPullParser

SAX is commonly thought of, and used, as a read-all solution. To filter nodes, a StAX-style pull parser is a better fit, and Pharo currently offers partial support for this approach in the XMLPullParser package. XMLPullParser is a streaming XML parser: instead of building a DOM tree, it delivers 'events' (start tag, text, end tag) on demand, as the caller asks for them. For more information on the technique in general, see xmlpull.org. The code is a mashup of Antony Blakey's work for VisualWorks (ported by permission) and the tokenizer from YAXO.
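
To illustrate the pull technique in general terms, outside Pharo, here is a minimal sketch using the xml.dom.pulldom module from Python's standard library. It is not the XMLPullParser API. The sample document, the read_header function, and its parameter names are all made up for this illustration. The point is only that the caller drives the loop, asking for one event at a time, and can stop asking as soon as it has seen enough.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this illustration.
XML = """<report>
  <title>Alignment summary</title>
  <program>blastn</program>
  <iterations>
    <hit>first</hit>
    <hit>second</hit>
  </iterations>
</report>"""


def read_header(xml_text, stop_tag):
    """Collect (tag, text) pairs until the start tag named stop_tag is seen."""
    header = []
    current = None   # name of the most recently opened element, if still open
    chunks = []      # text pieces seen inside that element
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            if node.tagName == stop_tag:
                break  # stop consuming events; the rest is never inspected
            current = node.tagName
            chunks = []
        elif event == pulldom.CHARACTERS and current is not None:
            chunks.append(node.data)
        elif event == pulldom.END_ELEMENT and current == node.tagName:
            # Only elements that close without opening a child are recorded.
            header.append((current, "".join(chunks).strip()))
            current = None
    return header


if __name__ == "__main__":
    for tag, text in read_header(XML, "iterations"):
        print(tag, "=", text)
```

Calling read_header(XML, "iterations") collects the title and program entries and leaves the loop before any hit element is looked at. A SAX handler, by contrast, would keep receiving callbacks until the parser itself is aborted.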

Installation

Open the Configuration Browser, select XMLPullParser, and click Install.

Usage example

A pull parser is useful for reading just the header of a large XML document. In the following example, the XML document is fairly large (it is an actual DNA sequence alignment returned as XML by the NCBI BLAST server), and we want the parser to stop as soon as the header ends, that is, when it reaches the start tag of the BlastOutput_iterations node shown below:

| parser |
parser := XMLPullParser parse: '<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastn</BlastOutput_program>
<BlastOutput_version>BLASTN 2.2.26+</BlastOutput_version>
<BlastOutput_reference>Zheng Zhang, Scott Schwartz, Lukas Wagner,
and Webb Miller (2000), &quot;A greedy algorithm for aligning DNA
sequences&quot;, J Comput Biol 2000;
7(1-2):203-14.</BlastOutput_reference>
<BlastOutput_param>
<Parameters>
<Parameters_expect>10</Parameters_expect>
<Parameters_sc-match>1</Parameters_sc-match>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>'.
[ parser isStartTag: 'BlastOutput_iterations' ]
    whileFalse: [
        Transcript show: parser text; cr.
        parser next ]


Licensed under Creative Commons BY-NC-SA | Published using Pier