Thomas J.S.: Parsing XML with SAX (Java parser)

Hello Christian,

Since this server is supposed not only to read this XML file but also to add new entries to it, my question is whether that is possible with SAX at all?

Maybe this helps:

Regards, Thomas

======================================================================
IBM developerWorks XML Tip
October 22, 2002
Vol. 2, Issue 43

IBM's resource for developers http://www-106.ibm.com/developerworks/?nx-10222

TIP: USE A SAX FILTER TO MANIPULATE DATA
Change the events output by a SAX stream

Nicholas Chase (nicholas@nicholaschase.com)
President, Chase and Chase, Inc.

Hello, XML Tip readers,

The streaming nature of the Simple API for XML (SAX) provides not only an opportunity to process large amounts of data in a short time, but also the ability to insert changes into the stream that implement business rules without affecting the underlying application. This tip explains how to create and use a SAX filter to control how data is processed.

For the rest of the tip, read on below.

Until next week,
XML Tip team at IBM developerWorks
dWnews@us.ibm.com

Note: This tip uses JAXP. The classes are also part of the Java 2 SDK 1.4, so if you have this version installed, you don't need any additional software. It briefly covers the basics of SAX, but you should already understand the basics of both Java and XML.

This tip looks at an application that determines which employees to notify of a particular emergency situation, and then acts accordingly. (The actual contact is left as an exercise for the reader.) The source document in Listing 1 simply lists employees, their department, and their status:

Listing 1. The source document

    <?xml version="1.0"?>
    <personnel>
      <employee empid="332" deptid="24" shift="night"
            status="contact">
        Jenny Berman
      </employee>
      <employee empid="994" deptid="24" shift="day"
            status="donotcontact">
        Andrew Fule
      </employee>
      <employee empid="948" deptid="3" shift="night"
            status="contact">
        Anna Bangle
      </employee>
      <employee empid="1032" deptid="3" shift="day"
            status="contact">
        David Baines
      </employee>
    </personnel>


THE BASIC APPLICATION

A SAX application consists of two parts. The main application creates an XMLReader that actually parses the document, sending events such as startElement and endDocument to a content handler. You can send errors to a separate error handler object. The handler objects receive these events and act on them.

The main application can also act as either the content or the error handler (or both), but in Listing 2 they are three separate classes:

Listing 2. The main application

    import org.xml.sax.helpers.XMLReaderFactory;
    import org.xml.sax.XMLReader;
    import org.xml.sax.SAXException;
    import org.xml.sax.InputSource;
    import java.io.IOException;

    public class MainSaxApp {

      public static void main (String[] args) {

        try {

          // Create the reader from an explicitly named parser class
          String parserClass = "org.apache.crimson.parser.XMLReaderImpl";
          XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

          // Route parse events and errors to separate handler objects
          reader.setContentHandler(new DataProcessor());
          reader.setErrorHandler(new ErrorProcessor());

          InputSource file = new InputSource("employees.xml");
          reader.parse(file);

        } catch (IOException ioe) {
          System.out.println("IO Exception: " + ioe.getMessage());
        } catch (SAXException se) {
          System.out.println("SAX Exception: " + se.getMessage());
        }

      }

    }

By setting the content handler for the reader to be a DataProcessor object, the application tells the reader to send its events to that object. In Listing 3, the DataProcessor is simple, checking only for the name of the element and the status of employees before determining whether to contact them:

Listing 3. The content handler

    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.Attributes;

    public class DataProcessor extends DefaultHandler {

      public void startElement (String namespaceUri, String localName,
                                String qualifiedName, Attributes attributes) {

        if (localName.equals("employee")) {
          // Constant first, so a missing status attribute is not an error
          if ("contact".equals(attributes.getValue("status"))) {
            System.out.println("Contacting employee " +
                               attributes.getValue("empid"));
            //Implement actual contact here
          }
        }
      }
    }

The ErrorProcessor class is trivial, and is included in the source code for this tip. (See Resources related to this tip in Links to other good stuff, below, to download the source code.)
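Because the ErrorProcessor source is not reproduced here, the following is only a minimal sketch of what such a class might look like. It assumes the handler does nothing more than report warnings and recoverable errors and stop the parse on fatal ones; the describe() helper and the message format are assumptions for illustration, not the downloadable original.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for the ErrorProcessor class mentioned above.
class ErrorProcessor implements ErrorHandler {

    // Builds a one-line report; the format is an assumption.
    static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        System.err.println(describe("Warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        System.err.println(describe("Error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println(describe("Fatal error", e));
        throw e; // rethrow so the parse stops and the caller sees the failure
    }
}
```

Rethrowing from fatalError() means the catch (SAXException se) block in Listing 2 is where a malformed document would surface.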

When the application runs, the output includes all of the employees with a status attribute of contact, no matter which department they work in:

    Contacting employee 332
    Contacting employee 948
    Contacting employee 1032
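Listing 2 names the Crimson parser class explicitly, so it only runs where that class is on the class path. As a hedged alternative, the sketch below obtains the XMLReader through JAXP's SAXParserFactory and parses the personnel data from a string instead of employees.xml. The class name BasicSaxSketch, the SAMPLE constant, and the choice to collect IDs in a list rather than print them are all assumptions made so the example is self-contained; employee names are omitted from the sample data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class BasicSaxSketch {

    // Same attributes as Listing 1, without the name text.
    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Like DataProcessor, but records the IDs instead of printing them.
    static class CollectingHandler extends DefaultHandler {
        final List<String> ids = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("employee".equals(localName)
                    && "contact".equals(atts.getValue("status"))) {
                ids.add(atts.getValue("empid"));
            }
        }
    }

    // Lets JAXP pick whichever SAX parser is installed.
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be filled in
        return factory.newSAXParser().getXMLReader();
    }

    static List<String> contactedIds(String xml) {
        try {
            CollectingHandler handler = new CollectingHandler();
            XMLReader reader = newReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse personnel data", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contactedIds(SAMPLE)); // prints [332, 948, 1032]
    }
}
```

The call to setNamespaceAware(true) matters here: a JAXP parser is not namespace-aware by default, and without it the localName argument that the handler tests may be empty.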


FILTERING THE DATA

So far the application contacts all employees that are listed as on duty regardless of their department, and it works well (or at least, we can hope so!). When you receive a new requirement to contact only employees in a particular department, you have two options:

 - Change the content handler and risk all sorts of new bugs
 - Change the data that comes to the content handler so that only the
   appropriate employees are seen as on duty.

Because other requirements are also likely to be added later, it makes more sense to take the second option and implement such rules separately from the content handler.

A SAX filter sits between a parser and a content handler. It receives events from the parser and, unless instructed otherwise, passes them on to the content handler unchanged. For example, consider this filter in Listing 4:

Listing 4. A simple XML filter

    import org.xml.sax.helpers.XMLFilterImpl;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;

    public class DataFilter extends XMLFilterImpl {

    }

The XMLFilterImpl class includes methods that simply pass the data on unchanged. Inserting the filter into the stream within the main application is all that's necessary (see Listing 5):

Listing 5. Inserting the filter into the main application

    ...
          XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

          DataFilter filter = new DataFilter();
          filter.setParent(reader);

          filter.setContentHandler(new DataProcessor());
          filter.setErrorHandler(new ErrorProcessor());

          filter.parse("employees.xml");

        } catch (IOException ioe) {
    ...

The application creates the XMLReader as usual, but it is the filter that initiates the parse of the file, receiving the events from its parent, the XMLReader. (The parent is the reader handed to the filter with setParent().) The filter passes the events on to its content handler -- the same DataProcessor object used in the original version.

So far, the filter just passes the events on unchanged, so running the application still produces this:

    Contacting employee 332
    Contacting employee 948
    Contacting employee 1032

With the filter in place, however, you can easily make changes without touching the main application. For example, in Listing 6, the filter can eliminate all employees that are not in department 24 by simply setting everyone else's status to donotcontact:

Listing 6. Filtering data

    ...
    import org.xml.sax.helpers.AttributesImpl;

    public class DataFilter extends XMLFilterImpl {

      public void startElement (String namespaceUri, String localName,
                                String qualifiedName, Attributes attributes)
                                throws SAXException {

        // Work on a modifiable copy of the attributes
        AttributesImpl attributesImpl = new AttributesImpl(attributes);
        if (localName.equals("employee")) {
          if (!"24".equals(attributes.getValue("deptid"))) {
            // Find status by name instead of assuming it is at index 3
            int statusIndex = attributesImpl.getIndex("status");
            if (statusIndex >= 0) {
              attributesImpl.setValue(statusIndex, "donotcontact");
            }
          }
        }
        super.startElement(namespaceUri, localName, qualifiedName,
                           attributesImpl);
      }

    }

In this case, you're overriding the startElement() method defined in XMLFilterImpl. It still passes on the event, but if the employee is not in department 24, the filter passes it on with an altered Attributes object that lists the employee as do not contact.

The DataProcessor object has no idea that the data has been manipulated. It simply knows that some employees should be contacted and others shouldn't. Processing now produces a different result:

Contacting employee 332
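A filter can add events as well as change them, which bears on the question at the top of this thread about adding new entries. The sketch below is one possible approach, not something taken from the tip itself: a filter emits an extra employee element just before the closing personnel tag, and a JAXP identity transform writes the resulting event stream back out as text. The class names, the appendEmployee() helper, and the new employee's attribute values are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class AppendEmployeeSketch {

    // Emits one additional <employee> element just before </personnel>.
    static class AppendingFilter extends XMLFilterImpl {

        AppendingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("personnel".equals(localName)) {
                AttributesImpl atts = new AttributesImpl();
                // Hypothetical values for the new entry
                atts.addAttribute("", "empid", "empid", "CDATA", "2001");
                atts.addAttribute("", "deptid", "deptid", "CDATA", "24");
                atts.addAttribute("", "shift", "shift", "CDATA", "day");
                atts.addAttribute("", "status", "status", "CDATA", "contact");
                super.startElement("", "employee", "employee", atts);
                char[] name = "New Hire".toCharArray();
                super.characters(name, 0, name.length);
                super.endElement("", "employee", "employee");
            }
            super.endElement(uri, localName, qName); // then close <personnel>
        }
    }

    static String appendEmployee(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter =
                new AppendingFilter(factory.newSAXParser().getXMLReader());
            StringWriter out = new StringWriter();
            // A transformer created without a stylesheet copies input to output,
            // so it serializes whatever events the filter produces.
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                 | TransformerException e) {
            throw new IllegalStateException("Could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<personnel><employee empid='332' deptid='24' "
                   + "shift='night' status='contact'>A</employee></personnel>";
        System.out.println(appendEmployee(xml));
    }
}
```

SAX itself only reads, so the writing half here is done by the transformer. For a file, a StreamResult wrapping a different output file could take the place of the StringWriter.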


NEXT STEPS

This tip has demonstrated a simple way to alter the processing of a SAX application using an XML filter. In this case, the filter has been pre-determined, but you can build an application to accommodate different situations by choosing filter behavior at run-time. You might accomplish this by replacing the DataFilter class, by passing a parameter at run-time, or even by using a factory to create the filter class in the first place.
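As a sketch of the "passing a parameter at run-time" option, the filter below takes the department to keep as a constructor argument instead of hard-coding 24. The names RuntimeFilterSketch and DepartmentFilter, the SAMPLE data (Listing 1 without the name text), and the use of the first command-line argument are assumptions for illustration. The reader comes from JAXP rather than from a named parser class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class RuntimeFilterSketch {

    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Marks every employee outside the chosen department as donotcontact.
    static class DepartmentFilter extends XMLFilterImpl {
        private final String deptId;

        DepartmentFilter(XMLReader parent, String deptId) {
            super(parent);
            this.deptId = deptId;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts);
            int status = copy.getIndex("status");
            if ("employee".equals(localName) && status >= 0
                    && !deptId.equals(atts.getValue("deptid"))) {
                copy.setValue(status, "donotcontact");
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    static List<String> contactedIds(String xml, String deptId) {
        final List<String> ids = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter =
                new DepartmentFilter(factory.newSAXParser().getXMLReader(), deptId);
            // Stand-in for DataProcessor that records IDs instead of printing
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("employee".equals(localName)
                            && "contact".equals(atts.getValue("status"))) {
                        ids.add(atts.getValue("empid"));
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse personnel data", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String deptId = args.length > 0 ? args[0] : "24"; // default is an assumption
        System.out.println(contactedIds(SAMPLE, deptId));
    }
}
```

Run with no arguments, this keeps department 24 as in Listing 6; run with 3 as the argument, it keeps employees 948 and 1032 instead.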

A SAX application can also chain filters together so that the output of one filter is used as the input for another, allowing for complex programming in modular chunks.
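Chaining works because a filter is itself an XMLReader and can therefore serve as the parent of another filter. The sketch below stacks two instances of a hypothetical AttributeGate filter, one checking deptid and one checking shift, so that only night-shift employees in department 3 remain contactable. All class names and the SAMPLE data (Listing 1 without the name text) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterChainSketch {

    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Sets status to donotcontact unless the named attribute has the required value.
    static class AttributeGate extends XMLFilterImpl {
        private final String name;
        private final String required;

        AttributeGate(XMLReader parent, String name, String required) {
            super(parent);
            this.name = name;
            this.required = required;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts);
            int status = copy.getIndex("status");
            if ("employee".equals(localName) && status >= 0
                    && !required.equals(atts.getValue(name))) {
                copy.setValue(status, "donotcontact");
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    static List<String> contactedIds(String xml) {
        final List<String> ids = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            // parser -> department gate -> shift gate -> content handler
            XMLReader byDept = new AttributeGate(parser, "deptid", "3");
            XMLReader byShift = new AttributeGate(byDept, "shift", "night");
            byShift.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("employee".equals(localName)
                            && "contact".equals(atts.getValue("status"))) {
                        ids.add(atts.getValue("empid"));
                    }
                }
            });
            byShift.parse(new InputSource(new StringReader(xml))); // outermost filter starts the parse
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse personnel data", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(contactedIds(SAMPLE)); // prints [948]
    }
}
```

The parse is started on the last filter in the chain. Each filter asks its parent to parse, and the events then flow back outward through every filter before reaching the content handler.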

LINKS TO OTHER GOOD STUFF

::: IBM developerWorks XML Zone ::: http://www.ibm.com/developerworks/xml/?nx-10222

::: Resources related to this tip ::: http://www.ibm.com/developerworks/library/x-tipsaxfilter/#resources

::: Full text of this tip on the Web ::: http://www.ibm.com/developerworks/library/x-tipsaxfilter/?nx-10222

::: Index of other XML tips ::: http://www.ibm.com/developerworks/library/x-tips.html?nx-10222

::: Most recent issue of the IBM developerWorks newsletter: http://www.ibm.com/developerworks/newsletter/dwte101702.html?nx-10222

ABOUT THIS NEWSLETTER
Created by IBM developerWorks (http://www.ibm.com/developerworks/)
Delivered by Topica (http://www.topica.com/tep/index.html)

THIS NEWSLETTER IS FOR INFORMATION ONLY. This newsletter should not be interpreted to be a commitment on the part of IBM, and, after the publication date, IBM cannot guarantee the accuracy of any information presented. You may copy and distribute this newsletter, as long as:

  1. All text is copied without modification and all pages are included.
  2. All copies contain IBM's copyright notice and any other notices provided therein.
  3. This document is not distributed for profit.