mulgara - semantic store

Parsers

The content handler takes data from files and converts it into RDF triples that can be navigated using the Statements interface. The data in the target file is usually not in a native RDF format, so the content handler must convert it before it can be queried. Whether this is a matter of parsing text or converting binary data is up to the implementer. The MP3 resolver, for example, uses a parser package that transforms ID3 tags into meaningful RDF triples, which the Statements implementation can then navigate.

When parsing the data you need to decide what kind of temporary storage to use for the RDF triples. Remember that the Statements object navigates through triples in a list-like fashion, so the storage method must be able to accommodate this.

You also need to consider how the triples should be stored across queries against the URL. Should you cache them to save time if the URL is queried again? Should you store the data in memory (faster access, but only small files can be processed) or on disk (large files can be handled, but access is slower)? The MP3 parser stores its triples in memory in a JRDF graph, as ID3 tags are small.
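To make the list-style navigation requirement concrete, the following is a minimal sketch of an in-memory triple buffer with a forward cursor. It does not use the Mulgara Statements interface or JRDF; the class name InMemoryTripleBuffer, its methods, and the example URIs are all hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the Mulgara Statements interface or a JRDF graph.
class InMemoryTripleBuffer {

  /** One RDF statement, held as plain strings for brevity. */
  record Triple(String subject, String predicate, String object) {}

  private final List<Triple> triples = new ArrayList<>();
  private int cursor = -1; // positioned before the first triple

  /** Called by the parser for each triple it produces. */
  void add(String subject, String predicate, String object) {
    triples.add(new Triple(subject, predicate, object));
  }

  /** Rewinds the cursor so the triples can be walked again from the start. */
  void beforeFirst() {
    cursor = -1;
  }

  /** Advances to the next triple; returns false once the list is exhausted. */
  boolean next() {
    if (cursor + 1 >= triples.size()) {
      return false;
    }
    cursor++;
    return true;
  }

  /** Returns the triple under the cursor; call only after next() returned true. */
  Triple current() {
    if (cursor < 0) {
      throw new IllegalStateException("Call next() before current()");
    }
    return triples.get(cursor);
  }

  public static void main(String[] args) {
    InMemoryTripleBuffer buffer = new InMemoryTripleBuffer();
    // Placeholder subject and predicate URIs, chosen for illustration only.
    buffer.add("file:/music/song.mp3", "http://example.org/id3#title", "Example Title");
    buffer.add("file:/music/song.mp3", "http://example.org/id3#artist", "Example Artist");
    while (buffer.next()) {
      System.out.println(buffer.current());
    }
  }
}
```

A list-backed buffer like this keeps everything on the heap, so it suits small inputs such as ID3 tags. A disk-backed store would need to offer the same rewind and advance operations.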

 
Configuration and Initialisation

How the parsing section of the handler is set up is up to the developer. The way the data is added is irrelevant; what matters is that it ends up in a form the Statements API can navigate. Your parsing solution might require some configuration. The MP3 resolver, for example, has a configuration file whose properties tell the factory where to find the implementations of the ID3 tag parsing classes.

The properties file looks like the following (see the parserfactory.conf file in the conf/resolvers/mp3/ directory of your Mulgara installation):

id3parser.class = org.mulgara.resolver.mp3.parser.ID3ParserImpl
id3v1parser.class = org.mulgara.resolver.mp3.parser.ID3v1ParserImpl
id3v2parser.class = org.mulgara.resolver.mp3.parser.ID3v2ParserImpl

The name-value pairs in the properties file are read into the factory's properties by the following lines of code (see ParserFactory.java):

// Initialise our properties
properties = new Properties();

// Retrieve the resource url for our configuration file
URL parserConfiguration = this.getClass().getResource("/parserfactory.conf");

try {

  // Load the properties for the parser factory using the stream from the URL
  properties.load(parserConfiguration.openStream());
} catch (IOException ioException) {

  throw new FactoryException("Unable to load the parser factory " +
                             "configuration from: " +
                             parserConfiguration.toString(), ioException);
}

No other configuration is required for the parser.
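Once the properties are loaded, a factory can turn a configured class name into a parser instance. The sketch below shows one common way to do this with reflection. It is an assumption about the general technique, not a copy of ParserFactory.java; TagParser, DemoTagParser, and create() are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch of reflection-based instantiation; not Mulgara's ParserFactory.
class ParserFactorySketch {

  /** Hypothetical parser contract; the real ID3Parser interface may differ. */
  interface TagParser {
    String describe();
  }

  /** Hypothetical implementation that the sample configuration points at. */
  static class DemoTagParser implements TagParser {
    public DemoTagParser() {}

    public String describe() {
      return "demo parser";
    }
  }

  /** Looks up the class name stored under key and creates an instance of it. */
  static TagParser create(Properties properties, String key) {
    String className = properties.getProperty(key);
    if (className == null) {
      throw new IllegalArgumentException("No class configured for " + key);
    }
    try {
      // asSubclass fails early if the class does not implement TagParser
      Class<? extends TagParser> type =
          Class.forName(className).asSubclass(TagParser.class);
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Unable to instantiate " + className, e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a one-line configuration in the same name = value format.
    String config = "id3parser.class = " + DemoTagParser.class.getName() + "\n";
    Properties properties = new Properties();
    properties.load(new StringReader(config));

    TagParser parser = create(properties, "id3parser.class");
    System.out.println(parser.describe());
  }
}
```

Keeping class names in a properties file means a different parser implementation can be swapped in without recompiling the factory.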

 
Processing ID3 Tags into RDF

Since parsers vary depending on the content they parse, only a summary of the steps is included here, highlighting the process of data conversion. The actual implementation of this process is in the src/jar/content-mp3/java/org/mulgara/content/mp3/parser/ directory of your Mulgara installation.

ID3 tags are passed into the ID3Parser class using the MP3Conversion container bean. Within the general ID3Parser class, the version 2 and version 1 tags are separately parsed with the ID3v2Parser and ID3v1Parser implementation classes respectively. Within the MP3Conversion object there is a JRDF Graph object that stores the statements pertaining to the ID3 tag data. Available to each MP3Conversion object's graph is a dictionary of RDF predicates, which correspond to each of the tag headers that might be found in ID3 tags (see IdentifierProcessor.java).

Both ID3 tag parsers (ID3v1ParserImpl.java and ID3v2ParserImpl.java) contain a method called parseRDF() that generates the RDF triples corresponding to the ID3 tags. Each tag set belongs to its own resource, and the tag sets are unified under a single MP3 resource unique to the original file. As each tag identifier is processed, a triple is added to the graph consisting of the tag's resource ID, the predicate mapped to the current identifier, and the literal value from the tag. When parsing is complete, the conversion object is handed back to the calling object (MP3Statements), which then processes the Graph object.
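The conversion step can be pictured with a small, self-contained sketch. It is not the code in ID3v1ParserImpl.java or ID3v2ParserImpl.java and does not use JRDF. The predicate URIs, the hasTag linking predicate, and the names Id3ToRdfSketch and parseRdf are assumptions made for illustration. TIT2, TPE1, and TALB are standard ID3v2 frame identifiers for title, lead performer, and album.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the tag-to-triple mapping; not Mulgara's parser code.
class Id3ToRdfSketch {

  /** One RDF statement, held as plain strings for brevity. */
  record Triple(String subject, String predicate, String object) {}

  // Placeholder predicate dictionary: frame identifier -> predicate URI.
  static final Map<String, String> PREDICATES = Map.of(
      "TIT2", "http://example.org/id3#title",
      "TPE1", "http://example.org/id3#artist",
      "TALB", "http://example.org/id3#album");

  // Placeholder predicate linking the MP3 resource to one of its tag sets.
  static final String HAS_TAG = "http://example.org/id3#hasTag";

  /** Converts one tag set (identifier -> value) into triples. */
  static List<Triple> parseRdf(String mp3Resource, String tagResource,
                               Map<String, String> frames) {
    List<Triple> triples = new ArrayList<>();

    // Tie the tag set's resource to the MP3 resource for the file.
    triples.add(new Triple(mp3Resource, HAS_TAG, tagResource));

    for (Map.Entry<String, String> frame : frames.entrySet()) {
      String predicate = PREDICATES.get(frame.getKey());
      if (predicate == null) {
        continue; // identifier has no mapping in the dictionary
      }
      // Subject: tag resource; predicate: mapped URI; object: literal value.
      triples.add(new Triple(tagResource, predicate, frame.getValue()));
    }
    return triples;
  }

  public static void main(String[] args) {
    Map<String, String> frames = new LinkedHashMap<>();
    frames.put("TIT2", "Example Title");
    frames.put("TPE1", "Example Artist");

    for (Triple triple : parseRdf("file:/music/song.mp3",
                                  "file:/music/song.mp3#id3v2", frames)) {
      System.out.println(triple);
    }
  }
}
```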

 
Storing Data

As described above, all triples are added to the graph, which is then passed to the MP3Statements object for navigation when the resolver resolves the constraints. Everything is done in memory and is relatively fast. The drawback is that the triples are lost once the resolver has finished constraining the statements. Since ID3 tags are relatively small, this is not too much of a problem. However, in something like an MBox handler, file sizes and message counts are larger, so the graph is cached to avoid repeating the processing. Caching also means that the data persists across executions.
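The caching idea can be sketched as a map from URL to previously parsed triples. This is a simplified, single-threaded illustration with hypothetical names (TripleCacheSketch, triplesFor, invalidate), not the MBox handler's implementation. It keeps the cache in memory only, so unlike a disk-backed cache it would not persist across executions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of per-URL caching; not the MBox handler's code.
class TripleCacheSketch {

  private final Map<String, List<String>> cache = new HashMap<>();
  private final Function<String, List<String>> parser;
  private int parseCount = 0; // how many times the parser actually ran

  /** The parser function turns a URL into triples (plain strings here). */
  TripleCacheSketch(Function<String, List<String>> parser) {
    this.parser = parser;
  }

  /** Returns cached triples for the URL, parsing only on the first request. */
  List<String> triplesFor(String url) {
    List<String> cached = cache.get(url);
    if (cached == null) {
      parseCount++;
      cached = List.copyOf(parser.apply(url)); // immutable snapshot
      cache.put(url, cached);
    }
    return cached;
  }

  /** Drops the cached entry, for example when the source file has changed. */
  void invalidate(String url) {
    cache.remove(url);
  }

  int parseCount() {
    return parseCount;
  }

  public static void main(String[] args) {
    // Stand-in parser that fabricates a single placeholder triple per URL.
    TripleCacheSketch cache =
        new TripleCacheSketch(url -> List.of("<" + url + "> <ex:parsed> \"true\""));

    cache.triplesFor("file:/mail/inbox.mbox");
    cache.triplesFor("file:/mail/inbox.mbox"); // served from the cache
    System.out.println("Parser runs: " + cache.parseCount());
  }
}
```

A real handler would also need a policy for noticing that the underlying file has changed, which this sketch leaves to the caller through invalidate().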
