Parsers
The content handler takes data from files and converts it into RDF triples that can be navigated using the Statements interface. Usually the data in the target file is not in a natural RDF format, so the content handler must convert it before it is useful. Whether this means parsing text or converting binary data is up to the implementer. The MP3 resolver uses a parser package that transforms the ID3 tags into meaningful RDF triples, which the Statements implementation can then navigate.
When parsing the data you need to decide what kind of temporary storage to use for the RDF triples. Remember that the Statements object navigates through triples in a list style, so the storage method must be able to accommodate this.
You also need to consider how the triples should be stored across queries against the URL. Should you cache them to save time if the URL is queried again? Should you store the data in memory (faster access, but only small files can be processed) or on disk (large files can be handled, but access is slower)? For the MP3 parser, a JRDF graph stores the triples in memory, as ID3 tags are small.
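To make the "list style" requirement concrete, the following is a minimal sketch of a cursor over triples held in memory. SimpleTriple and TripleCursor are hypothetical names invented for this illustration; they are not Mulgara or JRDF types, and the real Statements interface has more methods than are shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an RDF triple (not a Mulgara or JRDF type).
final class SimpleTriple {
  final String subject;
  final String predicate;
  final String object;

  SimpleTriple(String subject, String predicate, String object) {
    this.subject = subject;
    this.predicate = predicate;
    this.object = object;
  }
}

// Hypothetical cursor that walks an in-memory list of triples row by row.
final class TripleCursor {
  private final List<SimpleTriple> triples;
  private int row = -1; // -1 means "positioned before the first row"

  TripleCursor(List<SimpleTriple> triples) {
    this.triples = new ArrayList<>(triples); // defensive copy
  }

  // Resets the cursor so that the next call to next() lands on the first row.
  void beforeFirst() {
    row = -1;
  }

  // Advances one row; returns false once the rows are exhausted.
  boolean next() {
    if (row < triples.size()) {
      row++;
    }
    return row < triples.size();
  }

  // Returns the triple at the current row.
  SimpleTriple current() {
    if (row < 0 || row >= triples.size()) {
      throw new IllegalStateException("Cursor is not positioned on a row");
    }
    return triples.get(row);
  }

  public static void main(String[] args) {
    List<SimpleTriple> data = new ArrayList<>();
    data.add(new SimpleTriple("urn:example:mp3", "urn:example:title", "Some Title"));
    data.add(new SimpleTriple("urn:example:mp3", "urn:example:artist", "Some Artist"));
    TripleCursor cursor = new TripleCursor(data);
    while (cursor.next()) {
      SimpleTriple t = cursor.current();
      System.out.println(t.subject + " " + t.predicate + " " + t.object);
    }
  }
}
```

Any storage that can be walked this way, whether an in-memory list or rows read back from disk, would satisfy the navigation requirement; the choice between them is the memory versus disk trade-off described above.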
Configuration and Initialisation
How the parsing section of the handler is set up is up to the developer. How the data is added is irrelevant; what matters is that it ends up in a format that the Statements API can navigate. Some configuration for your parsing solution might be required. For the MP3 resolver, a configuration file contains properties that tell the factory where to find the implementations of the ID3 tag parsing classes.
The properties file looks like the following (see the parserfactory.conf file in the conf/resolvers/mp3/ directory of your Mulgara installation):
id3parser.class = org.mulgara.resolver.mp3.parser.ID3ParserImpl
id3v1parser.class = org.mulgara.resolver.mp3.parser.ID3v1ParserImpl
id3v2parser.class = org.mulgara.resolver.mp3.parser.ID3v2ParserImpl
The name-value pairs in the properties file are read into the factory's properties by the following lines of code (see ParserFactory.java):
// Initialise our properties
properties = new Properties();

// Retrieve the resource url for our configuration file
URL parserConfiguration = this.getClass().getResource("/parserfactory.conf");

try {
  // Load the properties for the parser factory using the stream from the URL
  properties.load(parserConfiguration.openStream());
} catch (IOException ioException) {
  throw new FactoryException("Unable to load the parser factory " +
                             "configuration from: " +
                             parserConfiguration.toString(), ioException);
}
No other configuration is required for the parser.
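Once the properties are loaded, a factory would typically turn a configured class name into an object. The following is a minimal sketch of that step using standard Java reflection. ParserLookupSketch and its instantiate() method are hypothetical and are not taken from ParserFactory.java, and a JDK class stands in for the parser implementation so that the example runs without Mulgara on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; not part of the Mulgara source.
final class ParserLookupSketch {

  // Creates an instance of the class named by the given property key,
  // using that class's no-argument constructor.
  static Object instantiate(Properties properties, String key) {
    String className = properties.getProperty(key);
    if (className == null) {
      throw new IllegalArgumentException("No class configured for key: " + key);
    }
    try {
      return Class.forName(className.trim()).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Unable to instantiate " + className, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Properties properties = new Properties();
    // Assumption: java.util.ArrayList stands in for a real parser class here.
    properties.load(new StringReader("id3parser.class = java.util.ArrayList\n"));
    Object parser = instantiate(properties, "id3parser.class");
    System.out.println("Created: " + parser.getClass().getName());
  }
}
```

Keeping the class names in a properties file means an alternative parser implementation can be swapped in without recompiling the factory.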
Processing ID3 Tags into RDF
Since parsers vary depending on the content they parse, only a summary of the steps is included here, highlighting the process of data conversion. The actual implementation of this process is in the src/jar/content-mp3/java/org/mulgara/content/mp3/parser/ directory of your Mulgara installation.
ID3 tags are passed into the ID3Parser class using the MP3Conversion container bean. Within the general ID3Parser class, the version 2 and version 1 tags are parsed separately by the ID3v2Parser and ID3v1Parser implementation classes respectively. Each MP3Conversion object holds a JRDF Graph object that stores the statements pertaining to the ID3 tag data. Each MP3Conversion object's graph also has access to a dictionary of RDF predicates, which correspond to the tag headers that might be found in ID3 tags (see IdentifierProcessor.java).
Both ID3 tag parsers (ID3v1ParserImpl.java and ID3v2ParserImpl.java) contain a method called parseRDF() that generates the RDF triples for the ID3 tags. Each tag set belongs to its own resource, and the tag sets are unified under a single MP3 resource unique to the original file. As each tag identifier is processed, a triple is added for the respective resource, consisting of the tag's resource ID, the predicate mapping for the current identifier and the literal value from the tag. When parsing is complete, the conversion is handed back to the calling object (MP3Statements) for processing of the Graph object.
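The conversion described above can be sketched in plain Java as follows. This is an illustration only and does not reproduce parseRDF(): the class name, the example.org predicate URIs, the HAS_TAG linking predicate and the use of String arrays in place of JRDF graph statements are all assumptions made for the example. The frame identifiers TIT2, TPE1 and TALB are standard ID3v2 text frames for title, lead performer and album.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Mulgara parser implementation.
final class TagToTripleSketch {

  // Hypothetical predicate linking the MP3 resource to one of its tag resources.
  static final String HAS_TAG = "http://example.org/mp3#hasTag";

  // Hypothetical predicate dictionary: ID3v2 frame identifier -> predicate URI.
  static final Map<String, String> PREDICATES = new LinkedHashMap<>();
  static {
    PREDICATES.put("TIT2", "http://example.org/id3#title");
    PREDICATES.put("TPE1", "http://example.org/id3#artist");
    PREDICATES.put("TALB", "http://example.org/id3#album");
  }

  // Builds {subject, predicate, object} triples: one linking the MP3 resource
  // to the tag resource, then one per tag identifier that has a known predicate.
  static List<String[]> toTriples(String mp3Resource, String tagResource,
                                  Map<String, String> tags) {
    List<String[]> triples = new ArrayList<>();
    triples.add(new String[] { mp3Resource, HAS_TAG, tagResource });
    for (Map.Entry<String, String> tag : tags.entrySet()) {
      String predicate = PREDICATES.get(tag.getKey());
      if (predicate == null) {
        continue; // no mapping for this identifier, so no triple is produced
      }
      triples.add(new String[] { tagResource, predicate, tag.getValue() });
    }
    return triples;
  }

  public static void main(String[] args) {
    Map<String, String> tags = new LinkedHashMap<>();
    tags.put("TIT2", "Some Title");
    tags.put("TPE1", "Some Artist");
    for (String[] t : toTriples("urn:example:file", "urn:example:file#id3v2", tags)) {
      System.out.println(t[0] + " " + t[1] + " \"" + t[2] + "\"");
    }
  }
}
```

In the resolver itself the triples would be added to the Graph object held by the MP3Conversion bean rather than collected in a list.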
Storing Data
As described above, all triples are added to the graph, which is then passed to the MP3Statements object for navigation when the resolver resolves the constraints. Everything is done in memory and is relatively fast. The drawback is that the triples are lost once the resolver has finished constraining the statements. Since ID3 tags are relatively small, this is not much of a problem. However, in something like an MBox handler, file sizes and message counts are larger, so the graph is cached to avoid repeating the processing. Caching also means that the data is persisted across executions.
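As a rough illustration of the caching idea, the sketch below keeps parsed results in a map keyed by URL so that each source is parsed at most once. ParsedContentCache is a hypothetical class and is not how the MBox handler is implemented. It also only avoids repeated parsing within a single running process; persisting data across executions would need a disk-backed store, which is not shown.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory cache of parsed statements, keyed by source URL.
final class ParsedContentCache {
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
  private final Function<String, List<String>> parser;

  // The parser function is whatever turns a URL into its parsed statements.
  ParsedContentCache(Function<String, List<String>> parser) {
    this.parser = parser;
  }

  // Returns the cached statements for the URL, parsing only on the first request.
  List<String> statementsFor(String url) {
    return cache.computeIfAbsent(url, parser);
  }

  public static void main(String[] args) {
    ParsedContentCache cache = new ParsedContentCache(url -> {
      System.out.println("Parsing " + url);
      return Collections.singletonList("statement parsed from " + url);
    });
    cache.statementsFor("file:/tmp/example.mbox"); // triggers the parser
    cache.statementsFor("file:/tmp/example.mbox"); // served from the cache
  }
}
```

A cache like this trades memory for speed, so for large sources it would usually be paired with an eviction policy or replaced by on-disk storage.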