Mulgara | Semantic Store - Inferencing and Mulgara

Inferencing and Mulgara

Inferencing is the process of producing new RDF statements from a set of existing ones. Mulgara implements this by taking a base model of RDF statements, applying a set of rules to it, and storing the newly generated statements in a new model.

We can demonstrate this by defining a small data model and a set of rules. The data model consists of the following statements:

[ (leftHand, partOf, body)
(rightHand, partOf, body)
(leftIndexFinger, partOf, leftHand)
(leftHand, is, left)
(rightHand, is, right)
(leftHand, hasName, "left hand")
(rightHand, hasName, "right hand")
(leftIndexFinger, hasName, "left index finger") ]

The rules are defined as follows:

1. Create a hierarchy of partOf statements where properties are inherited. That is:
- if (x, partOf, y) and (y, partOf, z) then (x, partOf, z)
- if (x, partOf, y) and (y, is, b) then (x, is, b)
- When Rule 1 is executed, call Rule 2
2. Create a new statement such that, if (x, is, y) then (x, hasProperty, y)

The statements in the new model generated by these rules are as follows:

[ (leftIndexFinger, partOf, body)
(leftIndexFinger, is, left)
(leftIndexFinger, hasProperty, left)
(leftHand, hasProperty, left)
(rightHand, hasProperty, right) ]
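The example can be reproduced with a small forward-chaining loop. The following is a minimal sketch in Python, not Mulgara code: the function names are hypothetical, statements are plain tuples, and it assumes the inheritance rule means that a part inherits the "is" values of the whole it belongs to, which is the reading that matches the generated statements listed above. The step "when Rule 1 is executed, call Rule 2" is modelled by repeating both rules until nothing new appears.

```python
# Base model: the eight statements from the example, as (subject, predicate, object) tuples.
BASE = {
    ("leftHand", "partOf", "body"),
    ("rightHand", "partOf", "body"),
    ("leftIndexFinger", "partOf", "leftHand"),
    ("leftHand", "is", "left"),
    ("rightHand", "is", "right"),
    ("leftHand", "hasName", "left hand"),
    ("rightHand", "hasName", "right hand"),
    ("leftIndexFinger", "hasName", "left index finger"),
}


def rule_part_of(facts):
    """Rule 1: partOf is transitive, and a part inherits the 'is' values of its whole."""
    part_of = [(s, o) for s, p, o in facts if p == "partOf"]
    is_value = [(s, o) for s, p, o in facts if p == "is"]
    new = set()
    for x, y in part_of:
        new |= {(x, "partOf", z) for whole, z in part_of if whole == y}
        new |= {(x, "is", b) for whole, b in is_value if whole == y}
    return new


def rule_has_property(facts):
    """Rule 2: every (x, is, y) yields (x, hasProperty, y)."""
    return {(s, "hasProperty", o) for s, p, o in facts if p == "is"}


def infer(base):
    """Apply both rules until no new statements appear; return only the inferred ones."""
    inferred = set()
    while True:
        known = base | inferred
        new = (rule_part_of(known) | rule_has_property(known)) - known
        if not new:
            return inferred
        inferred |= new


INFERRED = infer(BASE)
for statement in sorted(INFERRED):
    print(statement)
```

Under these assumptions the loop stops after three passes and INFERRED holds the same five statements as the new model above. The base statements are never modified, which mirrors the separation between the base model and the model that stores the inferred statements.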

Models Required

Inferencing requires ways of grouping statements into differing types of models, allowing the system to differentiate and apply appropriate operations on them. Also, models have configuration and other statements made about them in the system model. This allows the system to further control how models are treated during inferencing and other operations.

Initially, there are three model types:

Base models
Schema models (RDFS and OWL)
Inference models

Base Model

Base models are the current type of Mulgara model used for storing RDF statements.

Schema Model

A schema model is either an OWL or RDFS typed schema model. OWL and RDFS each define a set of rules, together with when and how to apply them as certain statements are encountered. There are three versions of OWL:

OWL Lite
OWL DL
OWL Full

The difference between the three versions is the number and complexity of the rules.

The type of configuration to apply to schema models defines how the rules are applied. This includes which combination of rules and base statements generate an inference model, as well as which rules are processed ahead of time (forward chaining) or at query time (backward chaining).
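The difference between the two strategies can be illustrated with a transitive subclass relationship. This is a minimal sketch, not Mulgara's implementation: the class names and both function names are hypothetical, and the schema is reduced to a set of (subclass, superclass) pairs. Forward chaining computes every derivable pair ahead of time, while backward chaining searches for an answer only when a query asks for it.

```python
# Hypothetical schema: (subclass, superclass) pairs.
SUBCLASS_OF = {("Dog", "Mammal"), ("Mammal", "Animal")}


def forward_chain(edges):
    """Forward chaining: materialise the full transitive closure ahead of time."""
    closure = set(edges)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def backward_chain(edges, sub, sup):
    """Backward chaining: answer a single question at query time by searching upward."""
    seen, stack = set(), [sub]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == sup:
                    return True
                seen.add(b)
                stack.append(b)
    return False


MATERIALISED = forward_chain(SUBCLASS_OF)
ANSWER = backward_chain(SUBCLASS_OF, "Dog", "Animal")
```

The forward-chained result costs storage and must be maintained when the schema changes, but it makes queries a simple lookup. The backward-chained check stores nothing extra and repeats the search on every query.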

Schema models usually have a very small number of statements in comparison to the base models. A schema model, or collection of schema models, is tied to one or more base models. The system allows one or more schema models to be applied to one or more collections of base models to generate one or more sets of new inferred models.

It is also important to allow classes to be used as instances. A class is a definition of the properties of an object. An instance is a particular concrete version of a class. By analogy with Java, a class is like an interface, and an instance is like an object created at run time with the new operator. In OWL, whether something is viewed as an instance or a class can change depending on how it is used. If a user is creating an ontology, from their point of view the schema model contains a set of instances to be manipulated. To the inferencing system, the schema model is a set of classes to be used by rules to generate new statements.

Inference Model

An inference model contains the result of executing the rules defined in the schema model against the data stored in the base models. Usually, inference models are not directly queried by the user but are queried in conjunction with the base model or models.

Having an inference model separate from the other models allows the inferred model to be modified at any time. If the base models or schema models change, only the inference model needs to change. To provide improved granularity and maximum control over inferred statements, these inferred statements can be composed of several models aggregated together. This provides a way to retain a map of inferred statements against the original data, schemas and rules that were applied. When part of the schema or original data changes, only the minimal set of statements related to that change is removed and then re-inferred.

These mappings between inferred statements, rules and base facts take the form of annotations that describe the set of statements and the set of rules that generated them. A rule can be further constrained to a subset of the original statements by expressing that subset as an iTQL™ query.
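To make the idea of minimal re-inference concrete, the sketch below records, for each inferred statement, the rule and the supporting statements that produced it, and then works out which inferred statements become stale when one statement is removed. It is an illustration only: the PROVENANCE structure and the invalidate function are hypothetical, the data reuses the body-part example from the start of this document, each inferred statement is assumed to have a single derivation, and the iTQL-based constraint on rules is not modelled.

```python
# Hypothetical annotations: inferred statement -> (rule name, supporting statements).
PROVENANCE = {
    ("leftIndexFinger", "partOf", "body"): (
        "rule1",
        {("leftIndexFinger", "partOf", "leftHand"), ("leftHand", "partOf", "body")},
    ),
    ("leftIndexFinger", "is", "left"): (
        "rule1",
        {("leftIndexFinger", "partOf", "leftHand"), ("leftHand", "is", "left")},
    ),
    ("leftIndexFinger", "hasProperty", "left"): (
        "rule2",
        {("leftIndexFinger", "is", "left")},
    ),
    ("leftHand", "hasProperty", "left"): ("rule2", {("leftHand", "is", "left")}),
    ("rightHand", "hasProperty", "right"): ("rule2", {("rightHand", "is", "right")}),
}


def invalidate(provenance, removed):
    """Return the inferred statements that depend, directly or indirectly, on `removed`."""
    stale = set()
    frontier = {removed}
    while frontier:
        # Anything supported by a statement in the current frontier is now stale too.
        frontier = {
            inferred
            for inferred, (_rule, support) in provenance.items()
            if inferred not in stale and support & frontier
        }
        stale |= frontier
    return stale


STALE = invalidate(PROVENANCE, ("leftHand", "is", "left"))
```

Removing (leftHand, is, left) marks only the three statements derived from it as stale. The statements about rightHand and the inferred partOf statement are left in place, so only the affected portion would need to be re-inferred.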

An Example

Take the example where you have two models:

A base model of facts
An RDFS schema model

The administrator or inference optimizer determines that the RDFS rule subClassOf should be fully inferred and the results placed in an inferred model of its own. The remaining RDFS rules are also fully inferred, and their results are placed in a second inferred model.

This is expressed using the following set of statements:

[ ( <baseModel1>, <modelType>, <model> )
( <rdfsSchemaModel1>, <modelType>, <rdfs> )
( <inferenceModel1>, <modelType>, <inference> )
( <inferenceModel2>, <modelType>, <inference> )
( <rdfs:subClassOf>, <inferenceType>, <forward> )
( <rdfs:domain>, <inferenceType>, <forward> )
( <inferenceModel1>, <includesRule>, <rdfs:subClassOf> )
( <inferenceModel2>, <includesRule>, <rdfs:domain> )
...
]
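As an illustration of how such configuration statements might be consumed, the sketch below reads them as plain tuples and derives which rules feed which inference model. This is an assumption about usage, not Mulgara's API: the function names are hypothetical, the angle brackets are dropped from the identifiers, and the statements elided by "..." in the listing are simply left out.

```python
from collections import defaultdict

# The configuration statements listed above; elided statements are omitted.
CONFIG = [
    ("baseModel1", "modelType", "model"),
    ("rdfsSchemaModel1", "modelType", "rdfs"),
    ("inferenceModel1", "modelType", "inference"),
    ("inferenceModel2", "modelType", "inference"),
    ("rdfs:subClassOf", "inferenceType", "forward"),
    ("rdfs:domain", "inferenceType", "forward"),
    ("inferenceModel1", "includesRule", "rdfs:subClassOf"),
    ("inferenceModel2", "includesRule", "rdfs:domain"),
]


def models_of_type(config, model_type):
    """Names of all models declared with the given modelType."""
    return sorted(s for s, p, o in config if p == "modelType" and o == model_type)


def rules_by_model(config):
    """Map each inference model to its rules, paired with each rule's inferenceType."""
    inference_type = {s: o for s, p, o in config if p == "inferenceType"}
    plan = defaultdict(list)
    for s, p, o in config:
        if p == "includesRule":
            plan[s].append((o, inference_type.get(o)))
    return dict(plan)


INFERENCE_MODELS = models_of_type(CONFIG, "inference")
PLAN = rules_by_model(CONFIG)
```

Keeping the configuration as statements means it can be stored and queried like any other model, which is consistent with the earlier point that statements about models live in the system model.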

Any user queries are now performed against the union of three models:

Base facts (baseModel1)
The subClassOf inference model (inferenceModel1)
The remaining RDFS rules (inferenceModel2)
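Querying the union can be pictured as matching a pattern against all three sets of statements at once. The following sketch is illustrative only: the match function is hypothetical, the ex: statements are invented sample data rather than anything from Mulgara, and None stands for a wildcard position in the pattern.

```python
def match(models, pattern):
    """Return statements in the union of `models` that match `pattern` (None = wildcard)."""
    union = set().union(*models)
    return sorted(
        statement
        for statement in union
        if all(want is None or want == got for want, got in zip(pattern, statement))
    )


# Invented sample contents for the three models in the example.
base_model_1 = {("ex:rex", "rdf:type", "ex:Dog")}
inference_model_1 = {("ex:Dog", "rdfs:subClassOf", "ex:Animal")}
inference_model_2 = {("ex:rex", "rdf:type", "ex:Pet")}

RESULT = match(
    [base_model_1, inference_model_1, inference_model_2],
    ("ex:rex", "rdf:type", None),
)
```

The caller sees base facts and inferred statements together without needing to know which model each one came from, while each inference model can still be dropped and rebuilt independently.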

In this way, each inference model handles a different group of inferred statements. By splitting the two inference models between subClassOf and the other RDFS rules, we are saying that they are computationally equivalent. Whether this holds depends on the following factors:

The rules to apply
The schema
The original data
How often and what type of changes occur to the schema or original data

One performance enhancement is for a user or the system to detect when certain queries lead to continual dropping and re-inferencing of statements. Conversely, a set of statements that is continually inferred at query time would benefit from being cached in an inference model.