Mulgara | Semantic Store - Full-Text Models

Full-Text Models

Normally in Mulgara, searches for literal values in models only succeed when there is an exact match.

Lucene is a full-text search engine integrated into Mulgara by treating the Lucene index as a model. With full-text models, searches for literal values in models succeed when there is a partial match.

The following sections outline how to create, modify, and query full-text models, and describe their limitations.

Visit the Lucene System Properties Web site for information on the performance and operational properties you can set.

Creating Full-Text Models

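Use the create command with the model name followed by the type argument <http://mulgara.org/mulgara#LuceneModel>. The type argument is optional for the create command in general, but it is what makes the new model a full-text model.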

For example, to create the full-text model #foo:

create <rmi://mysite.com/server1#foo> <http://mulgara.org/mulgara#LuceneModel>;

Note - Specifying no type at all creates a normal Mulgara model.

Full-text models are removed in exactly the same way as normal models. For example:

drop <rmi://mysite.com/server1#foo>;

Modifying Full-Text Models

As with normal models, use the insert command to insert statements into a full-text model.

When inserting statements into a full-text model, the object is the text that is specially indexed for partial matching. If the object is a literal, the text of the literal is indexed. Inserting literals into a full-text model uses the same syntax as for any other model. For example:

insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
into <rmi://mysite.com/server1#foo>;

If the object is a resource, the resource is converted into a URL, the URL is accessed by the server, and the content of the URL is indexed. The resource must use either the http: or the file: protocol; otherwise the insert fails, sometimes without generating an error. For example:

insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#abstract>
<http://www.mysite.com/abstract.txt>
into <rmi://mysite.com/server1#foo>;

To perform full-text searching on literals stored in a normal model, the contents of the normal model must be copied into a full-text model. The following example shows how the document titles stored in the normal model #data are copied into the full-text model #foo.

insert select $url <http://mulgara.org/mulgara/Document#title> $title
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
into <rmi://mysite.com/server1#foo>;

If a statement is inserted into a full-text model and the server determines that the MIME type of the document is text/html, then the HTML tags are filtered out before indexing.

Note - The ability of Mulgara to correctly identify HTML input is limited, and only works when fetching a resource via HTTP from a web server that accurately reports the content type.

Use the delete command to delete text from full-text models.
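
As a minimal sketch, the statement added in the literal insert example above could be removed as follows. This assumes that the statement is still present in #foo and that delete takes the same statement form as insert, with from in place of into:

```
delete <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
from <rmi://mysite.com/server1#foo>;
```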

Querying Full-Text Models

Queries on full-text models work differently from queries on normal models, as follows:

- The where clause must have a literal-valued object.
- The object portion of the where clause is passed to the underlying search engine as a pattern. The following types of pattern searches are possible:
  - Wildcards
  - Fuzzy
  - Word proximity
  - Boosting a term
  - Boolean operators (and, or, not, +, -)
  For more information on Lucene searching, see the Lucene query syntax.
- If the $score variable is specified in the select clause, it is assigned a number from 0 to 1 indicating how close the match was. The $score variable must be part of the select clause in order to see it in the result.

Given the full-text model #foo populated in the previous section, the following query returns the documents whose titles contain both of the words "duty" and "care", along with a score indicating the quality of each match:

select $url $title $score
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
and $url <http://mulgara.org/mulgara/Document#title> '+duty +care'
in <rmi://mysite.com/server1#foo>;

Note - In the example above, the join is performed across the $url column, and the second where constraint is only executed against the full-text model #foo.
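
The other pattern types can be sketched by changing the literal in the second where constraint of the query above. The constraints below are illustrative only. They use standard Lucene query syntax and assume that Mulgara passes the literal to Lucene unchanged, including any double quotes inside it. In order, they show a wildcard match on words starting with "dut", a fuzzy match on words similar to "care", a proximity match requiring "duty" and "care" within five words of each other, a boost that weights "duty" four times as heavily as "care", and a Boolean pattern that requires "duty" and excludes "care":

```
$url <http://mulgara.org/mulgara/Document#title> 'dut*'
$url <http://mulgara.org/mulgara/Document#title> 'care~'
$url <http://mulgara.org/mulgara/Document#title> '"duty care"~5'
$url <http://mulgara.org/mulgara/Document#title> 'duty^4 care'
$url <http://mulgara.org/mulgara/Document#title> '+duty -care'
```

Each line is a fragment, not a complete query. As in the query above, it must be followed by the in clause naming the full-text model so that the pattern is evaluated against #foo. Note that standard Lucene syntax writes the word operators in upper case (AND, OR, NOT).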

The index files for full-text models are stored in the server1/fulltext directory.

Limitations of Full-Text Models

Full-text models are an attempt to make a text index act like a Mulgara model, allowing both exact and partial matching to be mixed within queries. Full-text models have the following limitations:

- Lucene is not transactional, so neither are the full-text models. If an operation fails, full-text models do not roll back along with the normal models.
- The backup and restore commands do not include the content of full-text models.
