mulgara - semantic store

Full-Text Models

Normally in Mulgara, searches for literal values in models only succeed when there is an exact match.

Lucene is a full-text search engine integrated into Mulgara by treating the Lucene index as a model. With full-text models, searches for literal values in models succeed when there is a partial match.

The following sections outline how to create, modify, and query full-text models, and describe their limitations.

Visit the Lucene System Properties Web site for information on the performance and operational properties you can set.

 

Creating Full-Text Models

Use the create command with an optional type argument of <http://mulgara.org/mulgara#LuceneModel> in addition to the model name.

For example, to create the full-text model #foo:

create <rmi://mysite.com/server1#foo> <http://mulgara.org/mulgara#LuceneModel>;

Note - Specifying no type at all creates a normal Mulgara model.

Full-text models are removed in exactly the same way as normal models. For example:

drop <rmi://mysite.com/server1#foo>;

 

Modifying Full-Text Models

As with normal models, use the insert command to insert statements into a full-text model.

When inserting statements into a full-text model, the object is the text that is specially indexed for partial matching. If the object is a literal, the text of the literal is indexed. Inserting literals into a full-text model uses the same syntax as for any other model. For example:

insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
into <rmi://mysite.com/server1#foo>;

If the object is a resource, the resource is converted into a URL, the URL is accessed by the server, and the content of the URL is indexed. The resource must use either the http: or the file: protocol; otherwise the insert fails, sometimes without generating an error. For example:

insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#abstract>
<http://www.mysite.com/abstract.txt>
into <rmi://mysite.com/server1#foo>;
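
As a sketch of the file: case, the same statement can point at a document on the local file system. The path file:/tmp/abstract.txt is a hypothetical example. Because the server is what accesses the URL, the file must be readable by the Mulgara server rather than by the client issuing the command.

```
# Hypothetical local path; the server, not the client, reads this file.
insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#abstract>
<file:/tmp/abstract.txt>
into <rmi://mysite.com/server1#foo>;
```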

To perform full-text searching on literals stored in a normal model, the contents of the normal model must be copied into a full-text model. The following example shows how the document titles stored in the normal model #data are copied into the full-text model #foo.

insert select $url <http://mulgara.org/mulgara/Document#title> $title
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
into <rmi://mysite.com/server1#foo>;

If a statement is inserted into a full-text model and the server determines that the MIME type of the document is text/html, then the HTML tags are filtered out before indexing.

Note - The ability of Mulgara to correctly identify HTML input is limited, and only works when fetching a resource via HTTP from a web server that accurately reports the content type.

Use the delete command to delete text from full-text models.
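
For example, assuming the title statement inserted earlier in this section is still present in #foo, a delete that mirrors that insert might look like the following sketch:

```
# Removes the title statement inserted earlier in this section.
delete <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
from <rmi://mysite.com/server1#foo>;
```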

 

Querying Full-Text Models

Queries on full-text models work differently from queries on normal models, as follows:

Given the full-text model #foo populated in the previous section, the following query returns titles with the words "duty" and "care" in the title, as well as an indication of the quality of the match:

select $url $title $score
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
and $url <http://mulgara.org/mulgara/Document#title> '+duty +care'
in <rmi://mysite.com/server1#foo>;

Note - In the example above, the join is performed across the $url column, and the second where constraint is only executed against the full-text model #foo.
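
To illustrate partial matching on a single word, the following sketch is a variation on the query above that asks only for titles containing "duty". It assumes the search string is interpreted as a Lucene query, as the '+duty +care' example suggests. The $score column is left out to keep the sketch minimal.

```
# Assumption: a single bare term matches any title containing that word.
select $url $title
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
and $url <http://mulgara.org/mulgara/Document#title> 'duty'
in <rmi://mysite.com/server1#foo>;
```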

The index files for full-text models are stored in the server1/fulltext directory.

 

Limitations of Full-Text Models

Full-text models are an attempt to make a text index act like a Mulgara model, allowing both exact and partial matching to be mixed within queries. Full-text models have the following limitations:
