Full-Text Models
Normally in Mulgara, searches for literal values in models only succeed when there is an exact match.
Lucene is a full-text search engine integrated into Mulgara by treating the Lucene index as a model. With full-text models, searches for literal values in models succeed when there is a partial match.
The following sections outline how to create, modify, and query full-text models, and describe the limitations of full-text models.
Visit the Lucene System Properties Web site for information on the performance and operational properties you can set.
Creating Full-Text Models
Use the create command with an optional type argument of <http://mulgara.org/mulgara#LuceneModel> in addition to the model name.
For example, to create the full-text model #foo:
create <rmi://mysite.com/server1#foo> <http://mulgara.org/mulgara#LuceneModel>;
Note - Specifying no type at all creates a normal Mulgara model.
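As a sketch of that case, the following statement would create a normal model. The model name #data is chosen here only to match the normal model used in later examples:
create <rmi://mysite.com/server1#data>;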
Full-text models are removed in exactly the same way as normal models. For example:
drop <rmi://mysite.com/server1#foo>;
Modifying Full-Text Models
As with normal models, use the insert command to insert statements into a full-text model.
When inserting statements into a full-text model, the object is the text that is indexed for partial matching. If the object is a literal, the text of the literal is indexed. Inserting literals into a full-text model uses the same syntax as for any other model. For example:
insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
into <rmi://mysite.com/server1#foo>;
If the object is a resource, the resource is converted into a URL, the URL is accessed by the server, and the content of the URL is indexed. The resource must use either the http: or file: protocol; otherwise the insert fails, sometimes without generating an error. For example:
insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#abstract>
<http://www.mysite.com/abstract.txt>
into <rmi://mysite.com/server1#foo>;
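A file: resource would be inserted in the same way. The following is a sketch only: the path /data/abstract.txt is hypothetical, and it is assumed to be readable by the Mulgara server, since the server is what accesses the URL:
insert <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#abstract>
<file:///data/abstract.txt>
into <rmi://mysite.com/server1#foo>;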
To perform full-text searching on literals stored in a normal model, the contents of the normal model must be copied into a full-text model. The following example shows how the document titles stored in the normal model #data are copied into the full-text model #foo.
insert select $url <http://mulgara.org/mulgara/Document#title> $title
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
into <rmi://mysite.com/server1#foo>;
If a statement is inserted into a full-text model and the server determines that the MIME type of the document is text/html, then the HTML tags are filtered out before indexing.
Note - The ability of Mulgara to correctly identify HTML input is limited, and only works when fetching a resource via HTTP from a web server that accurately reports the content type.
Use the delete command to delete text from full-text models.
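The delete command takes the same statement form as insert, with from in place of into. As a minimal sketch, assuming the title statement inserted earlier in this section is still present, the following would remove it from the full-text model:
delete <http://www.mysite.com/somedoc.txt>
<http://mulgara.org/mulgara/Document#title> 'Document title'
from <rmi://mysite.com/server1#foo>;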
Querying Full-Text Models
Queries on full-text models work differently from queries on normal models, as follows:
- The where clause must have a literal-valued object.
- The object portion of the where clause is passed to the underlying search engine as a pattern. The following types of pattern searches are possible:
  - Wildcards
  - Fuzzy
  - Word proximity
  - Boosting a term
  - Boolean operators (and, or, not, +, -)
  For more information on Lucene searching, see the Lucene query syntax.
- If the $score variable is specified in the select clause, it is assigned a number from 0 to 1 indicating how close the match was. The $score variable must be part of the select clause in order to see it in the result.
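To illustrate how a pattern is supplied, the following sketch places a Lucene wildcard pattern in the object position of a where constraint. It is untested and assumes the full-text model #foo can be named directly in the from clause. The search term is illustrative only:
select $url
from <rmi://mysite.com/server1#foo>
where $url <http://mulgara.org/mulgara/Document#title> 'dut*';
Under the same assumptions, other patterns written in Lucene's query syntax could be substituted for 'dut*':
- 'care~' (fuzzy: terms spelled similarly to "care")
- '"duty care"~5' (word proximity: both words within five words of each other)
- 'duty^4 care' (boosting: matches on "duty" are weighted more heavily)
- 'duty AND NOT negligence' (Boolean operators)
Lucene's standard query syntax writes the Boolean keywords in upper case (AND, OR, NOT). Whether Mulgara also accepts them in lower case is not confirmed here.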
Given the full-text model #foo populated in the previous section, the following query returns titles with the words "duty" and "care" in the title, as well as an indication of the quality of the match:
select $url $title $score
from <rmi://mysite.com/server1#data>
where $url <http://mulgara.org/mulgara/Document#title> $title
and $url <http://mulgara.org/mulgara/Document#title> '+duty +care'
in <rmi://mysite.com/server1#foo>;
Note - In the example above, the join is performed across the $url column, and the second where constraint is only executed against the full-text model #foo.
The index files for full-text models are stored in the server1/fulltext directory.
Limitations of Full-Text Models
Full-text models are an attempt to make a text index act like a Mulgara model, allowing both exact and partial matching to be mixed within queries. Full-text models have the following limitations: