Monday, October 29, 2007

tutorial: advanced lucene.NET usage example

this post applies to Sitecore 5.3.1

After my previous post about the lucene.net search implementation I've had tons of questions about searching, indexes and almost everything in between, so I thought I'd write another post on the subject. This post, however, uses more of the functionality that's already available and takes another, more advanced, approach to searching and displaying the results.

(A lot of this code is based on, or taken from, the simple search implemented in Sitecore.)

What you'll end up with is a sublayout, similar to the screenshot below, that you can use on your website(s):

[screenshot: lucene.net advanced search sublayout]

This tutorial covers the following steps:

  • create a new custom index that indexes selected fields of items based on a specific template in the web database

  • create a sublayout that uses the index to search and render the output

Step 1: Create the index
Add the following to the web.config file within the <indexes> section:

<!-- Custom Web Index (created as an example) -->
<index id="webindex" singleInstance="true" type="Sitecore.Data.Indexing.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <templates hint="list:AddTemplate">
    <template>Sample Item</template>
  </templates>
  <fields hint="raw:AddField">
    <field>title</field>
    <field storage="unstored">text</field>
  </fields>
</index>

Next, locate the definition of the web database (within the <databases> section) and add the following to it, after the proxy data provider entry:


<indexes hint="list:AddIndex">
  <index path="indexes/index[@id='webindex']" />
</indexes>
<Engines.HistoryEngine.Storage>
  <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.$(database)">
    <param desc="connection" ref="connections/$(id)">
    </param>
    <EntryLifeTime>30.00:00:00</EntryLifeTime>
  </obj>
</Engines.HistoryEngine.Storage>
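The EntryLifeTime value above is written in .NET TimeSpan notation ([d.]hh:mm:ss), so 30.00:00:00 reads as 30 days. Presumably this is how long the history engine keeps its entries, but that is my reading of the setting, not something stated in this post. A quick plain C# check of the notation (the LifeTimeDemo class is purely illustrative, not part of Sitecore):

```csharp
using System;
using System.Globalization;

// Illustrative helper only (not a Sitecore API): parses a config value written
// in TimeSpan notation ([d.]hh:mm:ss) the same way .NET reads it.
public static class LifeTimeDemo
{
    public static TimeSpan ParseLifeTime(string value)
    {
        // InvariantCulture keeps the parse independent of the server's regional settings
        return TimeSpan.Parse(value, CultureInfo.InvariantCulture);
    }
}
```

For example, LifeTimeDemo.ParseLifeTime("30.00:00:00").TotalDays gives 30.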

Step 2: Create the sublayout

Create a new sublayout/usercontrol and add the following elements:

  • SearchTextBox - TextBox
  • SearchButton - Button
  • SearchResultsPanel - Panel
  • lblStatus - Label

Next, wire up the button's Click event to make the following call (the "webindex" and "web" arguments are the names of the index and of the database to search in):
AdvancedSearch(SearchTextBox.Text, "webindex", "web");
And here is the code for the AdvancedSearch() method:

// This method assumes using directives along these lines (check them against the
// assemblies shipped with your Sitecore version): System, System.Text, System.Web.UI,
// Sitecore, Sitecore.Data, Sitecore.Data.Items, Sitecore.Data.Indexing,
// Sitecore.Diagnostics, Lucene.Net.Analysis, Lucene.Net.Search, plus the namespace of
// the Lucene.Net highlighter (Highlighter, QueryScorer), in the Lucene.Net contrib
// highlighter usually Lucene.Net.Highlight.

/// <summary>
/// Searches for a specified string using the built-in lucene.net engine,
/// with output similar to what Sitecore itself shows when performing a search.
/// </summary>
/// <param name="searchstring">the string to search for</param>
/// <param name="indexname">the name of the index</param>
/// <param name="database">the database to perform the search within</param>
private void AdvancedSearch(string searchstring, string indexname, string database)
{
    // declared outside the try block so it can always be closed in finally
    IndexSearcher searcher = null;
    try
    {
        // clear the output holders
        this.SearchResultsPanel.Controls.Clear();
        this.lblStatus.Text = "";

        // make sure we don't run empty searches (check the parameter, not the textbox)
        if (searchstring == null || searchstring.Trim().Length == 0)
        {
            this.lblStatus.Text = "please specify your search..";
            return;
        }

        // find the proper culture to use when formatting dates later
        System.Globalization.CultureInfo culture = Sitecore.Context.Culture;
        if (culture.IsNeutralCulture)
        {
            culture = System.Globalization.CultureInfo.CreateSpecificCulture(culture.Name);
        }

        // timer used to calculate the time taken
        HighResTimer timer = new HighResTimer(true);

        // get the specified index
        Index searchIndex = Sitecore.Configuration.Factory.GetIndex(indexname);
        // get the database to perform the search in
        Database db = Sitecore.Configuration.Factory.GetDatabase(database);
        // get a searcher for this index and database
        searcher = searchIndex.GetSearcher(db);
        // use a standard analyzer to parse the query against the "_content" field
        Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
        Query query = Lucene.Net.QueryParsers.QueryParser.Parse(searchstring, "_content", analyzer);
        // perform the search and get the results back as a Hits list
        Hits hits = searcher.Search(query);
        // stop the timer
        double timeElapsed = timer.Elapsed();

        // output a friendly message with the number of hits, time taken etc.
        // (the search string is HTML-encoded because the Label renders its text as-is)
        string documentWord = (hits.Length() == 1)
            ? Sitecore.Globalization.Translate.Text("document")
            : Sitecore.Globalization.Translate.Text("documents");
        this.lblStatus.Text = string.Format(
            Sitecore.Globalization.Translate.Text("Found {0} {1} that matched query '{2}' ({3}{4})"),
            hits.Length(),
            documentWord,
            System.Web.HttpUtility.HtmlEncode(searchstring),
            timeElapsed.ToString("0.00"),
            Sitecore.Globalization.Translate.Text(" ms"));

        // the StringBuilder collects the HTML before the final output
        StringBuilder sb = new StringBuilder();
        // the highlighter returns fragments of the item text with the hits highlighted
        Highlighter highlighter = new Highlighter(new QueryScorer(query));

        // step through each result and format it before returning it to the client
        for (int i = 0; i < hits.Length(); i++)
        {
            // get the actual item
            Item itm = Index.GetItem(hits.Doc(i), db);
            if (itm != null)
            {
                string retStr = string.Empty;
                // get the fields of the item...
                Sitecore.Collections.FieldCollection fields = itm.Fields;
                // ...and step through them so we can show where the hit was found
                for (int j = 0; j < fields.Count; j++)
                {
                    Sitecore.Data.Fields.Field field = fields[j];
                    if (field != null)
                    {
                        string fieldname = field.DisplayName;
                        if (string.IsNullOrEmpty(fieldname))
                        {
                            fieldname = Sitecore.Globalization.Translate.Text("[Unknown field]");
                        }
                        string s = StringUtil.RemoveTags(field.Value);
                        TokenStream tokenStream = analyzer.TokenStream(new System.IO.StringReader(s));
                        // use the highlighter to get up to 3 highlighted fragments of the text
                        string highlightedText = highlighter.GetBestFragments(tokenStream, s, 3, "...");
                        if (highlightedText.Length > 0)
                        {
                            retStr += "<div><span class=\"scField\">" + fieldname + ":</span> \"" + highlightedText + "\"</div>";
                        }
                        else if (s.IndexOf(searchstring, StringComparison.CurrentCultureIgnoreCase) >= 0)
                        {
                            // no fragments, but the plain text still contains the search string:
                            // show the text clipped to 64 (0x40) characters instead
                            retStr += "<div><span class=\"scField\">" + fieldname + ":</span> \"" + StringUtil.Clip(s, 0x40, true) + "\"</div>";
                        }
                    }
                }
                string updated = itm.Statistics.Updated.ToString("d", culture);
                string nameandversion = itm.Language.CultureInfo.DisplayName + ", " + itm.Version;
                sb.Append("<div style=\"padding:8px 0px 8px 0px\">");
                sb.Append("<a href=\"" + itm.Paths.GetFriendlyUrl(true) + "\" class=\"scResult\">");
                sb.Append(Sitecore.Resources.Images.GetImage(itm.Appearance.Icon, 0x10, 0x10, "absmiddle", "0px 4px 0px 0px"));
                sb.Append(itm.DisplayName + "</a><br/>");
                sb.Append(retStr);
                sb.Append("<div class=\"scResultInfo\">" + itm.Paths.Path + " [" + nameandversion + "] - " + updated + "</div>");
                sb.Append("</div>");
            }
            else
            {
                sb.Append("<div class=\"scNotFound\" style=\"padding:8px 0px 8px 0px\">");
                sb.Append(Sitecore.Resources.Images.GetImage("Applications/16x16/error.png", 0x10, 0x10, "absmiddle", "0px 4px 0px 0px"));
                sb.Append(Sitecore.Globalization.Translate.Text("Item not found") + "</div>");
            }
        }
        this.SearchResultsPanel.Controls.Add(new LiteralControl(sb.ToString()));
    }
    catch (Exception exception)
    {
        this.SearchResultsPanel.Controls.Add(new LiteralControl(System.Web.HttpUtility.HtmlEncode(exception.Message)));
    }
    finally
    {
        // always release the searcher, also when the search fails
        if (searcher != null)
        {
            searcher.Close();
        }
    }
}
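The fallback branch in the field loop (show a clipped excerpt when the highlighter returns no fragments but the plain text still contains the search string) can be tried out without Sitecore. Below is a minimal plain C# sketch under a few assumptions: FieldExcerpt is a hypothetical helper, its Clip method only approximates Sitecore's StringUtil.Clip (whose exact behaviour I haven't verified), and it HTML-encodes the field name and excerpt, which the code above does not do.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of Sitecore) mimicking the fallback branch of
// AdvancedSearch(): if the field text contains the search string (case-insensitive),
// return an HTML line with a clipped, encoded excerpt; otherwise return null.
public static class FieldExcerpt
{
    // Clip text to at most maxLength characters, appending "..." when it was cut.
    // This approximates, but is not guaranteed to match, StringUtil.Clip.
    public static string Clip(string text, int maxLength)
    {
        if (text.Length <= maxLength)
        {
            return text;
        }
        return text.Substring(0, maxLength) + "...";
    }

    public static string Format(string fieldName, string fieldText, string searchString, int maxLength)
    {
        if (fieldText.IndexOf(searchString, StringComparison.CurrentCultureIgnoreCase) < 0)
        {
            return null; // no match in this field
        }
        return "<div><span class=\"scField\">" + WebUtility.HtmlEncode(fieldName) + ":</span> \""
            + WebUtility.HtmlEncode(Clip(fieldText, maxLength)) + "\"</div>";
    }
}
```

In the sublayout you could call something like FieldExcerpt.Format(fieldname, s, searchstring, 0x40) in place of the else-if branch; encoding the excerpt keeps stray characters such as < or & in field values from breaking the page.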

If things go wrong, make sure you have set up the index correctly and that the index has been created in the /indexes folder of your installation. To trigger reindexing manually, use the Databases option in the Sitecore Control Panel (Sitecore menu > Control Panel).
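To check from code whether anything has actually been written to disk, a rough test is to look for Lucene's segments file. The sketch below assumes you point it at the folder holding your index (or at /indexes itself, since it searches subfolders too) and relies on Lucene naming its segment list "segments" in older versions and "segments_N"/"segments.gen" in newer ones. IndexCheck.LooksBuilt is a hypothetical helper, not a Sitecore API.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper (not a Sitecore API): reports whether a folder,
// or any folder below it, contains a Lucene segments file.
public static class IndexCheck
{
    public static bool LooksBuilt(string indexFolder)
    {
        if (!Directory.Exists(indexFolder))
        {
            return false; // the index folder was never created
        }
        // "segments*" matches "segments", "segments_2", "segments.gen", ...
        string[] segmentFiles = Directory.GetFiles(indexFolder, "segments*", SearchOption.AllDirectories);
        return segmentFiles.Length > 0;
    }
}
```

From within the sublayout you could, for instance, call IndexCheck.LooksBuilt(Server.MapPath("/indexes")) and show a warning when it returns false; the folder layout under /indexes is an assumption, so adjust the path to wherever your index files actually live.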

The full source code for this post and the previous one will be made available here later on; for now, email me or leave a comment if you want the code sent to you.

Feel free to comment or email if you have any ideas or questions.

Regards,

P.

14 comments:

Anonymous said...

Hi Peter,

I used your code, but it always shows "0 results found". Am I missing anything?
Thanks.

Unknown said...

It should work just fine, but you'll have to make sure that your installation runs the indexes (see the final paragraph: if things go wrong).

P.

ashish.iub said...

Hi Peter,

It's a really interesting article.

I have used the nhibernate mapping with an XML file, but I am a little confused about the indexing. Where exactly do I have to put the code in my web.config file? Can you please clarify this with a sample config file?

Thanking you in advance.

ashish.iub said...

Please reply to this mail address...

Unknown said...

Ashish:

I can't seem to find your email (I'm not allowed to read your profile), so I'll answer here instead; hope that's OK.

The entire article assumes you're using Sitecore and its implementation of lucene.NET, so I'll assume that too :)

The new index definition goes inside the indexes section of the web.config, which is where all the indexes are defined. The hook to the database is done in the databases section of the web.config.

regards,

Peter

ashish.iub said...

I'm really sorry that I forgot to give my email id :).

It's ashish.iub@gmail.com.

Actually, I am using nhibernate with lucene.net for my asp.net application.

I'm planning to develop a search engine that picks up entire entities from the database. For that I have been hunting around the net and am still looking for a guideline. I read your article and thought that you might be able to help me with my part. If you have any idea regarding my problem, mail me.

Thanking you

Ashish

Anonymous said...

Hi, could you post this example as a package (including the database and source code), please?

Anonymous said...

Hi Peter,

I know this is an old post, but I was wondering if there is a way to index only under a specific folder? For example, we only want to search the content of a specific site in the content tree. Is this possible? If we cannot index like that, is there any other way to do this?

Another question: when filtering the index by template, do we have to use the GUID or the template name?

Thanks
Said

jizhiunion said...

Hi Peter!
Can you give me the full code, please?
My email address: zhengchunsky@gmail.com

Phani said...

hi peter,

I am building a job portal in asp.net and tried your code. I put the code in web.config, but it does not support the indexes tag.

Where do I have to apply this code?

Please tell me...
my eMailId is mailtophani2@gmail.com
Thanking You,
Phani.

Anonymous said...

Hi Peter, can you please send a copy of the code to:
stockboy3000 @ hotmail.com

Thanks
Jonas

Anonymous said...

Hello Peter, I am stuck on my assignment; it would help a great deal if you could zip the whole code and mail it to my email id, i.e. fahad_ashraf123@yahoo.com

Thanks in Advance!!!

r4 games said...

try
{
    // clear output holders..
    this.SearchResultsPanel.Controls.Clear();
    this.lblStatus.Text = "";

    // make sure we don't do unwanted empty searches..
    if (SearchTextBox.Text == string.Empty)
    {
        this.lblStatus.Text = "please specify your search..";

Here, the last line generates an exception... it just cannot assign the value.

Anonymous said...

In what namespace is the Highlighter class?