Algolia: Fast Full-Text Search for Sitecore

Daniil Raschupkin on September 13, 2019

See code here

Sitecore uses full-text search extensively and supports several search engines out of the box. Some of them are essential for Sitecore operation (like Solr or Azure Search). Others can be integrated and used only for client-side search (Coveo is the most popular among them). However, when a customer has specific search requirements, the existing solutions may not fit. In such cases, you will need a custom search engine integration.

There are plenty of alternative search engines (Epoq, Algolia, SearchBlox, Cludo, Amazon CloudSearch, Swiftype, Sajari, Prediggo, etc.), available in the cloud or on-premise, with different capabilities and pricing options. What they have in common is that they can deliver more business value to your customer, which is why we need them.

I had a chance to work with the Algolia search engine and can safely say that it is a great tool for building full-text search. Its key features are:

  • instant search
  • global language support
  • typo tolerance
  • highlighting
  • faceting
  • synonyms
  • geo-awareness
  • personalization
  • analytics

Algolia also provides a .NET client library for indexing and a JavaScript library for building a rich search UI with all the modern search features. Most importantly, Algolia is really fast and returns search results within milliseconds.

A developer needs to do two things: index data on the backend and build the search UI on the frontend. A frontend developer should start by reading the “Building search UI” section of Algolia’s documentation, which is very detailed.

A backend developer will need to come up with their own solution (you can find a sample solution here) and answer several important questions about indexing data:

  • What to index?
  • When to index?
  • How to delete?
  • How to update?

We need some kind of crawler that extracts data from specific items according to certain rules, for example including certain content paths, excluding certain templates, and converting Sitecore field values into indexable data.
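Rules like these can be expressed as a small configuration plus a filter-and-convert step. The sketch below is a minimal illustration under stated assumptions, not the sample solution's code. `Item`, `CrawlerConfig`, `strip_html`, `is_under` and the `objectID` format (item ID plus language) are all hypothetical. Only the `objectID` attribute name comes from Algolia's record format.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Item:
    # Hypothetical, simplified stand-in for one language version of a Sitecore item.
    id: str
    path: str
    template: str
    language: str
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class CrawlerConfig:
    include_paths: List[str]      # subtrees to index
    exclude_templates: List[str]  # template names to skip (folders, settings, ...)
    converters: Dict[str, Callable[[str], object]] = field(default_factory=dict)


def strip_html(value: str) -> str:
    # Naive converter for rich-text fields: drop tags, keep the text.
    return re.sub(r"<[^>]+>", "", value).strip()


def is_under(path: str, root: str) -> bool:
    # True if `path` is `root` itself or one of its descendants (case-insensitive).
    path, root = path.lower().rstrip("/"), root.lower().rstrip("/")
    return path == root or path.startswith(root + "/")


def crawl(items: List[Item], config: CrawlerConfig) -> List[dict]:
    records = []
    for item in items:
        if not any(is_under(item.path, root) for root in config.include_paths):
            continue  # outside the indexed paths
        if item.template in config.exclude_templates:
            continue  # excluded template
        # One record per item and language; objectID must be unique within an index.
        record = {"objectID": f"{item.id}_{item.language}",
                  "path": item.path, "language": item.language}
        for name, raw in item.fields.items():
            convert = config.converters.get(name, lambda v: v)  # default: keep raw value
            record[name] = convert(raw)
        records.append(record)
    return records
```

Keeping the rules in data (`CrawlerConfig`) rather than in code mirrors the idea of a crawler driven by configuration. In Sitecore, such rules would more naturally live in a config file.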

Finally, we need to decide when to react to a data change, how important it is to keep the indexes up to date, and which Sitecore events, agents, and tasks give us the best possible index update process. We should also plan for handling item deletion and for collecting multiple small updates into batches.
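Batching can be sketched as a queue keyed by `objectID`, where the latest change to a record wins and everything pending is sent in one call once a size threshold is reached. This is a minimal sketch, not the sample solution's queue. `IndexingQueue` is hypothetical, and the `send` callback is an assumed placeholder for whatever code calls the real Algolia client.

```python
from typing import Callable, Dict, List, Optional

# send(records_to_save, object_ids_to_delete); a placeholder for real Algolia calls.
Sender = Callable[[List[dict], List[str]], None]


class IndexingQueue:
    """Collects small index changes and sends them in batches (hypothetical sketch)."""

    def __init__(self, send: Sender, batch_size: int = 100):
        self._send = send
        self._batch_size = batch_size
        # objectID -> record to save, or None to delete; later changes overwrite earlier ones.
        self._pending: Dict[str, Optional[dict]] = {}

    def upsert(self, record: dict) -> None:
        self._pending[record["objectID"]] = record
        self._flush_if_full()

    def delete(self, object_id: str) -> None:
        self._pending[object_id] = None  # a delete supersedes any pending update
        self._flush_if_full()

    def _flush_if_full(self) -> None:
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Also call this from a timer or scheduled agent so small batches don't wait forever.
        if not self._pending:
            return
        saves = [r for r in self._pending.values() if r is not None]
        deletes = [oid for oid, r in self._pending.items() if r is None]
        self._pending = {}  # a real implementation would retry or re-queue on failure
        self._send(saves, deletes)
```

Because changes are keyed by `objectID`, ten quick saves of the same item during editing or publishing collapse into a single record update.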

Deletion is usually more important for the user experience: it is OK if a page cannot be found right after publishing, but broken links in search results are not acceptable.
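One way to avoid broken links is to remove the deleted item's records together with every indexed record beneath its path, in all languages. This is a minimal sketch under two assumptions: each record stores the item's Sitecore `path`, and you can look up what has been indexed (for example, from a local list). `object_ids_to_delete` is a hypothetical helper, not part of any Algolia or Sitecore API.

```python
from typing import Iterable, List


def object_ids_to_delete(indexed: Iterable[dict], deleted_path: str) -> List[str]:
    """Return objectIDs of records for the deleted item and all of its descendants.

    Hypothetical helper: assumes every record has "objectID" and "path" keys.
    Path matching ignores case and a trailing slash.
    """
    root = deleted_path.lower().rstrip("/")
    result = []
    for record in indexed:
        path = record["path"].lower().rstrip("/")
        # Match the item itself or anything below it, but not siblings such as ".../Newsletter".
        if path == root or path.startswith(root + "/"):
            result.append(record["objectID"])
    return result
```

The returned IDs can then be queued as deletions and sent ahead of pending updates, so that removed pages leave the index as quickly as possible.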

You will most likely need to search in multiple languages, and Algolia recommends splitting your data into one index per language to optimize index build time.
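With one index per language, each batch of records has to be routed to the right index before it is sent. This is a minimal sketch that assumes every record carries a `language` attribute. The `<base>_<language>` naming convention and both helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_name(base: str, language: str) -> str:
    # Hypothetical naming convention: "<base>_<language>", e.g. "site_en" or "site_de-de".
    return f"{base}_{language.lower()}"


def split_by_language(records: Iterable[dict], base: str) -> Dict[str, List[dict]]:
    """Group records into one batch per target index (one index per language)."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        batches[index_name(base, record["language"])].append(record)
    return dict(batches)
```

Each resulting batch can then be sent to its own index, which also makes it possible to configure language-specific index settings separately.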

I want to share a simple example of an Algolia indexing implementation based on:

  • Two Sitecore events as indexing triggers (publish:itemProcessed and item:deleted)
  • Several Sitecore commands for manual actions (update index/reindex tree)
  • A flexible custom crawler that handles Sitecore items according to configuration
  • An indexing queue that forms batches from a stream of small updates
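A "reindex tree" command has to visit the selected item and everything beneath it, then hand those items to the crawler and the queue. Below is a minimal sketch of that traversal only, not the sample solution's command. `Node` is a hypothetical stand-in for Sitecore's item API, where you would walk an item's `Children` collection instead.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    # Hypothetical stand-in for a Sitecore item and its children.
    path: str
    children: List["Node"] = field(default_factory=list)


def walk_tree(root: Node) -> Iterator[Node]:
    """Yield `root` and all of its descendants in depth-first, top-down order.

    Uses an explicit stack instead of recursion, so a deep content tree
    cannot hit Python's recursion limit.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(node.children))
```

Because `walk_tree` is a generator, a large subtree can be streamed item by item into the crawler and queue instead of being loaded into one big list first.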

It will take some time and effort to build the indexing process exactly as needed for the customer, but the result will be worth it.