Sitecore Content Tagging: Extracting Content from Rendering Datasources

Sergey Baranov on December 6, 2020

Sitecore Cortex: Extend Content Tagging

If you have worked with the Sitecore Content Tagging feature, you know that tagging applies only to the item's own content.

How Content Tagging retrieves an item's content:

  1. Get all item fields of type "Multi-Line Text" or "Rich Text".
  2. Filter out standard fields (so that their content is not extracted).
  3. Extract the text content from the remaining fields and combine it.
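
The three steps above can be sketched without any Sitecore dependency. This is only an illustration, not the actual Content Tagging implementation: SketchField and ContentExtractionSketch are hypothetical names, standard fields are assumed to be recognizable by the "__" name prefix, and HTML stripping for Rich Text is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Sitecore field; not a Sitecore type.
public sealed class SketchField
{
    public SketchField(string name, string type, string value)
    {
        Name = name;
        Type = type;
        Value = value;
    }

    public string Name { get; }
    public string Type { get; }
    public string Value { get; }
}

public static class ContentExtractionSketch
{
    // Step 1: the field types that are treated as taggable text.
    private static readonly HashSet<string> TextFieldTypes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Multi-Line Text", "Rich Text" };

    public static string ExtractContent(IEnumerable<SketchField> fields)
    {
        var parts = fields
            // Step 1: keep only text field types.
            .Where(f => TextFieldTypes.Contains(f.Type))
            // Step 2: drop standard fields (assumption: their names start with "__").
            .Where(f => !f.Name.StartsWith("__", StringComparison.Ordinal))
            // Step 3: take the non-blank values...
            .Select(f => f.Value)
            .Where(v => !string.IsNullOrWhiteSpace(v));

        // ...and combine them into a single string.
        return string.Join(" ", parts);
    }
}
```

With a "Rich Text" field, a "Single-Line Text" field, a standard "__" field, and a "Multi-Line Text" field as input, only the first and last values end up in the combined string.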

Very often, items in Sitecore have little content in their own fields but are composed of many renderings with datasources, like the following:

Items presentation

If you want to tag an item like this, including all rendering datasources that contribute meaningful content to the item, the default implementation will not do it. Let's implement our own extension to achieve this.

First, we need some item extension methods to retrieve an item's renderings and their datasources:

      
using System;
using System.Collections.Generic;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Layouts;

namespace Foundation.ContentTagging.Extentions
{
    public static class ItemExtentions
    {
        public static List<Item> GetDataSourceItems(this Item i)
        {
            List<Item> list = new List<Item>();
            foreach (var reference in i.GetRenderingReferences())
            {
                Item dataSourceItem = reference.GetDataSourceItem();
                if (dataSourceItem != null)
                {
                    list.Add(dataSourceItem);
                }
            }
            return list;
        }

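        public static Item GetDataSourceItem(this RenderingReference reference)
        {
            return reference != null ? GetDataSourceItem(reference.Settings.DataSource, reference.Database) : null;
        }

        private static Item GetDataSourceItem(string id, Database db)
        {
            // Renderings without a datasource have a null or empty value; there is nothing to resolve.
            if (string.IsNullOrEmpty(id) || db == null)
            {
                return null;
            }

            // The datasource is stored either as an item ID or as an item path.
            return Guid.TryParse(id, out var itemId)
                ? db.GetItem(new ID(itemId))
                : db.GetItem(id);
        }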

        public static RenderingReference[] GetRenderingReferences(this Item i)
        {
            if (i == null)
            {
                return new RenderingReference[0];
            }
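            // ID of the default Sitecore device ("Default").
            var defaultDeviceItem = i.Database.Resources.Devices[new ID("{FE5D7FDF-89C0-4D99-9AA3-B5FBD009C9F3}")];
            if (defaultDeviceItem == null)
            {
                return new RenderingReference[0];
            }
            // false = do not use the rule set; return the renderings as defined in the layout.
            return i.Visualization.GetRenderings(defaultDeviceItem, false);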
        }
    }
}
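
The private GetDataSourceItem helper above decides between an ID lookup and a path lookup with Guid.TryParse. The standalone sketch below shows only that decision; DataSourceSketch and Classify are hypothetical names used for illustration. Other datasource formats (for example query-based ones) are not covered by this check and would be passed on as a path.

```csharp
using System;

public static class DataSourceSketch
{
    // Returns "none" for a missing datasource, "id" when the raw value
    // parses as a GUID, and "path" for any other non-empty value.
    public static string Classify(string dataSource)
    {
        if (string.IsNullOrWhiteSpace(dataSource))
        {
            return "none";
        }

        // Guid.TryParse accepts the braced form Sitecore uses for item IDs.
        return Guid.TryParse(dataSource, out _) ? "id" : "path";
    }
}
```

For example, Classify("{FE5D7FDF-89C0-4D99-9AA3-B5FBD009C9F3}") takes the ID branch, while a value such as "/sitecore/content/Home" takes the path branch.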

Next, we need a custom processor for the tagContent pipeline that retrieves text content from the rendering datasources and adds it to the list of content to be tagged:

      
using System;
using System.Collections.Generic;
using System.Linq;
using Foundation.ContentTagging.Extentions;
using Sitecore.Data.Items;
using Sitecore.ContentTagging.Core.Models;
using Sitecore.ContentTagging.Core.Providers;
using Sitecore.ContentTagging.Pipelines.TagContent;

namespace Foundation.ContentTagging.Pipelines.TagContent
{
    public class RetrieveContentFromDatasourceItems
    {
        public void Process(TagContentArgs args)
        {
            var taggableContentList = new List<TaggableContent>();
            if (args.Content != null && args.Content.Any())
            {
                taggableContentList.AddRange(args.Content);
            }

            // Resolve the datasource items once; they are the same for every content provider.
            var dataSourceItems = args.ContentItem.GetDataSourceItems();

            foreach (IContentProvider<Item> contentProvider in args.Configuration.ContentProviders)
            {
                foreach (var item in dataSourceItems)
                {
                    // Skip providers that return something other than plain string content.
                    var content = contentProvider.GetContent(item) as StringContent;

                    if (content != null && !string.IsNullOrWhiteSpace(content.Content))
                    {
                        taggableContentList.Add(content);
                    }
                }
            }
            args.Content = taggableContentList;
        }
    }
}

Lastly, we need to patch the tagContent pipeline to add our custom processor right after the default Sitecore.ContentTagging.Pipelines.TagContent.RetrieveContent processor:

      
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
    <sitecore role:require="Standalone or ContentManagement">

        <pipelines>
            <group name="ContentTagging" groupName="ContentTagging">
                <pipelines>
                    <tagContent>
                        <!-- Extracts text content from all datasource items used in the item's presentation; remove it if you don't need it -->
                        <processor type="Foundation.ContentTagging.Pipelines.TagContent.RetrieveContentFromDatasourceItems, Foundation.ContentTagging"
                                   patch:after="processor[@type='Sitecore.ContentTagging.Pipelines.TagContent.RetrieveContent, Sitecore.ContentTagging']"
                                   resolve="true" />
                    </tagContent>
                </pipelines>
            </group>
        </pipelines>
    </sitecore>
</configuration>

Let's test it!

  • Create an item with text that we will use as a datasource.
  • Create a test item without any text of its own and set the datasource item on any rendering in its presentation details.
  • Run the "Tag item" command from the ribbon and check the result.

Finally, if you need more field types to be used for content tagging, for example "Single-Line Text" fields, you can patch the contentTagging section as follows:

      
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
  <sitecore role:require="Standalone or ContentManagement">
    <contentTagging>
      <fieldMap>
        <fieldTypes>
          <fieldType fieldTypeName="Single-Line Text"/>
        </fieldTypes>
      </fieldMap>
    </contentTagging>
  </sitecore>
</configuration>

UPDATE: Source code is available on GitHub.