Sitecore Cortex and ML: Part 1 - How to Create Sitecore Processing Engine Service

Sitecore Processing Engine is a new ASP.NET Core based xConnect job that runs as a Windows service (or as an Azure job). To run custom code inside the Processing Engine, we need to make quite a few configuration changes, register our custom types, inject our services, figure out how to access xDB, and so on. In this post we review the basics of coding for Sitecore Host applications (this applies to all xConnect jobs: the Processing Engine, the Automation Engine and the Indexing Worker).

For our demo scenario we create a custom MLNetService that acts as a layer between the Processing Engine and the machine learning engine.

public interface IMLNetService
{
    Task<ModelStatistics> Train(string schemaName, CancellationToken cancellationToken, params TableDefinition[] tables);
    Task<IReadOnlyList<object>> Evaluate(string schemaName, CancellationToken cancellationToken, params TableDefinition[] tables);
}

public class MLNetService : IMLNetService
{
    // Stub: the training logic is not implemented yet.
    public async Task<ModelStatistics> Train(string schemaName, CancellationToken cancellationToken, params TableDefinition[] tables)
    {
        throw new NotImplementedException();
    }

    // Stub: the evaluation logic is not implemented yet.
    public async Task<IReadOnlyList<object>> Evaluate(string schemaName, CancellationToken cancellationToken, params TableDefinition[] tables)
    {
        throw new NotImplementedException();
    }
}

Note: some namespaces used in the code above are located in libraries that exist only in the Processing Engine instance. You can find them here: “xconnect_instance\App_Data\jobs\continuous\ProcessingEngine\”. These libraries contain “*.ML.*” in their names (for example, Sitecore.Processing.Engine.ML.Abstractions.dll).

So how do we register a custom service in the Sitecore Processing Engine?

Let’s see how dependency injection works in the Processing Engine and how we can pass parameters from configuration into our service.

To register our service, we need to add the following XML config to the “xconnect_instance\App_Data\jobs\continuous\ProcessingEngine\App_Data\Config\” folder:

<Settings>
    <Sitecore>
        <Processing>
            <Services>
                <IMLNetService>
                    <Type>Demo.Foundation.ProcessingEngine.Services.MLNetService, Demo.Foundation.ProcessingEngine</Type>
                    <As>Demo.Foundation.ProcessingEngine.Services.IMLNetService, Demo.Foundation.ProcessingEngine</As>
                    <LifeTime>Transient</LifeTime>
                    <Options>
                        <TestValue>0.25</TestValue>
                    </Options>
                </IMLNetService>
            </Services>
        </Processing>
    </Sitecore>
</Settings>

Notice that we just need to register our type under the <Processing>/<Services> node and choose a lifetime (Scoped, Singleton or Transient). We can also pass any parameters in the <Options> node. To access these parameters in our service, we inject Microsoft.Extensions.Configuration.IConfiguration and read individual values (or even the entire section):

public class MLNetService : IMLNetService
{
    private readonly double _testValue;

    public MLNetService(IConfiguration configuration)
    {
        _testValue = configuration.GetValue<double>("TestValue");
    }
}
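Reading "the entire section" can be sketched by binding the configuration to a small options class. This is only an illustration under assumptions: MLNetServiceOptions is a hypothetical class that is not part of the original code, and the in-memory configuration is a stand-in for the <Options> node that the host passes to the constructor. GetValue<T> and Get<T> come from the Microsoft.Extensions.Configuration.Binder package.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// Hypothetical options class (an assumption, not from the original post).
// Property names must match the child nodes of <Options>.
public class MLNetServiceOptions
{
    public double TestValue { get; set; }
}

public static class OptionsBindingDemo
{
    // Stand-in for the <Options> section; in the Processing Engine the host
    // supplies the IConfiguration instance, so this builder is for the demo only.
    public static IConfiguration BuildSampleOptions()
    {
        return new ConfigurationBuilder()
            .AddInMemoryCollection(new Dictionary<string, string>
            {
                ["TestValue"] = "0.25"
            })
            .Build();
    }

    public static void Main()
    {
        IConfiguration configuration = BuildSampleOptions();

        // Read a single value, as in the MLNetService constructor.
        double single = configuration.GetValue<double>("TestValue");

        // Bind the whole section to a typed object in one call.
        MLNetServiceOptions options = configuration.Get<MLNetServiceOptions>();

        Console.WriteLine($"single={single}, bound={options.TestValue}");
    }
}
```

Binding to a class keeps the option names in one place, which helps once the <Options> node grows beyond a single value.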

To get access to the logger, we inject the generic Microsoft.Extensions.Logging.ILogger<T>:

public class MLNetService : IMLNetService
{
    private readonly double _testValue;
    private readonly ILogger<MLNetService> _logger;

    public MLNetService(IConfiguration configuration, ILogger<MLNetService> logger)
    {
        _testValue = configuration.GetValue<double>("TestValue");
        _logger = logger;
        _logger.LogInformation("Hello MLNetService");
    }
}

(Logs are written to “xconnect_instance\App_Data\jobs\continuous\ProcessingEngine\App_Data\Logs\” folder.)

To inject our MLNetService into another service, we just need to include it in the constructor parameters:

public class AnotherService
{
    private readonly IMLNetService _mlNetService;

    public AnotherService(IMLNetService mlNetService)
    {
        _mlNetService = mlNetService;
    }
}
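To make the effect of the lifetime setting more concrete, here is a minimal sketch that uses Microsoft.Extensions.DependencyInjection directly. It assumes that the XML registration behaves like an AddTransient call on a plain service collection; the post does not confirm how the host maps the config internally. The types below are simplified stand-ins for the ones in this post.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Simplified stand-ins for the types used in this post.
public interface IMLNetService { }
public class MLNetService : IMLNetService { }

public class AnotherService
{
    public IMLNetService MlNetService { get; }

    public AnotherService(IMLNetService mlNetService)
    {
        MlNetService = mlNetService;
    }
}

public static class LifetimeDemo
{
    // Assumption: <LifeTime>Transient</LifeTime> corresponds to AddTransient.
    public static ServiceProvider BuildProvider()
    {
        var services = new ServiceCollection();
        services.AddTransient<IMLNetService, MLNetService>();
        services.AddTransient<AnotherService>();
        return services.BuildServiceProvider();
    }

    public static void Main()
    {
        using (ServiceProvider provider = BuildProvider())
        {
            // The container supplies IMLNetService to the constructor.
            var first = provider.GetRequiredService<AnotherService>();
            var second = provider.GetRequiredService<AnotherService>();

            // Transient: every resolution creates a new MLNetService instance.
            Console.WriteLine(ReferenceEquals(first.MlNetService, second.MlNetService));
        }
    }
}
```

With a Singleton registration both consumers would share one instance, so the lifetime matters as soon as the service keeps state such as a trained model.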

Another way is to inject IServiceProvider and resolve our service from a scope:

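public class AnotherService
{
    private readonly IServiceProvider _serviceProvider;

    public AnotherService(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;

        // CreateScope and GetService<T> are extension methods from Microsoft.Extensions.DependencyInjection.
        using (var scope = _serviceProvider.CreateScope())
        {
            // IMLNetService is not IDisposable, so it cannot be wrapped in its own using block.
            var service = scope.ServiceProvider.GetService<IMLNetService>();
            . . .
        }
    }
}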

To access the xDB context, we can simply retrieve the IXdbContext service from the service provider:

using (var scope = _serviceProvider.CreateScope())
{
    using (var xdbContext = scope.ServiceProvider.GetService<IXdbContext>())
    {
        var contact = await xdbContext.GetContactAsync(...);
    }
}

How can we debug our code?

Because the Processing Engine runs as a Windows service, we need to attach the debugger to the Sitecore.ProcessingEngine process. First, configure debugging in Visual Studio: on the menu bar, choose Tools > Options. In the Options dialog box, choose Debugging > Symbols and select the Microsoft Symbol Servers check box.

[Screenshot: configure debugging in Visual Studio]

Then choose Attach to Process from the Debug or Tools menu. In the Attach to Process dialog box, select the “Show processes from all users” check box and find the Sitecore.ProcessingEngine process.

[Screenshot: attach debug tool in VS]

Now you can catch breakpoints and debug your code:

[Screenshot: catch breakpoints in code]

However, Processing Engine tasks are run by an agent, and it is inconvenient to wait for the agent's sleep period to end before you can debug. As a workaround, you can create a Web API endpoint that triggers task registration so that your task starts immediately. You just need to send a request to your API (with Postman, the browser console or any other tool):

public class TestController : ApiController
{
    public async Task<Guid> RegisterTasks()
    {
        var taskManager = ServiceLocator.ServiceProvider.GetService<ITaskManager>();
        ...
        var guid = await taskManager.RegisterModelTrainingTaskChainAsync(modelTrainingOptions, dataSourceOptions, TimeSpan.FromHours(1));

        return guid;
    }
}

(You can find more information about task registration in “Part 4 - Workers, Options dictionary, Agents and Task Manager” (coming soon).)

In the response, you will see the ID of your task, which you can look up in the ProcessingEngineTasks database.

[Screenshot: ProcessingEngineTasks database]

You can also follow task execution in the Processing Engine log. It looks like this:

[Information] Registered Distributed Processing Task, TaskId: 8faf6763-c623-45b7-a987-f08c9efc71d9, Worker: Sitecore.Processing.Engine.ML.Workers.ProjectionWorker`1[[Sitecore.XConnect.Interaction, Sitecore.XConnect, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null]], Sitecore.Processing.Engine.ML, DataSource: Sitecore.Processing.Engine.DataSources.DataExtraction.InteractionDataSource, Sitecore.Processing.Engine
[Information] Registered Deferred Processing Task, Id: c27c8eaa-efb2-42c5-b638-5357f76c3460, Worker: Sitecore.Processing.Engine.ML.Workers.MergeWorker, Sitecore.Processing.Engine.ML
[Information] Registered Deferred Processing Task, Id: bc344b9f-f22a-4074-bcef-76578d85a045, Worker: Demo.Foundation.ProcessingEngine.Workers.RfmTrainingWorker, Demo.Foundation.ProcessingEngine
[Information] TaskAgent Executing worker. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: 8faf6763-c623-45b7-a987-f08c9efc71d9, TaskType: DistributedProcessing.
[Information] TaskAgent Worker execution completed. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: 8faf6763-c623-45b7-a987-f08c9efc71d9, TaskType: DistributedProcessing.
[Information] TaskAgent Executing worker. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: c27c8eaa-efb2-42c5-b638-5357f76c3460, TaskType: DeferredAction.
[Information] TaskAgent Worker execution completed. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: c27c8eaa-efb2-42c5-b638-5357f76c3460, TaskType: DeferredAction.
[Information] TaskAgent Executing worker. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: bc344b9f-f22a-4074-bcef-76578d85a045, TaskType: DeferredAction.
[Information] RfmTrainingWorker.RunAsync
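When the log is busy, it can help to keep only the lines that mention the task ID returned by the API. The helper below is a small sketch and not part of the Processing Engine: TaskLogFilter is a hypothetical name, and the sample lines are copied from the log excerpt above. How you read the log file into lines is left to you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a Sitecore API): filters log lines by task ID.
public static class TaskLogFilter
{
    public static IReadOnlyList<string> ForTask(IEnumerable<string> logLines, Guid taskId)
    {
        // Guid.ToString() yields the lowercase hyphenated form used in the log.
        string id = taskId.ToString();
        return logLines
            .Where(line => line.IndexOf(id, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        // Lines taken from the log excerpt above.
        var lines = new[]
        {
            "[Information] Registered Deferred Processing Task, Id: c27c8eaa-efb2-42c5-b638-5357f76c3460, Worker: Sitecore.Processing.Engine.ML.Workers.MergeWorker, Sitecore.Processing.Engine.ML",
            "[Information] TaskAgent Executing worker. Machine: BRIMIT-SBA-PC, Process: 4132, AgentId: 4, TaskId: c27c8eaa-efb2-42c5-b638-5357f76c3460, TaskType: DeferredAction.",
            "[Information] RfmTrainingWorker.RunAsync"
        };

        var taskId = Guid.Parse("c27c8eaa-efb2-42c5-b638-5357f76c3460");
        foreach (string line in TaskLogFilter.ForTask(lines, taskId))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the sample above, only the registration line and the "Executing worker" line are kept, because the last line does not contain the ID.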

This post outlines the basics you should know when working with the Sitecore Processing Engine. You can find additional information in the official Sitecore documentation on “Sitecore Host”.

Table of contents: Dive into Sitecore Cortex and Machine Learning - Introduction

Read next: Part 2 - Adding custom events, facets and models

