Engine

SHAR Engine Architecture

Engine message processing

SHAR processes each NATS message in the WORKFLOW stream (WORKFLOW.>). The engine contains several message processors, each of which deals with a specific message type, such as state transitions or activity execution.
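
A minimal sketch of routing by subject may make the idea of per-message-type processors concrete. It is written in Python for brevity and is not SHAR's actual code: `Dispatcher`, `process_traversal`, `process_activity`, and `build_dispatcher` are hypothetical names, and using `*` in place of the `x` token from this document's example subjects is an assumption. The matcher follows the standard NATS wildcard rules, where `*` matches exactly one token and `>` matches one or more trailing tokens.

```python
from typing import Callable, List, Tuple

Processor = Callable[[str, bytes], str]


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


class Dispatcher:
    """Routes a message to the first registered processor whose pattern matches."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Processor]] = []

    def register(self, pattern: str, processor: Processor) -> None:
        self._routes.append((pattern, processor))

    def dispatch(self, subject: str, payload: bytes) -> str:
        for pattern, processor in self._routes:
            if subject_matches(pattern, subject):
                return processor(subject, payload)
        raise LookupError(f"no processor registered for {subject}")


# Hypothetical processors; each returns a label so the routing is observable.
def process_traversal(subject: str, payload: bytes) -> str:
    return "traversal"


def process_activity(subject: str, payload: bytes) -> str:
    return "activity"


def build_dispatcher() -> Dispatcher:
    d = Dispatcher()
    # '*' stands in for the "x" token of the example subjects (an assumption).
    d.register("WORKFLOW.*.State.Traversal.Execute", process_traversal)
    d.register("WORKFLOW.*.State.Activity.Execute", process_activity)
    return d
```

With this sketch, `build_dispatcher().dispatch("WORKFLOW.x.State.Activity.Execute", b"")` is routed to the activity processor, while a subject with no registered pattern raises `LookupError`.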

If an error occurs whilst processing a message, the error type defines whether the step should be retried, the activity aborted, or the workflow terminated.
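
One way to picture error-type-driven recovery is a small classifier that maps the exception raised by a processor to one of the three outcomes named above. This is an illustrative Python sketch only: the exception classes, the `Outcome` enum, and the `process` and `always_raise` helpers are hypothetical and are not taken from SHAR.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    COMPLETED = "completed"
    RETRY_STEP = "retry the step"
    ABORT_ACTIVITY = "abort the activity"
    TERMINATE_WORKFLOW = "terminate the workflow"


# Hypothetical error types, one per recovery strategy.
class RetryableError(Exception):
    pass


class ActivityAbortError(Exception):
    pass


class WorkflowFatalError(Exception):
    pass


def process(handler: Callable[[bytes], None], payload: bytes) -> Outcome:
    """Run a processor and translate the error type into a recovery decision."""
    try:
        handler(payload)
    except RetryableError:
        return Outcome.RETRY_STEP
    except ActivityAbortError:
        return Outcome.ABORT_ACTIVITY
    except WorkflowFatalError:
        return Outcome.TERMINATE_WORKFLOW
    # Any other exception type is deliberately left to propagate to the caller.
    return Outcome.COMPLETED


def always_raise(exc: Exception) -> Callable[[bytes], None]:
    """Helper for experiments: builds a processor that always fails with exc."""
    def handler(payload: bytes) -> None:
        raise exc
    return handler
```

Keeping the decision in one place means individual processors only need to raise the appropriate error type; they do not need to know how the engine reacts to it.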

The following example describes the way SHAR processes an activity. In this case SHAR is running a service task on a client that has previously registered that it can perform task y.

sequenceDiagram
    autonumber
    participant SHAR
    participant NATS
    participant Client
    NATS--)SHAR: WORKFLOW.x.State.Traversal.Execute
    activate SHAR
    SHAR--)NATS: WORKFLOW.x.State.Activity.Execute
    SHAR--)NATS: WORKFLOW.x.State.Traversal.Complete
    note over SHAR: Locate the activity in the WORKFLOW_DEF KV.<br/>This activity is a Service Task type.<br/>Export any variables the task needs.
    note over SHAR: Store state snapshot in<br/>WORKFLOW_VARSTATE KV.
    SHAR--)NATS: WORKFLOW.x.State.Job.Execute.ServiceTask.y
    deactivate SHAR
    NATS--)Client: WORKFLOW.x.State.Job.Execute.ServiceTask.y
    activate Client
    note over Client: The client performs any processing<br/>using the provided workflow variables<br/>and returns result variables.
    Client--)NATS: WORKFLOW.x.State.Job.Complete.ServiceTask
    deactivate Client
    NATS--)SHAR: WORKFLOW.x.State.Job.Complete.ServiceTask
    activate SHAR
    note over SHAR: The Service Task completed successfully.<br/>Merge any variables back into the workflow state<br/>from the WORKFLOW_VARSTATE KV.
    SHAR--)NATS: WORKFLOW.x.State.Activity.Complete
    deactivate SHAR
    NATS--)SHAR: WORKFLOW.x.State.Activity.Complete
    activate SHAR
    note over SHAR: Locate the next traversals in the WORKFLOW_DEF KV.
    loop For each traversal
        opt If traversal condition is met
            SHAR--)NATS: WORKFLOW.x.State.Traversal.Execute
        end
    end
    deactivate SHAR
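
The final loop in the diagram can be sketched as follows. This is a simplified Python illustration, not SHAR's implementation: the `Traversal` dataclass and `traversals_to_execute` are hypothetical, conditions are modelled as plain Python callables over the workflow variables, and treating a traversal without a condition as always taken is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Variables = Dict[str, Any]


@dataclass
class Traversal:
    target: str  # id of the element the traversal leads to
    condition: Optional[Callable[[Variables], bool]] = None


def traversals_to_execute(
    outbound: List[Traversal], variables: Variables, namespace: str = "x"
) -> List[Tuple[str, str]]:
    """Return (subject, target) pairs for every traversal whose condition is met."""
    subject = f"WORKFLOW.{namespace}.State.Traversal.Execute"
    messages: List[Tuple[str, str]] = []
    for t in outbound:
        # Assumption: a traversal with no condition is always followed.
        if t.condition is None or t.condition(variables):
            messages.append((subject, t.target))
    return messages
```

For example, with one traversal guarded by `amount < 100`, another guarded by `amount >= 100`, and a third with no condition, the variables `{"amount": 250}` yield messages for the second and third traversals only.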

NATS messages

NATS messages are used to trigger state-machine activities and transitions as seen above.

The engine also uses the same NATS messages for housekeeping tasks such as cleaning up the key/value store.

These messages can also be used for extensibility. An example of this is the SHAR Telemetry Server, which listens for workflow and activity messages and converts them into Jaeger spans for tracing.

NATS message boundaries

The engine's host may terminate abruptly during execution. The engine seeks to mitigate the effects of this by triggering each critical piece of functionality with a message.

By default, NATS retries delivery of a message if it is not acknowledged within the timeout period.

If the engine terminates during execution of a critical section, then the triggering message will be resent to another SHAR instance to be re-processed.

SHAR is written so that the follow-on NATS message is sent before the previous NATS message is acknowledged. This ensures that the workflow stays live even if NATS goes down or a SHAR instance terminates.
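
The ordering described here, publish the follow-on message first and acknowledge afterwards, can be explored with a toy in-memory model. The Python sketch below does not use the NATS client; `ToyBroker`, `HostCrashed`, `make_handler`, and `deliver` are hypothetical stand-ins, and a raised `HostCrashed` models the host dying so that the triggering message is never acknowledged and is delivered again.

```python
from typing import Callable, List, Set, Tuple


class HostCrashed(Exception):
    """Simulates the SHAR host terminating in the middle of a handler."""


class ToyBroker:
    """Records follow-on messages; stands in for publishing to NATS."""

    def __init__(self) -> None:
        self.published: List[str] = []

    def publish(self, subject: str) -> None:
        self.published.append(subject)


Handler = Callable[[ToyBroker, int], None]


def make_handler(crash_points: Set[Tuple[str, int]]) -> Handler:
    """Build a handler that crashes at the given (stage, delivery attempt) points."""
    def handler(broker: ToyBroker, attempt: int) -> None:
        if ("before_publish", attempt) in crash_points:
            raise HostCrashed()
        # Step 1: send the follow-on message.
        broker.publish("WORKFLOW.x.State.Traversal.Execute")
        if ("before_ack", attempt) in crash_points:
            raise HostCrashed()
        # Step 2: returning normally models acknowledging the triggering message.
    return handler


def deliver(broker: ToyBroker, handler: Handler, max_deliveries: int = 5) -> int:
    """Redeliver until the handler acknowledges; return the attempt that succeeded."""
    for attempt in range(1, max_deliveries + 1):
        try:
            handler(broker, attempt)
        except HostCrashed:
            continue  # no acknowledgement was sent, so the message is delivered again
        return attempt
    raise RuntimeError("message exceeded max deliveries")
```

A crash before the publish costs nothing: the retry publishes the follow-on message once. A crash between publish and acknowledgement means the retry publishes it a second time, so in this model the workflow never stalls but downstream processors may see duplicates.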

It is imperative that all critical section code is idempotent, i.e. it can be re-entered with the same parameters without causing additional side effects. Code not designed this way could execute processes, tasks, and activities multiple times!
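
To illustrate the difference, the Python sketch below applies the same message several times to a non-idempotent and an idempotent merge. It is a toy example rather than SHAR code: the `job_id` and `amount` fields, the `applied` bookkeeping key, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Message = Dict[str, Any]


def merge_naive(state: State, msg: Message) -> None:
    """Not idempotent: every redelivery adds the amount again."""
    state["total"] = state.get("total", 0) + msg["amount"]


def merge_idempotent(state: State, msg: Message) -> None:
    """Idempotent: the contribution is keyed by job id, so replays overwrite it."""
    applied = state.setdefault("applied", {})
    applied[msg["job_id"]] = msg["amount"]
    state["total"] = sum(applied.values())


def redeliver(merge: Callable[[State, Message], None], msg: Message, times: int) -> State:
    """Apply the same message `times` times, as a redelivering broker would."""
    state: State = {}
    for _ in range(times):
        merge(state, msg)
    return state
```

Delivering `{"job_id": "job-1", "amount": 10}` twice leaves a total of 20 with `merge_naive` but 10 with `merge_idempotent`, which is the property critical section code needs when a message can arrive more than once.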