Delayed Event Triggers with AWS – EventBridge Scheduler vs Step Functions

Triggering an event at a specific time is a common requirement for many software applications. Also known as timer-based events, delayed event triggers allow applications to delay the invocation of an event until a predetermined time. In the world of AWS, there are two key ways to accomplish this task, each with its own pros and cons and slightly different, sometimes unexpected, behaviour.

This article reviews why timer-based events matter, then discusses the two ways to trigger these timers using AWS services.

The Importance of Delayed Events

Many use cases, from business use cases to system level actions, can benefit from delayed event timers. Take for example an e-commerce based application that takes customer orders. A common business requirement is to follow up a customer order with an email coupon in the weeks that follow. The pseudocode for calculating the timer invocation time could be:

Timer invocation time = Order Placement Time + Time Offset (e.g. 7 days)
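As a minimal sketch, the same calculation in Python might look like the following. The function name, the order time, and the 7-day offset are illustrative assumptions, not part of any AWS API.

```python
from datetime import datetime, timedelta, timezone


def timer_invocation_time(order_placed_at: datetime, offset: timedelta) -> datetime:
    """Return the moment at which the delayed event should fire."""
    return order_placed_at + offset


# Hypothetical order placed at noon UTC on 1 March 2024, with a 7-day offset.
order_placed_at = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
fire_at = timer_invocation_time(order_placed_at, timedelta(days=7))

print(fire_at.isoformat())  # 2024-03-08T12:00:00+00:00
```

Working in UTC with timezone-aware datetimes avoids ambiguity when the resulting timestamp is later handed to a scheduling service.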

In a single-process program, this task is trivial–we can use sleep functions to accomplish this in a single line of code. But in cloud based applications that are decoupled and distributed in nature, this approach is infeasible. Instead, we need to leverage an AWS service that provides this functionality. 

Currently, there are two primary ways to trigger delayed events in AWS: EventBridge Scheduler, and Step Functions. Both of these options have a set of pros and cons in the categories of accuracy, cost, and ease of use. In the remaining sections of this article, we walk through each of these options, starting with EventBridge Scheduler. Towards the end, I’ll provide summarized advice to help you choose one service over the other.

Delayed Event Timers with EventBridge Scheduler

EventBridge Scheduler is an enhancement of the existing EventBridge service. Launched in November 2022, the new feature offers two kinds of schedule: triggering recurring events at a specified rate interval or cron schedule, or, more useful for our problem, triggering a one-time event at a predetermined time in the future.

At first glance, this service seems like exactly what we want. However, there are some pros and cons to be aware of, mostly in terms of timer accuracy and granularity.

Pros

  1. Requires no infrastructure. Simply use the Scheduler create_schedule API to schedule an event in the future.
  2. Supports triggering numerous AWS services including Lambda, Step Functions, SNS, SQS, and many more.
  3. Supports Flexible Time Windows, aka jitter, at trigger time to prevent the thundering herd or stampede side effect. 
  4. The new ActionAfterCompletion feature allows you to delete the timer instance after it fires, allowing you to stay under the 1,000,000 instance limit.
  5. “Almost Free” pricing model.
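To make the points above more concrete, here is a rough sketch of how a one-time schedule request could be assembled for the create_schedule API. The helper name, the ARNs, and the payload are hypothetical placeholders. The parameter names follow the boto3 scheduler client as I understand it, so check them against the current documentation before relying on this.

```python
import json
from datetime import datetime, timezone


def build_schedule_request(name, fire_at, target_arn, role_arn, payload, jitter_minutes=0):
    """Build keyword arguments for a one-time create_schedule call.

    fire_at must be a timezone-aware datetime; it is converted to UTC.
    """
    # One-time schedules use an at(yyyy-mm-ddThh:mm:ss) expression.
    stamp = fire_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if jitter_minutes > 0:
        # Flexible time window: the "jitter" mentioned in the pros list.
        window = {"Mode": "FLEXIBLE", "MaximumWindowInMinutes": jitter_minutes}
    else:
        window = {"Mode": "OFF"}
    return {
        "Name": name,
        "ScheduleExpression": f"at({stamp})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": window,
        # Delete the schedule once it has fired so it stops counting towards the limit.
        "ActionAfterCompletion": "DELETE",
        "Target": {"Arn": target_arn, "RoleArn": role_arn, "Input": json.dumps(payload)},
    }


# Hypothetical ARNs and payload, for illustration only.
request = build_schedule_request(
    name="coupon-order-123",
    fire_at=datetime(2024, 3, 8, 12, 0, tzinfo=timezone.utc),
    target_arn="arn:aws:lambda:us-east-1:123456789012:function:send-coupon",
    role_arn="arn:aws:iam::123456789012:role/scheduler-invoke-role",
    payload={"eventId": "order-123-coupon", "orderId": "order-123"},
    jitter_minutes=5,
)

# With boto3 installed and credentials configured, the call would be roughly:
#   import boto3
#   boto3.client("scheduler").create_schedule(**request)
```

Carrying your own eventId in the target input is one way to correlate a schedule with the invocation it eventually produces.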

Cons

  1. Supports only 1,000,000 timer instances at a time.
  2. Auditability is an issue–it’s difficult to link the timer instance with the corresponding event invocation without manually passing along a contrived eventId. 
  3. Requires confusing IAM Role Setups (see this article for more). 
  4. Only supports Day, Hour, and Minute granularities, not seconds.
  5. Sub-par accuracy. In my manual tests of 50 timer invocations, the actual invocation time was off by more than 30 seconds on average. In other words, if you scheduled an event for 05:00:00, it would more likely trigger at around 05:00:30.

The last con can be a deal breaker for applications that require prompt triggering of their events. If this is you, consider using Step Functions (discussed below) instead, which offers much more accurate timer invocation and finer granularity.

Pricing

The pricing model of EventBridge Scheduler is one of its most attractive features. The free tier includes 14 million invocations per month (a staggering amount, I must say). After that, you’re only charged $1.00 per million scheduled invocations per month.

As a pricing example, with 20 million invocations in a month you would be charged:

(20 million – 14 million free tier) x $1.00 per million = $6.00 per month
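The same arithmetic as a small Python sketch, using only the free tier and per-million rate quoted above. The constant and function names are my own.

```python
FREE_INVOCATIONS_PER_MONTH = 14_000_000
USD_PER_MILLION_INVOCATIONS = 1.00


def scheduler_monthly_cost(invocations: int) -> float:
    """Estimate the monthly EventBridge Scheduler bill in USD."""
    # Only invocations beyond the free tier are billable.
    billable = max(0, invocations - FREE_INVOCATIONS_PER_MONTH)
    return round(billable / 1_000_000 * USD_PER_MILLION_INVOCATIONS, 2)


print(scheduler_monthly_cost(20_000_000))  # 6.0
```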

For most use cases, EventBridge Scheduler is just fine. But if you need more accurate timer invocations (within a second, on average) or more AWS service integrations, the next option, Step Functions, is worth considering.

Delayed Event Timers with Step Functions

Step Functions is more typically used for workflow-style applications that coordinate multiple AWS services, so it may seem a bit odd to use it for delayed event timers. However, one of the features offered within Step Functions is the Wait State. The Wait State allows you to pause your workflow either for a period of time (relative) or until a specific timestamp (absolute). The latter is clearly more useful for our use case.

To make Delayed Event Timers work with Step Functions, you would need to create a Workflow (aka State Machine), and add the Wait State, followed by the corresponding event trigger you’d like to execute when the timer fires. Here is what it would look like in conjunction with a Lambda Function.

A Step Function with a Wait State and Lambda Trigger.

You can also set up your state machine to take as input a timestamp field which is then passed in to the wait task telling it at what time to resume the step function and perform the next series of tasks.
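As a rough sketch of such a state machine, the definition below is written as a Python dictionary and serialised to Amazon States Language JSON. The state names, the fireAt input field, and the Lambda ARN are assumptions chosen for illustration.

```python
import json

# Hypothetical Lambda function ARN; replace with your own.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:send-coupon"

definition = {
    "Comment": "Wait until the timestamp given in the input, then invoke a Lambda function",
    "StartAt": "WaitUntilFireAt",
    "States": {
        "WaitUntilFireAt": {
            "Type": "Wait",
            # Reads an ISO 8601 timestamp such as 2024-03-08T12:00:00Z from the input.
            "TimestampPath": "$.fireAt",
            "Next": "InvokeLambda",
        },
        "InvokeLambda": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            # Pass the whole execution input through to the function.
            "Parameters": {"FunctionName": LAMBDA_ARN, "Payload.$": "$"},
            "End": True,
        },
    },
}

definition_json = json.dumps(definition, indent=2)

# Each timer instance is then one execution started with an input like this.
execution_input = json.dumps({"fireAt": "2024-03-08T12:00:00Z", "orderId": "order-123"})
```

The definition JSON is what you would supply when creating the state machine, and the execution input is what you would pass when starting each execution.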

The nice thing about using Step Functions in general is that it allows you to run multiple serial or parallel tasks after the timer fires without requiring any additional service overhead. For example, I can easily change our timer workflow to trigger a Lambda invocation, a DynamoDB PutItem request, and an SQS SendMessage all at once and in parallel.
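One possible shape for that fan-out is a Parallel state with three branches, sketched here as a Python dictionary. The table name, queue URL, function ARN, and state names are hypothetical, and the helper is just a convenience for this example.

```python
import json


def single_task_branch(state_name, resource, parameters):
    """Build a Parallel branch that contains exactly one Task state."""
    return {
        "StartAt": state_name,
        "States": {
            state_name: {
                "Type": "Task",
                "Resource": resource,
                "Parameters": parameters,
                "End": True,
            }
        },
    }


# This state would replace the single Lambda task that follows the Wait State.
fan_out_state = {
    "Type": "Parallel",
    "Branches": [
        single_task_branch(
            "InvokeLambda",
            "arn:aws:states:::lambda:invoke",
            {
                "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:send-coupon",
                "Payload.$": "$",
            },
        ),
        single_task_branch(
            "PutItem",
            "arn:aws:states:::dynamodb:putItem",
            {"TableName": "fired-timers", "Item": {"orderId": {"S.$": "$.orderId"}}},
        ),
        single_task_branch(
            "SendMessage",
            "arn:aws:states:::sqs:sendMessage",
            {
                "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/timer-events",
                "MessageBody.$": "$.orderId",
            },
        ),
    ],
    "End": True,
}

print(json.dumps(fan_out_state, indent=2))
```

Each branch receives the same input, so all three tasks can read the orderId that was passed in when the execution started.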

I’m a huge fan of Step Functions and have used them quite a bit in my professional career as a Senior SDE. I even went so far as to create a Masterclass course that you can check out here.

To summarise the main pros and cons of Step Functions:

Pros

  1. Much more accurate than EventBridge Scheduler. In a baseline test, I observed an average delta between the scheduled time and the actual invocation time of approximately 0.57 seconds. This makes Step Functions a much better choice when you need sub-second timer accuracy.
  2. Granularity in the order of seconds compared to minutes. 
  3. You get to leverage the additional benefits of Step Functions, including adding subsequent serial or parallel tasks after timer invocation, and added features like error handling and monitoring. 
  4. The service integrations for triggers are much more plentiful, including hundreds of different services and APIs.

Cons

  1. Requires you to create a workflow (state machine) in advance, and configure it with a Wait State that reads off the input, and a trigger to the corresponding service.
  2. Much more expensive than EventBridge Scheduler (see pricing section below). 

Standard Mode Pricing

Let’s assume we have 1 million timer instances in order to calculate pricing. The Step Functions pricing model is based on the number of state machine transitions. For a simple Wait State with a Lambda invocation, this amounts to 4 transitions: the state machine start event, the wait event, the Lambda invocation event, and the state machine end event.

Using this as our baseline, and factoring in the cost per transition plus the 4,000 state transitions we get for free as part of the free tier, the total price for 1 million timers is calculated below:

(1 million timers x 4 state transitions – 4,000 free transitions) x 0.000025 USD per state transition = $99.90 USD

And for the 20 million timer instances we used in the EventBridge Scheduler exercise, the grand total comes out to a staggering $1,999.90 USD per month, compared to $6.00 with EventBridge Scheduler. Clearly, the pricing difference between these two services is substantial. Depending on the volume of timer instances you’re creating per month, this may be totally fine or a non-starter.
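Here is the Standard Mode calculation as a short Python sketch, using the per-transition price, free tier, and 4 transitions per timer described above. The names are my own.

```python
FREE_TRANSITIONS_PER_MONTH = 4_000
USD_PER_TRANSITION = 0.000025
TRANSITIONS_PER_TIMER = 4  # start, wait, Lambda invocation, end


def standard_monthly_cost(timers: int) -> float:
    """Estimate the monthly Standard Mode bill in USD for simple timer workflows."""
    transitions = timers * TRANSITIONS_PER_TIMER
    billable = max(0, transitions - FREE_TRANSITIONS_PER_MONTH)
    return round(billable * USD_PER_TRANSITION, 2)


print(f"{standard_monthly_cost(1_000_000):.2f}")   # 99.90
print(f"{standard_monthly_cost(20_000_000):.2f}")  # 1999.90
```

Adding more states after the Wait State increases the number of transitions per timer, and the cost scales with it.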

Do keep in mind that this pricing was calculated using the Standard Mode of Step Functions. Step Functions does offer an alternative execution mode, albeit limited, called Express Mode that costs substantially less, as described below. 

Express Mode Pricing

With Express mode, the pricing model is quite different but so are the execution guarantees and maximum workflow duration. The Standard Mode supports a maximum duration of 1 year per workflow. This means that you can set a Wait State that exists for up to 1 year in the future without an issue. The Express mode in contrast only supports 5 minutes maximum duration, so you can only set a timer for up to 5 minutes into the future. This may be a non-starter for some use cases, but just fine for others. If you do decide to go with Express Step Functions, you’ll notice the pricing is a bit lower. That’s because it’s based on the number of workflow requests, the duration of the workflow, and the amount of memory consumed per invocation.

Say, for instance, we have 1 million requests that last 60 seconds each and consume 64MB of memory. That would equate to $62.26 per month, a bit better than Standard, but not by much.
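For the curious, the $62.26 figure can be reproduced with the sketch below. The request price and tiered per-GB-second duration rates are assumptions based on my understanding of the published us-east-1 Express pricing, so verify them against the current pricing page before using the numbers.

```python
# Assumed rates (not stated in this article): check the Step Functions pricing page.
USD_PER_MILLION_REQUESTS = 1.00
# (tier size in GB-seconds, USD per GB-second)
DURATION_TIERS = [
    (1_000 * 3600, 0.00001667),   # first 1,000 GB-hours
    (4_000 * 3600, 0.00000833),   # next 4,000 GB-hours
    (float("inf"), 0.00000456),   # everything beyond that
]


def express_monthly_cost(requests: int, seconds_each: float, memory_mb: int) -> float:
    """Estimate the monthly Express Mode bill in USD."""
    gb_seconds = requests * seconds_each * memory_mb / 1024
    duration_cost = 0.0
    remaining = gb_seconds
    for tier_size, rate in DURATION_TIERS:
        used = min(remaining, tier_size)
        duration_cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    request_cost = requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    return round(request_cost + duration_cost, 2)


print(express_monthly_cost(1_000_000, 60, 64))  # 62.26
```

Because most of the cost comes from duration, shorter waits are cheaper: the same million requests waiting 10 seconds each would cost considerably less under these assumed rates.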

Also keep in mind that Express Mode Step Functions offer looser execution guarantees: at-least-once execution if you invoke your Step Function asynchronously, and at-most-once if you invoke it synchronously. With at-least-once execution it is possible, although rare, for Step Functions to run your Wait State and follow-up task more than once. Standard Mode, in contrast, offers exactly-once execution semantics, so this problem is non-existent.

Wrap Up

EventBridge Scheduler is a welcome addition to the AWS ecosystem, offering a new way to schedule events. It excels at simplicity and cost effectiveness, providing a low-complexity way to trigger timers at points in the future. Step Functions in combination with Wait States, however, offers a more feature-packed solution that lets you link your timer with your event trigger while also providing access to rich monitoring and debugging tools. Further, Step Functions offers finer granularity (seconds compared to minutes) and in my testing fires events much closer to the scheduled time (within a second or so). None of this comes for free, however, as the cost difference between the two services is quite significant.

The TLDR is to use EventBridge Scheduler if you value simplicity and low cost, can live with minimal debugging tools, and are OK with minute-level granularity (which in my tests was off by around 30 seconds on average). Use Step Functions if you need second-level granularity, want to perform sequences of tasks in response to the trigger, and don’t really care about cost all that much (although I can’t imagine that to be the case for most).
