ETL

The goal of this example is to show how to define and compose functions of different kinds, with different runtimes and dependency-packaging mechanisms. It also introduces tc's sophisticated builder.

The pipeline

  1. The pipeline has three functions: Enhancer, Transformer and Loader.
  2. It sends a notification when the ETL process completes.
  3. It hosts a simple HTML page to show the notifications.

If you asked an architect or ChatGPT, you might get an architecture something like this:

[Figure: ETL architecture diagram]

Can we define this topology at a high level without knowing anything about the underlying services? It is possible with tc. Let's break it down and design the topology incrementally:

topology.yml
name: etl
routes:
  /api/etl:
    method: POST
    async: true
    function: enhancer

The enhancer function implementation is irrelevant for the sake of this example. It could be written in Ruby, Python, Clojure or Rust. Let’s assume there is a function directory called enhancer with some piece of code.
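
For concreteness, here is a minimal sketch of what a Python enhancer might look like. The handler name, the shape of the incoming event and the fields it adds are all assumptions made for illustration; nothing in the topology depends on them.

```python
import json
from datetime import datetime, timezone


def handler(event, context=None):
    """Hypothetical enhancer: annotate the incoming record and pass it on."""
    # Assumption: the payload arrives either as a dict or as a JSON string.
    record = json.loads(event) if isinstance(event, str) else dict(event)
    # Illustrative enrichment only; a real enhancer would add domain data.
    record["enhanced"] = True
    record["enhanced_at"] = datetime.now(timezone.utc).isoformat()
    return record
```

The function returns the enriched record unchanged in shape, so whatever runs next can consume it without any glue code.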

Let’s create a sandbox called yoda with the dev AWS_PROFILE:

Terminal window
tc create -s yoda -e dev

topology.yml
name: etl
routes:
  /api/etl:
    method: POST
    queue: ETLQueue
queues:
  ETLQueue:
    function: enhancer

A queue may not be strictly necessary. In this context, however, the requirement is to preserve the order of requests, and by default tc configures a FIFO queue.

Now update the yoda sandbox:

Terminal window
tc update -s yoda -e dev

topology.yml
name: etl
routes:
  /api/etl:
    method: POST
    queue: ETLQueue
queues:
  ETLQueue:
    function: enhancer
functions:
  enhancer:
    root: true
    function: transformer
  transformer:
    function: loader

The enhancer, transformer and loader functions need not contain any input/output transformation code. From the DAG of functions specified above, tc composes and generates the ASL (Amazon States Language) required for Step Functions. We can inspect the generated ASL by running:

Terminal window
tc compose -s states -f yaml
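
To build intuition for what composing a chain of functions into ASL involves, here is a rough sketch that turns the functions block above into an Amazon States Language state machine. It is not tc's implementation and its output will differ from what tc generates; the helper name chain_to_asl and the placeholder ARN prefix are assumptions.

```python
import json


def chain_to_asl(functions, arn_prefix="arn:aws:lambda:REGION:ACCOUNT:function:"):
    """Build an ASL-style state machine from a linear chain of functions.

    `functions` mirrors the `functions` block of topology.yml:
    each name maps to {"root": bool, "function": next_name}.
    The ARN prefix is a placeholder, not a real resource.
    """
    # The entry point is the function marked root: true.
    start = next(name for name, spec in functions.items() if spec.get("root"))
    states = {}
    current = start
    while current is not None:
        if current in states:
            raise ValueError(f"cycle detected at {current!r}")
        # A function with no entry (or no `function` key) ends the chain.
        nxt = functions.get(current, {}).get("function")
        state = {"Type": "Task", "Resource": arn_prefix + current}
        if nxt is None:
            state["End"] = True
        else:
            state["Next"] = nxt
        states[current] = state
        current = nxt
    return {"StartAt": start, "States": states}


topology_functions = {
    "enhancer": {"root": True, "function": "transformer"},
    "transformer": {"function": "loader"},
}

asl = chain_to_asl(topology_functions)
print(json.dumps(asl, indent=2))
```

Each function becomes a Task state, the `function` key becomes the `Next` transition, and the last function in the chain is marked with `End`.
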
Terminal window
tc update -s yoda -e dev

topology.yml
name: etl
routes:
  /api/etl:
    method: POST
    queue: ETLQueue
queues:
  ETLQueue:
    function: enhancer
functions:
  enhancer:
    root: true
    uri: ./enhancer
    function: transformer
  transformer:
    function: loader
    event: ProcessedMessage
events:
  ProcessedMessage:
    channel: etl-notifications
channels:
  etl-notifications:
    authorizer: default

The loader function emits a ProcessedMessage event. tc takes care of structuring the payload and using the right event pattern. The event then triggers a WebSocket notification through the channel, which is an AppSync event channel.
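
As a rough illustration of what structuring the payload and matching it with an event pattern can involve, the sketch below wraps a payload in an EventBridge-style entry and checks it against a pattern. That the events travel over EventBridge is an assumption here, as are the helper names and the "etl" source value; the envelope tc actually produces may look different.

```python
import json


def build_entry(name, payload, source="etl"):
    """Wrap a payload in an EventBridge-style PutEvents entry (assumed shape)."""
    return {"Source": source, "DetailType": name, "Detail": json.dumps(payload)}


def build_pattern(name, source="etl"):
    """Event pattern that selects entries produced by build_entry."""
    return {"source": [source], "detail-type": [name]}


def matches(pattern, entry):
    """Simplified matcher: each pattern field must list the entry's value."""
    delivered = {"source": entry["Source"], "detail-type": entry["DetailType"]}
    return all(delivered.get(key) in allowed for key, allowed in pattern.items())


entry = build_entry("ProcessedMessage", {"id": 42, "status": "loaded"})
pattern = build_pattern("ProcessedMessage")
```

The point of the abstraction is that none of this wiring appears in the function code: the topology only names the event and the channel it feeds.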

Terminal window
tc update -s yoda -e dev

topology.yml
name: etl
routes:
  /api/etl:
    method: POST
    queue: ETLQueue
queues:
  ETLQueue:
    function: enhancer
functions:
  enhancer:
    root: true
    uri: ./enhancer
    function: transformer
  transformer:
    function: loader
    event: ProcessedMessage
events:
  ProcessedMessage:
    channel: etl-notifications
channels:
  etl-notifications:
    authorizer: default
pages:
  app:
    dist: .
    dir: webapp

There we have it! With just a few lines of abstract definition, we have an end-to-end ETL pipeline working with almost no infrastructure code.