Introduction to Scramjet Data Processing Platform

This article introduces the key principles of our data processing platform. Our data apps engine takes its own approach to creating, deploying and running serverless applications, one that differs in many respects from software buses, integration suites and Function-as-a-Service offerings. We hope you will find it interesting.

"3 in 1" data processing platform

Our core runtime environment is called "Scramjet Transform Hub". It's available as a standalone software package and will be the core element of our Scramjet Cloud Platform.

We like to call our approach "3 in 1 data processing platform" as it combines 3 concepts into one solution:

  • data processing engine
  • serverless data applications
  • complete API with CLI (covering both I/O and management endpoints)

[Figure: Scramjet data flow pattern]

Let's dive into each point separately.

Data processing engine

Scramjet Transform Hub (STH) creates a unified deployment, runtime, management and execution plane for serverless applications (sequences).

In short, STH allows you to start data processing in 3 simple steps:

1. Deploy

si sequence send <sequence-package-tar>

2. Run

si sequence run <sequence-id>

3. Send data

curl -H "Content-Type: application/octet-stream" --data-binary "@file.txt" <instance-input-endpoint>

You are free to send a simple HTTP request or a file to the sequence, send it a stream, or even have it read data from another stream or API.
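
As a rough Node.js counterpart to the curl call above, the sketch below streams any readable source to an instance input endpoint. The helper name sendToInput and the URL passed to it are placeholders, not part of the Scramjet API. It relies only on Node's built-in http module and assumes the endpoint is served over plain HTTP.

```javascript
const http = require("http");

// Hypothetical helper (not part of Scramjet): POSTs a readable stream to
// `inputUrl`, which stands for <instance-input-endpoint>.
function sendToInput(inputUrl, source) {
    return new Promise((resolve, reject) => {
        const req = http.request(
            inputUrl,
            { method: "POST", headers: { "Content-Type": "application/octet-stream" } },
            (res) => {
                res.resume(); // discard the response body
                res.on("end", () => resolve(res.statusCode));
                res.on("error", reject);
            }
        );
        req.on("error", reject);
        source.on("error", reject);
        source.pipe(req); // the request ends when the source ends
    });
}
```

Calling sendToInput(url, require("fs").createReadStream("file.txt")) would mirror the curl command; because the source is piped, the file is never loaded into memory as a whole.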

Please note that, contrary to a typical microservices setup, there is no expensive step of building a container image, pushing it to a registry and then pulling it into a container orchestrator to run the microservice. You can move from a directory with code to a sequence processing your data in less than a minute.

We do package our apps, but their size is measured in kilobytes, not in hundreds of megabytes as is the case with container images. The light app design gives better performance, more efficient resource usage and a simpler CI/CD process.

We have a short, 3-minute demo on our YouTube channel showing the whole package preparation, deployment and run process (see Scramjet Transform Hub - Intro demo).

Serverless data applications

We call user applications sequences. They can perform continuous data and stream processing, and they have no run time limits or input data size limits.

Each sequence has a straightforward structure - it's a directory with at least two core files:

  • package.json - simple JSON file describing sequence metadata
  • index.[js/ts] - JavaScript or TypeScript file with sequence code. You are free to structure your app in multiple files if you like.
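
The layout described above can be checked before packaging with a few lines of Node.js. This is a minimal sketch, not a Scramjet tool: the function name checkSequenceDir and its messages are invented for illustration, and it only looks for the two core files named in the list.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical pre-flight check (not part of the Scramjet CLI): confirms
// that a directory holds the two core files of a sequence.
function checkSequenceDir(dir) {
    const problems = [];
    const pkgPath = path.join(dir, "package.json");

    if (!fs.existsSync(pkgPath)) {
        problems.push("missing package.json");
    } else {
        try {
            JSON.parse(fs.readFileSync(pkgPath, "utf8"));
        } catch (err) {
            problems.push("package.json is not valid JSON");
        }
    }

    const hasIndex = ["index.js", "index.ts"].some((name) =>
        fs.existsSync(path.join(dir, name))
    );
    if (!hasIndex) problems.push("missing index.js or index.ts");

    return problems; // an empty array means the layout looks right
}
```

An empty result only says that the files exist and that package.json parses; it makes no claim about which metadata fields the hub expects.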

Below is one of our sample "hello world" sequences. It generates objects carrying consecutive integers, one per second, and logs each of them.

const { DataStream } = require("scramjet");

module.exports = async function (stream, start = 0, end = 1000) {
    await DataStream.from(async function* () {
        // Coerce the start argument to a number, falling back to 0.
        let i = +start || 0;
        while (i++ < end) {
            // Wait one second before emitting the next item.
            await new Promise((res) => setTimeout(res, 1000));
            yield { x: i };
        }
    })
        .do(console.log)
        .run();
};
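
The sample above ignores its stream argument and generates its own data. As a hedged sketch of the other common case, a sequence that transforms what arrives on its input, the function below is written as a plain async generator with no dependencies. It assumes the hub passes the instance input as an async-iterable first argument and treats the returned async iterable as the instance output; please check the Transform Hub documentation before relying on either assumption. The names upperCase and prefix are invented for illustration.

```javascript
// Sketch of a transforming sequence: consumes the input chunk by chunk
// and yields an upper-cased copy of each chunk.
async function* upperCase(stream, prefix = "") {
    for await (const chunk of stream) {
        // Chunks may arrive as Buffers or strings; normalise to string first.
        yield prefix + chunk.toString().toUpperCase();
    }
}

module.exports = upperCase;
```

Because the generator pulls one chunk at a time, it holds no more than the current chunk in memory, which fits the "no input data size limits" property described earlier.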

You can find an intro readme and more samples in our dedicated repository, scramjetorg/scramjet-cloud-docs.

API & CLI

Let's look at the Transform Hub API through the commands available in our CLI:

  • pack [options] - package directory with sequence code into tar.gz file
  • host [command] - monitor and check version of the host
  • config|c [command] - display and manage config
  • sequence|seq [command] - pack, deploy, manage and monitor sequences (app templates)
  • instance|inst [command] - manage and monitor instances (running apps)

The above commands (and the related API) cover complete management of the data processing engine and the serverless apps running on top of it.

Once started, each running instance exposes the following API endpoints:

  • input, output
  • stdin, stdout, stderr
  • log, monitoring
  • _event (to instance), event (from instance)
  • stop, kill
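
To make the list above concrete, the sketch below builds one URL per endpoint for a given instance. The endpoint names come from the list; the base URL and the /instance/<id>/<endpoint> path layout are assumptions made for illustration only, so please take the real paths from the Transform Hub API reference.

```javascript
// Endpoint names as listed in this article.
const ENDPOINTS = [
    "input", "output",
    "stdin", "stdout", "stderr",
    "log", "monitoring",
    "_event", "event",
    "stop", "kill",
];

// Hypothetical helper: `apiBase` and the path layout are assumed,
// not taken from the Scramjet documentation.
function instanceEndpoints(apiBase, instanceId) {
    const base = apiBase.replace(/\/+$/, ""); // drop trailing slashes
    const urls = {};
    for (const name of ENDPOINTS) {
        urls[name] = `${base}/instance/${encodeURIComponent(instanceId)}/${name}`;
    }
    return urls;
}
```

Since every instance exposes the same set of endpoints, a single helper like this would cover any running instance regardless of what its sequence does.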

This follows a "batteries included" approach, and each running instance is handled in the same way.

Why Scramjet Cloud Platform

Our approach shown above has several benefits:

  • Freedom and flexibility – no artificial "execution time limit" or "payload size limit" imposed on apps.
  • Great value for the price – efficient data workflows with fully programmable data acquisition, and the ability to chain instances that perform various data processing tasks.
  • Performance by design – data is processed as it arrives, without proxies, queues and gateways. Light apps with minimal resource consumption.
  • Works cross-native (Edge & Cloud) – spans locations out of the box. Run the same type of apps on edge or smart devices via standalone Scramjet Transform Hub and in our Scramjet Cloud Platform.

To summarize, the diagram below shows various patterns of chaining data processing on our platform:

[Figure: Scramjet data flow pattern]

Author:

Łukasz Kamieniecki-Mruk

IT Professional with over 15 years of experience in IT systems development and delivery.