Introduction to Scramjet Data Processing Platform

This article introduces the key principles of our data processing platform. Our data apps engine takes its own approach to creating, deploying and running serverless applications, one that differs in many respects from software buses, integration suites and Function-as-a-Service offerings. We hope you will find it interesting.

"3 in 1" data processing platform

Our core runtime environment is called "Scramjet Transform Hub". It's available as a standalone software package and will be the core element of our Scramjet Cloud Platform.

We like to call our approach "3 in 1 data processing platform" as it combines 3 concepts into one solution:

data processing engine

serverless data applications

complete API with CLI (covering both I/O and management endpoints)

Let's dive into each point separately.

Data processing engine

Scramjet Transform Hub (STH) provides a unified deployment, runtime, management and execution plane for serverless applications (sequences).

In short, STH allows you to start data processing in 3 simple steps:

1. Deploy


si sequence send <sequence-package-tar>

2. Run


si sequence run <sequence-id>

3. Send data


curl -H "Content-Type: application/octet-stream" --data-binary "@file.txt" <instance-input-endpoint>

You are free to send data to a sequence as a simple HTTP request, as a file or as a stream; a sequence can even read data from another stream or API.
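The curl call in step 3 can also be made from code. Below is a minimal Node.js sketch that streams a local file to an instance input endpoint with the same content type. It assumes the endpoint is served over plain HTTP; the `sendFile` helper and its parameters are illustrative names, not part of the Scramjet API, and the endpoint URL is a placeholder for the one your instance exposes.

```javascript
const fs = require("fs");
const http = require("http");

// Hypothetical helper: streams a local file to an instance input endpoint,
// mirroring the curl call above. Resolves with the HTTP status code.
// Assumption: `endpointUrl` is a plain http:// URL of your instance input.
function sendFile(endpointUrl, filePath) {
    return new Promise((resolve, reject) => {
        const req = http.request(
            endpointUrl,
            { method: "POST", headers: { "Content-Type": "application/octet-stream" } },
            (res) => {
                res.resume(); // drain the response body so the socket can be released
                res.on("end", () => resolve(res.statusCode));
            }
        );
        req.on("error", reject);

        const file = fs.createReadStream(filePath);
        file.on("error", (err) => {
            req.destroy();
            reject(err);
        });
        file.pipe(req); // pipe() ends the request once the file is fully read
    });
}
```

Because the file is piped rather than loaded into memory, the same helper can be used for large files: `sendFile("<instance-input-endpoint>", "file.txt")`.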

Note that, contrary to a typical microservices setup, there is no expensive step of building a container image, pushing it to a registry and then pulling it into a container orchestrator to run the microservice. You can go from a directory with code to a sequence processing your data in less than a minute.

We do package our apps, but their size is measured in kilobytes, not in hundreds of megabytes as is the case with container images. Lightweight app design gives better performance, optimized resource usage and a simpler CI/CD process.

We have a short, 3-minute demo on our YouTube channel showing the whole package preparation, deployment and run process (see Scramjet Transform Hub - Intro demo).

Serverless data applications

We call user applications sequences. They can perform continuous data and stream processing, and they have no run time limits or input data size limits.

Each sequence has a straightforward structure - it's a directory with at least two core files:

package.json - simple JSON file describing sequence metadata
index.[js/ts] - JavaScript or TypeScript file with sequence code. You are free to structure your app in multiple files if you like.

Below is the content of one of our sample "hello world" sequences, which yields integer numbers.


const { DataStream } = require("scramjet");

// Emits { x: start + 1 } ... { x: end }, one object per second, and logs each one.
module.exports = async function (stream, start = 0, end = 1000) {
    await DataStream.from(async function* () {
        let i = +start || 0; // coerce the argument to a number, defaulting to 0
        while (i++ < end) {
            await new Promise(res => setTimeout(res, 1000)); // wait one second
            yield { x: i };
        }
    })
        .do(console.log)
        .run();
};
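The sample above generates its own data and ignores the `stream` argument. A sequence that processes incoming data could be sketched as below. This is an illustration under stated assumptions, not code from our samples: it assumes the hub passes the instance input to the exported function as an async-iterable stream and treats the yielded values as the instance output. The names `upperCase` and `dryRun` are hypothetical.

```javascript
const { Readable } = require("stream");

// Hypothetical sequence: upper-cases each chunk of the input stream.
// Assumption: the hub calls the exported function with the instance input as an
// async-iterable stream and uses the yielded values as the instance output.
async function* upperCase(input) {
    for await (const chunk of input) {
        yield chunk.toString().toUpperCase();
    }
}

module.exports = upperCase;

// Local dry run without a hub: feed the function from an in-memory stream
// and collect everything it yields into an array.
async function dryRun(chunks) {
    const out = [];
    for await (const item of upperCase(Readable.from(chunks))) {
        out.push(item);
    }
    return out;
}
```

The dry run helper is only there to try the logic on a development machine, for example `dryRun(["hello ", "hub"]).then(console.log)`, before packaging the directory as a sequence.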

You can find an intro readme and more samples in our dedicated repository, scramjetorg/scramjet-cloud-docs.

API & CLI

Let's look at the Transform Hub API through the commands available in our CLI:

pack [options] - package directory with sequence code into tar.gz file
host [command] - monitor and check version of the host
config|c [command] - display and manage config
sequence|seq [command] - pack, deploy, manage and monitor sequences (app templates)
instance|inst [command] - manage and monitor instances (running apps)

The above commands (and the related API) cover complete management of the data processing engine and the serverless apps running on top of it.

Once started, each running instance exposes the following API endpoints:

input, output
stdin, stdout, stderr
log, monitoring
_event (to instance), event (from instance)
stop, kill
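Since every instance exposes the same set of endpoints, client code can derive their addresses from one base URL. The sketch below builds such a map. The URL layout `<hubBaseUrl>/instance/<instanceId>/<endpoint>` is an assumption made for illustration, as is the `instanceEndpoints` helper; check the API reference of your hub for the actual paths.

```javascript
// Endpoint names taken from the list above.
const ENDPOINTS = [
    "input", "output",
    "stdin", "stdout", "stderr",
    "log", "monitoring",
    "_event", "event",
    "stop", "kill",
];

// Hypothetical helper: maps each endpoint name to a URL for one instance.
// Assumed layout: <hubBaseUrl>/instance/<instanceId>/<endpoint>.
function instanceEndpoints(hubBaseUrl, instanceId) {
    const base = hubBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
    const urls = {};
    for (const name of ENDPOINTS) {
        urls[name] = `${base}/instance/${encodeURIComponent(instanceId)}/${name}`;
    }
    return urls;
}
```

With such a map, a client can post data to `urls.input` and read results from `urls.output` without hard-coding paths for each instance.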

This follows a "batteries included" approach: every running instance is handled in the same way.

Why Scramjet Cloud Platform

Our approach shown above has several benefits:

Freedom and flexibility – no artificial limits on data size and execution time of apps; no "execution time limit" or "payload size limit".
Great value for money – effective data workflows with fully programmable data acquisition and the ability to create patterns between instances performing various data processing tasks.
Performance by design – instantaneous processing of data without proxies, queues and gateways. Light apps with minimal resource consumption.
Works cross-native (Edge & Cloud) – spans locations out of the box. Run the same type of apps on edge or smart devices via standalone Scramjet Transform Hub and in our Scramjet Cloud Platform.

As a summary, the diagram below shows various patterns of chaining data processing on our platform:

Project co-financed by the European Union from the European Regional Development Fund under the Knowledge Education Development Program. The project is carried out as part of a competition of the National Centre for Research and Development: Szybka Ścieżka.