Harnessing the Power of Scramjet Sequences for Streamlined Data Processing
In the rapidly evolving landscape of cloud computing, the Scramjet Cloud Platform (SCP) emerges as a powerful tool for developers seeking to deploy, manage, and execute their code in the cloud efficiently. A central feature of this platform is the Scramjet Sequence - a potent mechanism that encapsulates a user's source code along with all its dependencies into a ready-to-deploy package. This article provides a comprehensive guide on creating, deploying, and managing Scramjet Sequences, focusing on a Python-based example for speech-to-text conversion using AssemblyAI.
Understanding Scramjet Sequences
A Scramjet Sequence is essentially a container for your algorithm, encompassing the code and all necessary dependencies. It is designed to be executed on the Scramjet Cloud Platform, facilitating seamless data processing. Through the Scramjet Framework, Sequences gain the ability to asynchronously process data streams, enabling them to produce, consume, and transform data efficiently.
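The produce, transform, and consume roles described above can be illustrated with plain Python async generators. This is a minimal sketch using only the standard library, not the Scramjet Framework API; the function names `produce`, `transform`, and `consume` are hypothetical and chosen only to mirror the three roles.

```python
import asyncio


async def produce(chunks):
    """Produce items one at a time, yielding control between them."""
    for chunk in chunks:
        await asyncio.sleep(0)  # stand-in for waiting on real I/O
        yield chunk


async def transform(stream):
    """Transform each item as it arrives (here: upper-case the text)."""
    async for item in stream:
        yield item.upper()


async def consume(stream):
    """Consume the stream and collect the results into a list."""
    return [item async for item in stream]


result = asyncio.run(consume(transform(produce(["a", "b", "c"]))))
print(result)  # ['A', 'B', 'C']
```

Each stage handles an item as soon as the previous stage yields it, so nothing has to wait for the whole input to be available.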
Preparing Your Sequence
To embark on this journey, let's explore the audio2text-input Python sample available on Scramjet's platform samples GitHub repository. This example demonstrates the integration of AssemblyAI for speech recognition within a Scramjet Sequence. To run this Sequence, you'll need your AssemblyAI token, which must be included when executing the start command:
    si sequence start <Sequence-id> --args=[\"<AssemblyAI-token>\"]
Retrieving the output, such as the transcribed text from an audio file, can be done using the Scramjet Interface (si) output command or the output API endpoint, further details of which can be found in the API reference.
    si instance output <Instance-id>
Crafting Your Handler
The heart of a Scramjet Sequence lies in its ability to handle asynchronous data processing. This is achieved by defining an async run function that takes the context and the input stream as arguments, followed by any values passed through --args (here, the AssemblyAI token), and processes them to produce the desired output.
    import requests
    import time
    import json
    from scramjet.streams import Stream

    async def run(context, input, token):
        # Join all incoming binary chunks into a single audio payload.
        audio_file = await input.reduce(lambda a, b: a+b)
        base_url = "https://api.assemblyai.com/v2"

        headers = {
            "authorization": token
        }

        # Upload the audio to AssemblyAI.
        response = requests.post(
            base_url + "/upload",
            headers=headers,
            data=audio_file
        )
        upload_url = response.json()["upload_url"]
        data = {
            "audio_url": upload_url
        }
        # Request a transcript for the uploaded audio.
        url = base_url + "/transcript"
        response = requests.post(url, json=data, headers=headers)

        transcript_id = response.json()['id']
        polling_endpoint = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"

        # Poll every 3 seconds until the transcript completes or fails.
        while True:
            transcription_result = requests.get(polling_endpoint, headers=headers).json()

            if transcription_result['status'] == 'completed':
                break

            elif transcription_result['status'] == 'error':
                raise RuntimeError(f"Transcription failed: {transcription_result['error']}")

            else:
                time.sleep(3)

        # Wrap the transcribed text in a stream for the Sequence output.
        return Stream.read_from(f"{transcription_result['text']} \n")
This code reads the input stream (chunks of binary audio data in wave format) into a single audio file using the reduce function in an asynchronous manner.
The prepared audio file is then uploaded to AssemblyAI's /upload endpoint using a POST request.
Once the transcription is completed, it reads the transcribed text from the transcription_result, wraps it in a stream with Stream.read_from, and returns it to the Sequence output. This output can be consumed through the Instance /output API endpoint, the Scramjet SDK, or si.
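The handler above waits between polls with time.sleep, which pauses the whole event loop while it sleeps. One possible alternative is to wait with asyncio.sleep so other tasks can run in the meantime. The sketch below shows only the polling step, under stated assumptions: `poll_until_done` and `fetch_status` are hypothetical names, `fetch_status` stands for any async callable that returns a dict shaped like transcription_result, and the `max_attempts` cap is an addition that the original handler does not have.

```python
import asyncio


async def poll_until_done(fetch_status, interval=3.0, max_attempts=100):
    """Poll `fetch_status` until it reports 'completed' or 'error'.

    `fetch_status` is a hypothetical async callable returning a dict with a
    "status" key, plus "text" on success or "error" on failure.
    """
    for _ in range(max_attempts):
        result = await fetch_status()
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result['error']}")
        # asyncio.sleep yields to the event loop instead of blocking it.
        await asyncio.sleep(interval)
    raise TimeoutError("Transcription did not finish in time")


# Usage with a fake status source that completes on the third call.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello world"},
])


async def fake_fetch():
    return next(_responses)


text = asyncio.run(poll_until_done(fake_fetch, interval=0.01))
print(text)  # hello world
```

In a real handler the requests calls are blocking as well, so they would also need to be moved off the event loop (for example with an executor) for this to pay off fully.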
Packaging Your Sequence
Before deployment, it's essential to package your Sequence into a .tar.gz file, incorporating the main.py script, any additional files, and dependencies. This process involves creating a requirements.txt file for Python dependencies, a package.json file for defining the Sequence, and organizing all necessary files into a coherent structure ready for packaging.
requirements.txt
    scramjet-framework-py
    requests
    pyee
    urllib3==1.26.6
    pyOpenSSL
package.json
    {
      "name": "audio2text-input",
      "version": "1.0.0",
      "main": "./main.py",
      "author": "Ray_Nawfal",
      "license": "GPL-3.0",
      "description": "Transcription of an audio file using AssemblyAI API.",
      "keywords": [
        "AudioToText",
        "Transcription",
        "AssemblyAI"
      ],
      "repository": {
        "type": "git",
        "url": "https://github.com/scramjetorg/platform-samples/tree/main/python/audio2text-input"
      },
      "engines": {
        "python3": "3.8.0"
      },
      "scripts": {
        "build": "mkdir -p dist/__pypackages__/ && pip3 install -t dist/__pypackages__/ -r requirements.txt && cp -t ./dist/ *.py *.json",
        "clean": "rm -rf ./dist"
      }
    }
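Before building, a quick local check of the manifest can catch a missing field or a "main" entry that points at a file that does not exist. The helper below is a hypothetical sketch, not part of the Scramjet tooling, and the set of fields it treats as required ("name", "version", "main") is an assumption based on the example above rather than a documented rule.

```python
import json
import tempfile
from pathlib import Path


def check_sequence_manifest(folder):
    """Return a list of problems found in <folder>/package.json (empty if none)."""
    folder = Path(folder)
    manifest_path = folder / "package.json"
    if not manifest_path.is_file():
        return ["package.json is missing"]
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    # Assumed minimum set of fields; adjust to what your setup needs.
    for field in ("name", "version", "main"):
        if field not in manifest:
            problems.append(f"missing field: {field}")
    main = manifest.get("main")
    if main and not (folder / main).is_file():
        problems.append(f"main file not found: {main}")
    return problems


# Usage on a throwaway folder that mimics the sample layout.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "main.py").write_text("async def run(context, input):\n    return input\n")
    Path(tmp, "package.json").write_text(json.dumps(
        {"name": "audio2text-input", "version": "1.0.0", "main": "./main.py"}
    ))
    problems = check_sequence_manifest(tmp)
    print(problems)  # []
```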
1. Create the __pypackages__ directory in the same directory as main.py:
    mkdir __pypackages__
2. Install the dependencies into the __pypackages__ folder. If any of your packages include code written in C, install the dependencies on a Linux machine, because the Scramjet Cloud Platform runs on Linux:
    pip3 install -t __pypackages__ -r requirements.txt
A Sequence can be packed manually in the form of a tar.gz file before being sent to Scramjet Cloud Platform through the command-line:
    si sequence pack <path/to/Sequence/folder>
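To make the idea of a .tar.gz package concrete, here is a minimal sketch that archives a folder with Python's standard tarfile module. It is an illustration only: the si sequence pack command remains the route described in this article, `pack_folder` is a hypothetical helper, and the assumption that files sit at the root of the archive has not been checked against what the platform expects.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_folder(folder, archive_path):
    """Write every file under `folder` into a gzip-compressed tar archive."""
    folder = Path(folder)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the Sequence folder.
                archive.add(path, arcname=path.relative_to(folder).as_posix())
    return archive_path


# Usage on a throwaway folder with two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    seq = Path(tmp, "dist")
    seq.mkdir()
    (seq / "main.py").write_text("# handler goes here\n")
    (seq / "package.json").write_text("{}\n")
    archive_file = pack_folder(seq, Path(tmp, "sequence.tar.gz"))
    with tarfile.open(archive_file, "r:gz") as archive:
        names = sorted(archive.getnames())
    print(names)  # ['main.py', 'package.json']
```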
Deploying on Scramjet Cloud Platform
Deployment involves packing and sending your Sequence to the Scramjet Cloud Platform. This can be done manually or through the si sequence deploy command, which facilitates the process by packing, sending and starting the Sequence. Detailed steps for packaging and deployment are available in the CLI reference.
    si sequence send <path/to/filename.tar.gz> --progress
Monitoring and Logs
Once deployed, monitoring your Sequence's performance and output is crucial. The Scramjet Cloud Platform provides various methods to access logs, including through the SCP Console Panel, the logging library within your Python script, or API endpoints for stdout, stderr, and output logs. These tools offer insights into the execution and performance of your Sequence, enabling you to debug and optimize as necessary.
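As an illustration of logging from within the Python script, the sketch below sets up a logger with the standard logging module. The logger name and message format are arbitrary choices, `make_logger` is a hypothetical helper, and an in-memory buffer stands in for sys.stderr so the result can be inspected; in a deployed Sequence you would pass sys.stderr instead.

```python
import io
import logging


def make_logger(stream, name="audio2text"):
    """Create a logger that writes level-tagged lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stand-in for sys.stderr
log = make_logger(buffer)
log.info("upload finished")
log.debug("not shown at INFO level")
log.error("transcription failed: %s", "bad audio")
print(buffer.getvalue(), end="")
```

Logging each stage of the handler (upload, transcript request, polling result) makes it easier to see where a run stalled or failed.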
Utilizing Events for Enhanced Interactivity
Scramjet Sequences are not just about processing data; they're also about interaction. Sequences can communicate through Topics, or trigger and respond to Events asynchronously. This capability allows for complex workflows and interactions between different Sequences, enhancing the platform's flexibility and extensibility.
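The general pattern of triggering and responding to events asynchronously can be sketched in a few lines of standard-library Python. This is not the Scramjet Events or Topics API: `EventBus`, the event name "transcript-ready", and the handler are all hypothetical and exist only to show the idea of registering a handler and emitting an event to it.

```python
import asyncio


class EventBus:
    """A tiny in-process event emitter with async handlers."""

    def __init__(self):
        self._handlers = {}

    def on(self, name, handler):
        """Register an async handler for the named event."""
        self._handlers.setdefault(name, []).append(handler)

    async def emit(self, name, payload):
        """Call every handler registered for the named event, in order."""
        for handler in self._handlers.get(name, []):
            await handler(payload)


received = []


async def on_transcript(payload):
    received.append(payload["text"])


async def main():
    bus = EventBus()
    bus.on("transcript-ready", on_transcript)
    await bus.emit("transcript-ready", {"text": "hello"})
    await bus.emit("unrelated-event", {"text": "ignored"})


asyncio.run(main())
print(received)  # ['hello']
```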
Final thoughts
Scramjet Sequences offer a powerful, flexible, and efficient way to handle data processing in the cloud. By following the steps outlined in this guide, you can leverage the Scramjet Cloud Platform to deploy and manage your Python-based data processing tasks with ease. Whether you're converting audio to text, analyzing streams of data, or integrating various cloud services, Scramjet provides the tools and infrastructure to bring your projects to life in the cloud.
Check out the Scramjet platform samples on GitHub.