Imagine you need a stream of data from a source available on the web, be it currency prices, temperature readouts, or anything else. In our example, it will be a stream of cryptocurrency prices.
Let's get started. I'll show you how to fetch price data, but the mechanism works for any stream from a website. Feel free to let us know what you might be using this for in the comments!
We'll be using Scramjet CLI to pack and send the programs to the server, so go ahead and install it.
    npm i -g @scramjet/cli
To run our sample, you'll need to install Scramjet Transform Hub (STH) and start it. You can install it on the same machine or on a remote one that you have network access to.
    # This will download and install STH
    npm i -g @scramjet/sth

    # This will start it
    sth

    # See sth --help for more info.
If you installed STH on a remote machine, use the ip a command to see which IP addresses the machine has.
Clone the repo locally, go to the samples directory and find crypto-prices. We have to install our local dependencies as well.
Copy the following lines to your terminal:
    # This downloads the repo to your computer
    git clone https://github.com/scramjetorg/scramjet-cloud-docs.git

    # This will enter the directory with the sample
    cd scramjet-cloud-docs/samples/crypto-prices

    # This will install the local dependencies
    npm install
Now let's open up an editor to get a better understanding of what's going on inside.
    # If you're viewing the file from within VSCode use this:
    code index.ts

    # Otherwise use either vi, nano or just write editor to use the default one
    editor index.ts
We're exporting an async generator function with the default arguments. You can think of it as just a function that returns multiple values over time. It's typed as a ReadableApp, which doesn't use the input and uses the three other parameters - the currency strings and the interval.
    const app: ReadableApp<string> = async function* (
        _stream,
        currency = "BTC",
        baseCurrency = "USD",
        interval = 3000
    ) {
Note: Using the types is not required, but it makes it easier for you to write code that will work with STH.
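If async generators are new to you, the idea of "a function that returns multiple values over time" can be sketched in isolation. The example below is only an illustration: countdown and collect are hypothetical names that do not appear in the sample, and the short timeout stands in for waiting on an external source.

```typescript
// A minimal async generator: it produces a value, pauses, and resumes
// only when the consumer asks for the next one.
async function* countdown(from: number): AsyncGenerator<string> {
    for (let i = from; i > 0; i--) {
        // Simulate waiting for an external data source
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
        yield `${i}\n`;
    }
}

// Reads every value the generator produces into an array.
async function collect(source: AsyncIterable<string>): Promise<string[]> {
    const out: string[] = [];
    for await (const chunk of source) {
        out.push(chunk);
    }
    return out;
}
```

The sample's generator works the same way, except that it never ends and each yielded value is a fetched price.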
As you can see inside, we run an infinite loop and fetch the data from the API based on the arguments. At the beginning of the loop we also set up a timer, which will come in handy a bit later. Notice we're not awaiting it yet.
        while (true) {
            const ref = defer(interval);
            const data = await fetch(`https://api.coinbase.com/v2/prices/${currency}-${baseCurrency}/spot`);
Then we yield the result with a new line character appended for prettier output.
            yield JSON.stringify(await data.json()) + "\r\n";
Finally, we wait for the previously set timer. Thanks to this, we'll be making a request every 3 seconds, rather than waiting 3 seconds after each request finishes.
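The timer pattern is easier to see when the loop is complete. The sketch below is not the sample's code: defer is a hypothetical stand-in written here on the assumption that the sample's helper resolves after the given number of milliseconds, fetchOnce is a placeholder for the real HTTP request, and the limit parameter exists only so that the loop can end.

```typescript
// Hypothetical stand-in for the sample's defer(): resolves after `ms` milliseconds.
const defer = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls at a fixed rate. `fetchOnce` is an injected placeholder for the request.
async function* poll<T>(
    fetchOnce: () => Promise<T>,
    interval: number,
    limit = Infinity // the sample loops forever; a limit keeps this sketch finite
): AsyncGenerator<T> {
    for (let i = 0; i < limit; i++) {
        const ref = defer(interval); // the timer starts now and is not awaited yet
        yield await fetchOnce();     // the request runs while the timer is ticking
        await ref;                   // wait only for what is left of the interval
    }
}
```

Because the timer starts before the request, a request that takes 100 ms with a 200 ms interval still gives one iteration every 200 ms. Awaiting a fresh timer after the request would stretch each iteration to roughly 300 ms.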
Now that we have an idea of what's going on under the hood, let's run our application.
We'll create a standalone package for our sequence. We start by building the source code, that is, by running the TypeScript compiler (this step isn't necessary for plain JavaScript modules).
    # this builds the code
    tsc -p tsconfig.json

    # this copies the package.json file to the output directory
    # note: the directory depends on your tsconfig.json
    cp -r package.json dist/

    # this will install dependencies in the dist folder
    (cd dist && npm i --only=production)
The sample has a handy npm script that does all of the above in one simple command:
    npm run build
Now, let's use the SI CLI to pack the dist directory into a compressed tar.gz package.
    si pack -o crypto-sample.tar.gz dist/
And we can now send it as a sequence like this:
    # You can provide the name
    si seq send crypto-sample.tar.gz

    # we can use the "-" sign to call the last one
    si seq send -
Now we have the ID of our newly created sequence. Nothing runs yet, but let's change that by starting our sequence. To do so, we need to pass the ID and our arguments: the two currency symbols. We won't pass the interval, so it will default to 3000 milliseconds, as in the code.
    si seq start - BTC USD
Now we can connect to the sequence output and see how it shows up:
    si seq output -
In the terminal you'll see a new line with the current exchange rate every 3 seconds.
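Since every output line is a JSON document, a consumer can split the stream on line breaks and parse each line. The sketch below shows one way to do that. The SpotPrice shape is an assumption about what the Coinbase endpoint returns, so check a real line of output before relying on it, and parseLines is a hypothetical helper that is not part of the sample.

```typescript
// Assumed shape of one output line; verify it against a real response.
interface SpotPrice {
    data: { base?: string; currency: string; amount: string };
}

// Splits raw output chunks into complete lines and parses each one as JSON.
// Chunk boundaries do not have to line up with line breaks.
async function* parseLines(chunks: AsyncIterable<string>): AsyncGenerator<SpotPrice> {
    let buffer = "";
    for await (const chunk of chunks) {
        buffer += chunk;
        const lines = buffer.split(/\r?\n/);
        // The last element is an incomplete line (or empty); keep it for the next chunk
        buffer = lines.pop() ?? "";
        for (const line of lines) {
            if (line.trim() !== "") {
                yield JSON.parse(line) as SpotPrice;
            }
        }
    }
}
```

The buffering matters because a network stream may deliver half a line in one chunk and the rest in the next.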
Easy peasy! In your application, you will likely be consuming our REST API. Take a look at Scramjet Transform Hub's API documentation here.
The program will run forever unless you stop it first. Let's try to stop the service now:
    si inst stop - 3000
The last started instance of the sequence will stop in 3 seconds (that's the 3000 argument there). You can then delete the sequence afterward:
    si seq delete -
And there you go. In this article I've shown how to write and deploy a simple data acquisition program to Scramjet Transform Hub, and how to access the data.
Here are a couple of links you may also want to see: