
Scramjet v4.20

Michał Czapracki
CEO at Scramjet, Data Streaming Expert.

6 January 2019

The new release includes one new method, unorder, and some new features for pull and use.

As always, a couple of dependencies were updated, including papaparse, which scramjet uses for CSV parsing.
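
For reference, that CSV path is exposed on StringStream as the CSVParse method, which delegates the actual parsing to papaparse. A minimal sketch (the file data.csv is a hypothetical input):

const {StringStream} = require("scramjet");
const fs = require("fs");

// CSVParse turns a text stream into a DataStream of parsed rows,
// handing its options through to papaparse under the hood.
StringStream.from(fs.createReadStream("data.csv"))
    .CSVParse()
    .each(row => console.log(row))
    .run();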

One of the new methods, available on all streams, is unorder. It allows transforms to be processed without taking output order into account (each chunk is still started as a single item, in input order). The method came from a user request in scramjet issue #25.

After some discussion we decided to add a new method to support the case in which someone would like to process a number of chunks in parallel, but is not concerned about the order in which the items are pushed out. For example, let's consider a parallel download scenario:


const {StringStream} = require("scramjet");
const fs = require("fs");
const request = require("request");

StringStream.from(fs.createReadStream("downloads.txt"))
    .lines()
    .parse(x => x.split("|"))
    .setOptions({maxParallel: 3})
    .unorder(([url, filename]) => {
        const out = request(url).pipe(fs.createWriteStream(filename));
        // resolve once the file has been fully written
        return new Promise((res, rej) => out.on("finish", res).on("error", rej));
    })
    .run();

When the above code is executed, scramjet will keep 3 operations running in parallel, but a long-running operation will not lock the stream. If we used map instead of unorder, chunk 4 would have to wait for operation 1 to end; with unorder, as soon as any of the running operations concludes, another one can be started within the maxParallel limit.
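
To make the difference concrete, here's a toy sketch (the per-chunk delays are made up for illustration): unorder emits results in completion order, while map would keep input order.

const {DataStream} = require("scramjet");

// made-up delays to simulate uneven per-chunk work
const delays = [300, 100, 200];

DataStream.from([0, 1, 2])
    .setOptions({maxParallel: 3})
    .unorder(async i => {
        await new Promise(res => setTimeout(res, delays[i]));
        return i;
    })
    .toArray()
    .then(arr => console.log(arr)); // completion order, e.g. [1, 2, 0]
    // with .map() instead, the output would stay in input order: [0, 1, 2]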

For more info, see the repo with an example: github:devinivy/scramjet-experiments.

Another change in the new version of scramjet is allowing the use of ES6 generators in pull and use. Now it's possible to write code like this:


const gulp = require("gulp");
const {DataStream} = require("scramjet");

// here we get a stream of files
DataStream.from(gulp.src("path")).into(async (out, x) => {
    // let's use a scramjet module that reads a file to a generator
    return out.pull(async function* () {
        // first we can push some synchronous items
        yield `filename=${x}`;
        // then use async generators (node v10+)
        const gen = await readFileToGenerator(x);
        yield* gen();
        yield `--end--`;
    });
}, new DataStream());

It's a cool feature that we found very useful for one of our customers, where previously a workaround had to be used. Now it's fully implemented and supported in both use and pull.
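
For completeness, a minimal sketch of the same idea with use. This assumes use can accept an async generator that receives the source stream and yields the chunks of the resulting stream (the exact calling convention may differ):

const {DataStream} = require("scramjet");

// a sketch, assuming use() takes an async generator that
// receives the source stream and yields the output chunks
DataStream.from([1, 2, 3])
    .use(async function* (stream) {
        yield "--start--";
        yield* stream; // node v10+ readable streams are async-iterable
        yield "--end--";
    })
    .toArray()
    .then(console.log); // ["--start--", 1, 2, 3, "--end--"]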

Project co-financed by the European Union from the European Regional Development Fund under the Knowledge Education Development Program. The project is carried out as part of the National Centre for Research and Development competition: Szybka Ścieżka.