Loris Sigrist

The Better Way to load data

When we are loading lists in our web-apps, we usually do the following. Our app makes a fetch request to a server, waits for all the data to arrive, maybe validates it, and then displays the items.

import { TodosSchema, type Todo } from "./model"
import { display } from "my-framework"

const response = await fetch("/todos");    //fetch
const data = await response.json();        //wait
const todos = TodosSchema.parse(data);     //validate

for(const todo of todos) {
	display(todo);
}

But what if there are hundreds of items and the connection to the server is slow? If we wait for the entire response to arrive, the user is going to see absolutely nothing for several seconds and then see all the items at once. This feels sluggish. Unfortunately, for anyone taking the subway, this is a daily experience.

A list of items loading for a long time, and then being filled all at once

There might be a better way though. We don’t actually need to wait for all the data before we start displaying it. Once the data for the first item has made it over the network, we should be able to display it. Having the items trickle in as the data arrives over the network would be a much nicer user experience.

In this post we are going to implement this using streams.

Quick note: In the JavaScript world there are two different stream APIs: Node Streams and Web Streams. Node Streams only work in Node, whereas Web Streams work both in browsers and in Node. Also, Web Streams are sometimes called WHATWG Streams, after the standards organisation. It’s a mess.

We will be using Web Streams.

Fortunately this won’t be that hard.
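
If you’ve never worked with Web Streams directly, here is about the smallest possible example of one, just to show the shape of the API (a throwaway demo, not part of our todo app):

const demo = new ReadableStream<string>({
	start(controller) {
		// Push two chunks into the stream, then close it.
		controller.enqueue("hello");
		controller.enqueue("world");
		controller.close();
	}
});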

Our trusty fetch API is designed to make streaming easy. response.body is actually a stream that gives you access to the raw data as it arrives over the network.

Let’s visualise that by logging each chunk of data as it arrives. We can access a stream’s data by getting its reader and waiting for a value to arrive. Once a value arrives we log it and wait for the next one, and the next, until the stream is done.

const response = await fetch('/todos');
const stream = response.body;

const reader = stream.getReader(); //Get reader (boilerplate)
while (true) {
	const { value, done } = await reader.read(); //wait for value
	if (done) break;
	console.log(value);
}

We now see a bunch of Uint8Arrays in the console. This is the raw binary data arriving over the network.

A bunch of Uint8Arrays being logged to the console

But we want text, so let’s convert the raw data to text. We can modify a stream’s data using a TransformStream. A TransformStream takes in a stream, runs some logic on each chunk of data as it arrives, and writes the result to an outgoing stream. In our case, we want a TransformStream that takes in a stream of raw binary data and outputs a stream of strings. This is such a common task that there is actually a built-in one, the TextDecoderStream. Let’s use that. Don’t worry, we will be creating our own TransformStreams later on.

Let’s hook the TextDecoderStream up to our stream using the pipeThrough method. This will return a new stream with the transform applied.

const stream = response.body.pipeThrough(new TextDecoderStream());

We now have a bunch of readable strings in the console.

A bunch of strings being logged to the console, with each being a chunk of a big JSON string

But we really want a stream of objects that represent our items. We can’t just JSON.parse each string-chunk; they don’t line up with the JSON structure. What we need is a streaming JSON parser.

Writing our own would be hard and undifferentiated work, so instead we’re going to use a library. There is a fantastic one called @streamparser/json-whatwg which can create a TransformStream that takes in JSON data and returns parsed objects.

npm install @streamparser/json-whatwg

We can initialise the TransformStream using the JSONParser constructor. We want each object in our todo-array to be emitted one after the other as they trickle in, so let’s configure the parser for that. We can provide a pattern of which paths should be emitted via the paths option; think of it as a regex that runs on the paths. We want each child of the top-level array to be emitted, which can be expressed using the $.* pattern. The dollar-sign is always the top-level object, the array in our case, and the star is a wildcard that matches each direct child.

Let’s add this parser to our stream-chain. This parser can also do the text-decoding internally so we don’t need the TextDecoderStream anymore.

import { JSONParser } from "@streamparser/json-whatwg"
...
const parser = new JSONParser({ paths: ["$.*"] })
const stream = response.body
	.pipeThrough(parser)

Optional performance optimization: add keepStack: false and stringBufferSize: undefined alongside the paths option.
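
Spelled out, that configuration would look something like this (the comments describe my understanding of these options; check the library’s docs for the exact semantics):

import { JSONParser } from "@streamparser/json-whatwg"

const parser = new JSONParser({
	paths: ["$.*"],              // emit each direct child of the top-level array
	keepStack: false,            // don't keep already-emitted ancestors around
	stringBufferSize: undefined  // don't pre-allocate a fixed-size string buffer
});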

In the console we now see a bunch of weird objects. The value property in each one contains our list items in their fully parsed glory. JSONParser emits what it calls “ParsedElementInfo” objects, which contain the parsed values as well as some extra metadata. That’s what we’re seeing.

A bunch of objects with the properties value,key,parent and stack being logged to the console
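
Roughly, each of those objects looks something like this (the field names come from the screenshot above; the todo values are made up):

// Sketch of one emitted element (illustrative, not the library's exact typing):
// {
// 	value:  { id: 1, title: "Buy milk", done: false },  // the fully parsed element (what we care about)
// 	key:    0,                                           // its index within the parent array
// 	parent: [ ... ],                                      // the parent array as built up so far
// 	stack:  [ ... ]                                       // the ancestors of the current element
// }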

Since we only care about the parsed values, let’s map over each element in the stream using, you guessed it, another TransformStream. This time we’ll create our own. The constructor takes an object with some lifecycle methods: the start method runs when the stream starts, the flush method runs when the stream is about to be closed, and the transform method runs whenever a new chunk of data arrives. We will only be using transform. It takes two arguments: the first is the chunk of incoming data, in our case the ParsedElementInfo object from the JSONParser, and the second is a stream-controller for the output stream. The stream-controller is how we write to or close the output stream. Here we enqueue the value property of each parsed element.

const mapToValueStream = new TransformStream({
	transform(parsedElementInfo, controller) {
		controller.enqueue(parsedElementInfo.value);
	}
});
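
For reference, a TransformStream using all three lifecycle hooks could look roughly like this; we only need transform for our use case (the strings here are just for illustration):

const exampleStream = new TransformStream<string, string>({
	start(controller) {
		// Runs once when the stream is set up.
		controller.enqueue("started");
	},
	transform(chunk, controller) {
		// Runs for every incoming chunk.
		controller.enqueue(chunk.toUpperCase());
	},
	flush(controller) {
		// Runs once the incoming side is done, right before the output closes.
		controller.enqueue("finished");
	}
});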

Let’s tack on our TransformStream and look at the console.

const stream = response.body
	.pipeThrough(parser)
	.pipeThrough(mapToValueStream);

Each object in the list being logged out one after the other

That’s looking good already! We get list-items trickling in as the data is arriving over the network!

Let’s replace the log-statement with our rendering logic. I want to keep this post framework agnostic, so I won’t spend much time here. This is where you would hook into your UI framework.

import { display } from "my-framework"
...
const reader = stream.getReader();
while(true) {
	const { value, done } = await reader.read();
	if(done) break;
	display(value);
}

The list being rendered one item at a time

Just what we wanted!
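
If you are not using a framework at all, display could be as simple as appending a DOM node per item. A minimal sketch, assuming the page has a <ul id="todo-list"> and each todo has a title field:

import type { Todo } from "./model"

function display(todo: Todo) {
	// Append one <li> per todo as soon as it arrives over the network.
	const li = document.createElement("li");
	li.textContent = todo.title;
	document.querySelector("#todo-list")?.append(li);
}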

The original code we had did one more thing that we are not yet doing. It validated the data. Let’s add that. We’re going to need another TransformStream. This one is very similar to the one we already made. We need to validate each element in the stream, and write it to the output if and only if it’s valid. You could throw an error if an item is invalid; I’m just going to fail silently.
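
A quick aside: the TodoSchema used for validation never actually appears in this post. With a validation library like Zod it might look roughly like this (the library choice and the todo fields are assumptions on my part):

// TodoSchema.ts: a possible definition, assuming Zod
import { z } from "zod";

export const TodoSchema = z.object({
	id: z.number(),      // assumed fields; adjust to your actual todo shape
	title: z.string(),
	done: z.boolean()
});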

const validateTodoStream = new TransformStream<unknown, Todo>({
	transform(value, controller) {
		try {
			controller.enqueue(TodoSchema.parse(value));
		} catch (e) {}
	}
});

Let’s add it to the stream-chain. Still works!

const stream = response.body
	.pipeThrough(parser)
	.pipeThrough(mapToValueStream)
	.pipeThrough(validateTodoStream);

We’ve now implemented all the original functionality in a streaming manner, but there is an opportunity to refactor here. Our two TransformStreams are very similar. They each execute a mapping function over every element, and emit the result. Let’s DRY that up. We’re going to make a helper called MapStream that takes the mapping-function as an argument and returns a TransformStream that runs it for each chunk. If it throws, we ignore the element.

// helpers.ts
export function MapStream<I, O>(map: (i: I) => O): TransformStream<I, O> {
	return new TransformStream({
		transform(chunk, controller) {
			try {
				controller.enqueue(map(chunk));
			} catch (e) {}
		}
	});
}

We can now rewrite both our TransformStreams using the helper.

import { MapStream } from "./helpers"
...
const stream = response.body
	.pipeThrough(parser)
	.pipeThrough(MapStream(result => result.value))
	.pipeThrough(MapStream(TodoSchema.parse))

Very expressive, isn’t it?

With that, our implementation is done. But there is one more thing I would like to refactor: this while loop at the bottom. According to the spec, you’re supposed to be able to consume streams using a for await of loop, but not every runtime implements this yet.

for await (const todo of stream) {
	display(todo);
}

Table showing that the 'for await of' syntax is only supported in Node, Deno and Firefox

Let’s write another helper that lets us use the nicer syntax. If you’ve never used async generators before, this will look unintelligible. That’s OK, this is entirely optional; just stick with the while loop.

// helpers.ts
export async function* asIterable<T>(stream: ReadableStream<T>): AsyncGenerator<T> {
	const reader = stream.getReader();
	while (true) {
		const { value, done } = await reader.read();
		if (done) break;
		yield value;
	}
}

We can now use for await (const todo of asIterable(stream)) to asynchronously loop over the elements in the stream. I find this easier to read, since there is no manual control flow.

import { MapStream, asIterable } from "./helpers"
...
for await (const todo of asIterable(stream)) {
	display(todo);
}

The final code looks like this:

import { JSONParser } from '@streamparser/json-whatwg';
import { MapStream, asIterable } from './helpers';
import { TodoSchema } from './TodoSchema';
import { display } from 'my-framework';

const response = await fetch('/todos.json');

const parser = new JSONParser({
	paths: ['$.*']
});

const stream = response.body
	.pipeThrough(parser)
	.pipeThrough(MapStream((result) => result.value))
	.pipeThrough(MapStream(TodoSchema.parse));

for await (const todo of asIterable(stream)) {
	display(todo);
}

A few observations to close out.

  1. On slow connections, the streaming version is both quicker to show something to the user and finishes earlier, since the parsing and validation happen in parallel with the fetching. On fast connections, the performance difference is negligible.
  2. Once the MapStream and asIterable helpers are defined, the streaming version of the code isn’t meaningfully longer. The effort for both versions is about the same.
  3. The bundle size for the streaming version is slightly larger than for the non-streaming version, since we need to ship the JSONParser (+20kB). This isn’t always worth it. On sites with long session times it likely is, since the extra code is only sent once and every subsequent request can be sped up. In PWAs, where your code is already cached on the client, streaming is a no-brainer.

There is a lot more you can do with streams; I really encourage you to play around with them. They’re a really powerful idea that applies to much more than just data-fetching. I hope you’ve learned something, and have a good day.