Server-side Rendering: The Server

Backstory was boring. How about some code this time?

In the previous part I described how Nebula arrived at the decision to implement server-side rendering (SSR) for the web app. Then came the hard part: actually doing it

Goal

Route requests to our server, render the React app as static HTML, and serve that

To achieve this goal, the server needs to do just three things:

Determine what page is requested
Query HTTP APIs for data to render
Render HTML

Because the web app is already built on React, react-router, and react-query, all three were taken care of:

react-router picks a page component depending on location.href
react-query loads data
ReactDOMServer.renderToString renders everything to HTML

Wrap everything into an express server, bundle that into a Docker container — and we’re done, easy-peasy

NOPE

Browser APIs

There’s no location.href in Node, just like there’s no window, document, or navigator. In the case of react-router, it’s not much of a problem, because Node’s request.url does basically the same job. But what about things like window.addEventListener?

Usually, a React app would access these APIs only in hooks¹, which partially solves the problem: one of the basic hooks, useEffect, isn’t called when a page is rendered to a string (i.e. “on a server”). So it’s safe to call browser APIs there

There are still other common patterns of accessing browser APIs, like initial values for the useState hook or defining some global constant² that won’t change after page load. For these, one has to either mock browser APIs with tools like jsdom or check whether the globals are defined. The former is rather heavy for SSR, so we opted for window?. / typeof window === 'undefined' checks, wrapped into a self-explanatory isSSR function
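
A minimal sketch of such a guard. The body of isSSR isn’t shown in this post, so everything below is an assumption, including the hypothetical getInitialWidth helper and its 1280 fallback. One detail worth knowing: a bare window?.innerWidth still throws a ReferenceError in Node, because optional chaining doesn’t cover undeclared identifiers, so the sketch goes through globalThis:

```javascript
// Assumed implementation: true when there is no browser `window` global.
function isSSR() {
  return typeof window === 'undefined';
}

// Hypothetical example of an initial value that is safe on the server.
// `globalThis.window?.innerWidth` evaluates to undefined in Node instead of throwing.
function getInitialWidth(fallback = 1280) {
  return globalThis.window?.innerWidth ?? fallback;
}
```

In a component this could be passed as a lazy initializer, e.g. useState(getInitialWidth), so the server gets the fallback and the browser gets the real value.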

Checks

The Nebula web app is written in TypeScript (TS) and it would be nice to express “window might be undefined, but it’s always defined in useEffect body” with its type system. As far as I know, it’s not possible at the moment, so we can’t rely on TS to catch all uses of the Browser API without taking a hit in development experience (“why do I need to check for window in effects?! boo, typescript is bad!”)

So we’re checking it with good ol’ smoke tests. Loop over a list of the web app’s URLs, request HTML, check if the response status code is either 200 or 404, and throw a stack trace if the status code is 500. Simple. Two things complicate this:

Some URLs depend on the environment, e.g. a video can be in production but not in staging, and vice versa. Because of these pages, the “list of web app’s URLs” is actually a “list of async functions that return URLs”: if a page has a consistent URL across environments, its function just returns a string; otherwise it can query the API for some video and return its permalink
SSR has to return correct HTTP codes. More on that later
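
Roughly, such a smoke test could look like the sketch below. It’s not our actual test code: runSmokeTests, createUrlProviders, requestPage, fetchAnyVideoPermalink, and the /videos path are all hypothetical names, and the HTTP request is injected so the sketch doesn’t assume a particular client:

```javascript
// Runs every URL provider, requests the page, and collects unexpected statuses.
async function runSmokeTests(urlProviders, requestPage) {
  const failures = [];
  for (const provideUrl of urlProviders) {
    const url = await provideUrl();
    const { status, body } = await requestPage(url);
    // 200 and 404 are both valid answers; anything else (notably 500) is a failure.
    if (status !== 200 && status !== 404) {
      failures.push({ url, status, body });
    }
  }
  return failures;
}

// Hypothetical provider list: static pages return a string right away,
// environment-dependent ones look a permalink up through the API.
function createUrlProviders(fetchAnyVideoPermalink) {
  return [
    async () => '/',
    async () => '/videos',
    async () => fetchAnyVideoPermalink(),
  ];
}
```

A non-empty failures array would then fail the CI job, with each body carrying the stack trace returned outside of production.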

Queries and Cache

Okay, next: loading data

react-query is a wonderful collection of hooks that greatly simplify working with HTTP APIs³. But, since useEffect isn’t called on a server, SSR has to do querying itself. Thankfully, react-query provides methods to do just that

When using react-query, the app has a QueryClientProvider somewhere in the React virtual DOM with a QueryClient object. This object keeps track of all the API queries created by the child components and holds their state and data. To run those queries in the absence of useEffect, the server has to:

1. Render the page without API data
2. Fetch queries
3. Render the page with API data

The first step is straightforward

Running queries is a bit more tricky because they can be disabled (or already loaded/failed, more on that later), but a pair of filters does the job — one for enabled, another for idle status (= “needs to be fetched”)⁴

function getIdleQueries(queryClient) {
  const queryCache = queryClient.getQueryCache();
  const queries = queryCache.findAll();

  return (
    queries
      .filter((q) => q.options.enabled !== false)
      .filter((q) => q.state.status === 'idle')
  );
}

After we’ve got the queries to run, we create a Promise for each of them and wait for them to settle (i.e. be either resolved or rejected):

async function extractAndFetchQueries(queryClient) {
  const queries = getIdleQueries(queryClient);
  const fetchPromises = queries.map(async (q) => {
    const { onSuccess } = q.options || {};
    const queryResponse = await queryClient.fetchQuery(q.queryKey, q.options);
    onSuccess?.(queryResponse);
  });

  const promiseResults = await Promise.allSettled(fetchPromises);
  // ...
}

Now queryClient has data for the initial render, but some promises might have been rejected. That’s why we’ve saved the outcomes in promiseResults. If there’s a failed query, we need to get its error for future use on the CDN and clean the failed queries up from the cache:

async function extractAndFetchQueries(queryClient) {
  // ...
  const queryError = promiseResults
    .find(({ status }) => status === 'rejected')?.reason;
  if (queryError) {
    queryClient.removeQueries({
      predicate(query) { return query.state.status === 'error' },
    });
    throw queryError;
  }
}

After all that, the server does what it needs to do, resulting in this humble virtual DOM root:

const renderedHtml = ReactDOMServer.renderToString(
  <QueryClientProvider client={queryClient}>
    <StaticRouter location={url}>
      <App />
    </StaticRouter>
  </QueryClientProvider>
);

Are we done?

In theory, now we have our web app rendered with API data — we just insert it into <div id="root"></div> in a barebones index.html

In practice, Nebula does steps “2. Fetch queries” and “3. Render page with API data” two more times to make sure there are no more idle queries. For example, when loading /jetlag?tab=playlists, we first ask the API for the channel and, if it was found, for the playlists associated with it. But even if the server doesn’t do enough repeats to load everything, it’s okay: the static HTML will include placeholders or default values for a browser to overwrite
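
The repeated fetch-and-render rounds can be sketched as a small loop. This is a simplified illustration, not our server code: renderWithData, render, fetchIdleQueries, and maxPasses are hypothetical names, and stopping early when nothing was fetched is an assumption on top of the fixed number of repeats described above:

```javascript
// Renders once without data, then alternates "fetch idle queries" and "render"
// for up to `maxPasses` rounds. `render` returns HTML; `fetchIdleQueries`
// resolves to the number of queries it has just fetched.
async function renderWithData({ render, fetchIdleQueries, maxPasses = 3 }) {
  let html = render(); // step 1: discovers the first batch of queries
  for (let pass = 0; pass < maxPasses; pass += 1) {
    const fetchedCount = await fetchIdleQueries(); // step 2
    if (fetchedCount === 0) break; // assumption: stop early when nothing is idle
    html = render(); // step 3: may register queries that depend on fresh data
  }
  return html;
}
```

With maxPasses set to 3, a channel page gets one round for the channel, one for its playlists, and one spare.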

Additionally, queryClient’s cache is kept for a minute in the server’s memory to skip API requests for frequently needed data, like the list of video/channel categories
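
How exactly the cache is kept isn’t shown here. One possible reading, sketched below under that assumption, is to reuse a single client between requests and replace it once it’s older than a minute. createClientHolder, createClient, ttlMs, and now are hypothetical names; in a real app createClient would return a new QueryClient:

```javascript
// Returns a getter that hands out the same client until it is `ttlMs` old,
// then creates a fresh one. `now` is injectable to make the timing testable.
function createClientHolder(createClient, ttlMs = 60_000, now = Date.now) {
  let client = null;
  let createdAt = 0;
  return function getClient() {
    if (client === null || now() - createdAt >= ttlMs) {
      client = createClient();
      createdAt = now();
    }
    return client;
  };
}
```

Sharing a client like this only makes sense for anonymous data, which fits the decision about user sessions described further down.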

Plus, the server response is not just HTML, but also…

HTTP codes

What if there’s an error from the API? The one that we found in promiseResults and threw as queryError?

In that case the server should respond with a non-200 HTTP code and render whatever HTML is ready. Doing this is important for the CDN layer, web crawlers (to let them know that the crawled URL isn’t available), and for our smoke tests mentioned before

For the Nebula web app, if the queryError is an AxiosError⁵ with 4XX status, we just pretend that the page is not found:

try {
  await extractAndFetchQueries(queryClient);
} catch (e) {
  if (
    axios.isAxiosError(e) &&
    e.response &&
    e.response.status >= 400 &&
    e.response.status < 500
  ) {
    res.writeHead(404, { 'Content-Type': 'text/html' });
    res.end(html);
    return;
  }

  throw e;
}

If the API responded with 5XX or if the SSR server failed to render the page, we just return a 500 status code with Internal Server Error as the body (and a stack trace when not in the production environment). But neither human nor bot visitors will see these three words because we’ll handle that on the CDN
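
Both branches can be summarized as one mapping from an error to a response. The sketch below is an illustration with hypothetical names (toErrorResponse, isProduction, renderedHtml), and it duck-types the error instead of calling axios.isAxiosError so that it runs without axios:

```javascript
// Maps a caught error to { status, contentType, body }.
function toErrorResponse(error, { isProduction, renderedHtml }) {
  const upstreamStatus = error?.response?.status;
  // Any 4XX from the API is presented as "page not found", with whatever HTML is ready.
  if (Number.isInteger(upstreamStatus) && upstreamStatus >= 400 && upstreamStatus < 500) {
    return { status: 404, contentType: 'text/html', body: renderedHtml };
  }
  // 5XX from the API or a render failure: plain 500, stack trace only outside production.
  const details = isProduction ? '' : `\n\n${error?.stack ?? String(error)}`;
  return { status: 500, contentType: 'text/plain', body: `Internal Server Error${details}` };
}
```

The result can be written out with res.writeHead(status, { 'Content-Type': contentType }) followed by res.end(body).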

Other common HTTP codes are 3XX for redirects. For those, we need to pass a routerContext object to the <StaticRouter> component and check if routerContext.action === 'REPLACE' after the virtual DOM is rendered. If so, react-router has set routerContext.url to the redirect destination
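
A small sketch of that check. getRedirect is a hypothetical helper, and the 302 status is an assumption, since which 3XX code to send depends on the app:

```javascript
// Returns redirect details if react-router recorded one while rendering, else null.
function getRedirect(routerContext) {
  if (routerContext.action === 'REPLACE' && routerContext.url) {
    return { status: 302, location: routerContext.url }; // 302 is an assumption
  }
  return null;
}

// Possible usage after rendering with <StaticRouter context={routerContext}>:
// const redirect = getRedirect(routerContext);
// if (redirect) {
//   res.writeHead(redirect.status, { Location: redirect.location });
//   res.end();
// }
```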

User sessions

For several hundred words I’ve avoided mentioning the elephant in the room. Even on a streaming service without The Algorithm, there are personalized pages: Watch History, Watch Later, settings. Surely, SSR should deal with user sessions and authentication, right?

Not really

Since the app uses react-router, page navigation after initial load is done client-side and SSR wouldn’t be used. Search engine and social network crawlers won’t be authenticated

So why bother when rendering session-specific <body> on a server would be beneficial only when a human arrives at the site⁶, while adding:

costs: personal responses wouldn’t be cached as often as anonymous ones, so SSR would require more compute resources
lag: responses aren’t as cacheable, so there’s almost zero chance that a local CDN would be used
risks: having responses for multiple users in the same runtime attracts awful bugs. Just ask Steam, which had an issue with showing user X data cached for user Y
and, obviously, complexity and more code

Updated goal

Waaaaait… If we don’t bother with the individual part of the page, we can also cut the rest of the <body>!

SSR in Nebula started as a project for search engines and link previews. Both can read the <head> tag for all the necessary metadata (and, in the case of Google, can execute JS to compute <body> for fuller descriptions and following links)

At the same time, cutting the <body> tag from the SSR response works around the need to hydrate HTML and CSS after the browser executes JS: if there’s nothing inside <div id="root"></div>, nothing would flash or jump around because the server thought that the visitor has a bigger/smaller screen than they actually do. Also, video streaming without JS is possible, but very limiting (both for visitors and developers), so “strict <noscript>” folks are kinda on the outside?..⁷

So we can safely update SSR’s goal from

Route requests to our server, render the React app as static HTML, and serve that

to

Route requests to our server, render the <head> tag of the React app to static HTML, and serve that

We still need to render the virtual DOM and query APIs for the <head> tag because it’s rendered with a <Helmet> component. But we can throw away the renderedHtml value and do this after ReactDOMServer.renderToString has run:

const helmet = Helmet.renderStatic();

const renderedHtml = barebonesIndexHtml
  .toString()
  .replace(/<html/, '<html ' + helmet.htmlAttributes.toString())
  .replace(/<title>[^<]*<\/title>/, helmet.title.toString())
  .replace(/<\/head>\s*<body>/, [helmet.link, helmet.meta].join('') + '</head><body>');

This HTML will render <body> with an empty <div id="root"></div>, but page-specific <title> and Open Graph <meta> tags. Perfect for bots, basically-the-same-as-before for humans

Deployment

Now that we have an (almost) complete server, we need to deploy it. This topic is outside the scope of these blog posts. I mean, it’s either “create a Dockerfile” or “do whatever your custom workflow requires you to do”. Docker is boring and has been written about to death, and I have no idea about your custom workflows 🤷

But, if your workflow includes serving static assets with hashed filenames (like index.2c3.js) from S3 or some other object storage, make sure that both old and new assets are available during server deployment. You wouldn’t want to have a server respond with HTML that mentions index.2c3.js when it hasn’t been uploaded yet (or has just been removed from S3)

To avoid this problem, Nebula’s web deploy workflow keeps static assets on S3 for two calendar years (so, in 2022, there are assets from 2021 and 2022 in the S3 bucket)

Also, if your JS bundle depends on the build .env/ENV (for example, create-react-app’s REACT_APP_* environment variables), you’ll need to make it the same when compiling the server and the client. A different order of environment variables won’t affect the runtime, but it might affect the hash of compiled files, leading to a server that expects index.8ae.js on S3 instead of the index.2c3.js that was just uploaded during the CI/CD pipeline. The approach that worked for us was:

Generate .env during the CI/CD setup step with alphabetically ordered keys
Don't use job-specific environment variables
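
Generating the file with ordered keys can be as small as the sketch below. serializeEnv and the variable names in the usage comment are hypothetical, and the sketch assumes values need no quoting or escaping:

```javascript
// Serializes variables as KEY=value lines with keys sorted, so the same set of
// variables always produces a byte-identical .env file.
function serializeEnv(vars) {
  const lines = Object.keys(vars)
    .sort()
    .map((key) => `${key}=${vars[key]}`);
  return lines.join('\n') + '\n';
}

// Possible usage in a CI setup step (hypothetical variable names):
// fs.writeFileSync('.env', serializeEnv({ REACT_APP_ENV: 'staging', REACT_APP_API_URL: '...' }));
```

Since both the server and the client builds read the same generated file, neither the key order nor job-specific variables can make their hashes diverge.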

Are we done now?

Kinda?.. The server works, bots get the metadata, developers can forget about the parallel repo and focus on the main one when implementing new pages

But have you noticed “(almost)” and multiple mentions of CDN in previous sections? There are more SSR-related things outside of a Node server, and I will talk about CDNs in another post

Thank you for your attention

Thanks to Sam Rose for help with writing this. Photo by Taylor Vick

Hooks are a way to update parts of a React app’s layout on a user or external signal, like “HTTP request is done”. Without them (or old-style “class components” with predefined methods), the layout would be either static or updated fully, after receiving and handling a signal at the root element ↩︎
IMO, this is a code smell even without SSR. I’ve dealt with a lot of bugs where the “it won’t ever change” constant did change: the user connected a new input device, changed some browser/system setting, etc. ↩︎
Speaking as the one who’s rewritten the API layer from extremely boilerplate-y Redux ↩︎
Even though the Nebula web app is written in TypeScript, code snippets here will be in JavaScript for brevity ↩︎
Axios being our preferred HTTP client, which, before fetch became a part of Node, was crucial for sharing query loader functions across browsers and the server ↩︎
Also, when they reload pages and open internal links in new tabs ↩︎
Although, some (without any interactivity or video playback) <noscript>-friendly layout might be useful. For example, to preview a video page for those who do enable JS on a site-by-site basis?.. 🤔 ↩︎