Server-side Rendering: The Server
Backstory was boring. How about some code this time?
In the previous part I described how Nebula arrived at the decision to implement server-side rendering (SSR) for the web app. Then came the hard part: actually doing it
Goal
Route requests to our server, render the React app as static HTML, and serve that
To achieve this goal, the server needs to do just three things:
- Determine what page is requested
- Query HTTP APIs for data to render
- Render HTML
Because the web app is already built on React, react-router, and react-query, all three were taken care of:
- react-router picks a page component depending on location.href
- react-query loads data
- ReactDOMServer.renderToString renders everything to HTML
Wrap everything into an express server, bundle that into a Docker container, and we’re done, easy-peasy
Browser APIs
There’s no location.href in Node, just like there’s no window, document, or navigator. In the case of react-router, it’s not much of a problem, because Node’s request.url does basically the same job. But what about things like window.addEventListener?
Usually, a React app would access these APIs only in hooks¹, which partially solves the problem: one of the basic hooks, useEffect, isn’t called when a page is rendered to a string (i.e. “on a server”). So it's safe to call browser APIs there
There are still other common patterns of accessing browser APIs, like initial values for the useState hook or defining some global constant² that won’t change after page load. For these, one has to either mock browser APIs with tools like jsdom or check that the globals are defined. The former is rather heavy for SSR, so we opted for window?. / typeof window === 'undefined' checks, wrapped into a self-explanatory isSSR function
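A minimal sketch of what such a helper could look like. The body of isSSR and the getInitialWidth example (including its fallback of 1024) are assumptions for illustration, not Nebula's actual code. Note that optional chaining does not help with an identifier that was never declared: in Node, evaluating window?.innerWidth throws a ReferenceError, so the sketch relies on typeof, which never throws:

```javascript
// Hypothetical helper: true when there is no browser `window`,
// e.g. when the app is rendered to a string in Node.
function isSSR() {
  return typeof window === 'undefined';
}

// Example use: a safe initial value for a useState hook.
// The fallback width is an arbitrary placeholder for the server render.
function getInitialWidth() {
  return isSSR() ? 1024 : window.innerWidth;
}
```

In a component this would be used as useState(getInitialWidth), so the browser API is only touched when it exists.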
Checks
The Nebula web app is written in TypeScript (TS) and it would be nice to express “window might be undefined, but it’s always defined in the useEffect body” with its type system. As far as I know, it’s not possible at the moment, so we can’t rely on TS to catch all uses of browser APIs without taking a hit in development experience (“why do I need to check for window in effects?! boo, typescript is bad!”)
So we’re checking it with good ol’ smoke tests. Loop over a list of the web app’s URLs, request HTML, check if the response status code is either 200 or 404, and throw a stack trace if the status code is 500. Simple. Two things that complicate things:
- Some URLs depend on the environment, e.g. a video can be in production but not in staging, and vice versa. Because of these pages, the “list of web app’s URLs” is actually a “list of async functions that return URLs”: if a page has a consistent URL across environments, its function just returns a string, otherwise it can query the API for some video and return its permalink
- SSR has to return correct HTTP codes. More on that later
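The smoke test loop described above can be sketched roughly like this. The function name runSmokeTests, its parameters, and the shape of the failure records are assumptions for illustration; fetchImpl defaults to the global fetch available in Node 18+:

```javascript
// Sketch of a smoke test runner.
// `urlGetters` is a list of async functions that each return a path.
async function runSmokeTests(baseUrl, urlGetters, fetchImpl = fetch) {
  const failures = [];
  for (const getUrl of urlGetters) {
    const path = await getUrl();
    const response = await fetchImpl(new URL(path, baseUrl).href);
    // 200 and 404 are both valid SSR answers; anything else is a failure
    if (response.status !== 200 && response.status !== 404) {
      failures.push({
        path,
        status: response.status,
        body: await response.text(), // stack trace outside production
      });
    }
  }
  return failures;
}
```

A page with a stable URL would be listed as async () => '/some-page', while an environment-dependent one would first ask the API for an existing video and return its permalink.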
Queries and Cache
Okay, next: loading data
react-query is a wonderful collection of hooks that greatly simplify working with HTTP APIs³. But, since useEffect isn’t called on a server, SSR has to do the querying itself. Thankfully, react-query provides methods to do just that
When using react-query, the app has a QueryClientProvider somewhere in the React virtual DOM with a QueryClient object. This object keeps track of all the API queries created by the child components and holds their state and data. To run those queries in the absence of useEffect, the server has to:
- Render the page without API data
- Fetch queries
- Render the page with API data
The first step is straightforward
Running queries is a bit more tricky because they can be disabled (or already loaded/failed, more on that later), but a pair of filters does the job: one for enabled, another for the idle status (= “needs to be fetched”)⁴
function getIdleQueries(queryClient) {
  const queryCache = queryClient.getQueryCache();
  const queries = queryCache.findAll();
  return (
    queries
      // skip queries that were explicitly disabled
      .filter((q) => q.options.enabled !== false)
      // keep only the ones that haven't been fetched yet
      .filter((q) => q.state.status === 'idle')
  );
}
After we’ve got the queries to run, we create a Promise for each of them and wait for them to settle (i.e. be either resolved or rejected):
async function extractAndFetchQueries(queryClient) {
  const queries = getIdleQueries(queryClient);
  const fetchPromises = queries.map(async (q) => {
    const { onSuccess } = q.options || {};
    const queryResponse = await queryClient.fetchQuery(q.queryKey, q.options);
    // call the query's own success callback, if it has one
    onSuccess?.(queryResponse);
  });
  // wait for every query to finish, whether it succeeded or failed
  const promiseResults = await Promise.allSettled(fetchPromises);
  // ...
}
Now queryClient has data for the initial render, but some promises might have been rejected. That’s why we’ve saved the outcomes in promiseResults. If there’s a failed query, we need to get its error for future use on the CDN and clean the failed queries up from the cache:
async function extractAndFetchQueries(queryClient) {
  // ...
  // take the error of the first rejected promise, if any
  const queryError = promiseResults
    .find(({ status }) => status === 'rejected')?.reason;
  if (queryError) {
    // remove failed queries from the cache
    queryClient.removeQueries({
      predicate(query) { return query.state.status === 'error' },
    });
    throw queryError;
  }
}
After all that, the server does what it needs to do, resulting in this humble virtual DOM root:
const renderedHtml = ReactDOMServer.renderToString(
  <QueryClientProvider client={queryClient}>
    <StaticRouter location={url}>
      <App />
    </StaticRouter>
  </QueryClientProvider>
);
Are we done?
In theory, now we have our web app rendered with API data: we just insert it into <div id="root"></div> in a barebones index.html
In practice, Nebula does steps “2. Fetch queries” and “3. Render the page with API data” two more times to make sure there are no more idle queries. For example, when loading /jetlag?tab=playlists, we first ask the API for the channel and, if it was found, for the playlists associated with it. But even if the server doesn’t do enough repeats to load everything, it’s okay: the static HTML will include placeholders or default values for a browser to overwrite
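The repeated passes can be sketched as a small loop. This is an illustration under assumptions, not the actual server code: render and fetchIdleQueries are hypothetical callbacks standing in for ReactDOMServer.renderToString(...) and extractAndFetchQueries(queryClient), fetchIdleQueries is assumed to resolve to the number of queries it fetched, and the early exit is an optional shortcut:

```javascript
// Sketch: render, then alternate "fetch idle queries" and "render again"
// for a limited number of passes.
async function renderWithData({ render, fetchIdleQueries, maxPasses = 3 }) {
  // First render registers the queries; there is no API data yet
  let html = render();
  for (let pass = 0; pass < maxPasses; pass += 1) {
    const fetchedCount = await fetchIdleQueries();
    if (fetchedCount === 0) break; // nothing left to load
    // Re-render with the data loaded so far; this may register new queries
    html = render();
  }
  return html;
}
```

With the /jetlag?tab=playlists example, the first pass would load the channel and the second pass the playlists that only become known after the channel is rendered.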
Additionally, queryClient’s cache is kept for a minute in the server’s memory to skip API requests for frequently needed data, like the list of video/channel categories
Plus, the server response is not just HTML, but also…
HTTP codes
What if there’s an error from the API? The one that we found in promiseResults and rethrew with throw queryError?
In that case the server should respond with a non-200 HTTP code and render whatever HTML is ready. Doing this is important for the CDN layer, web crawlers (to let them know that the crawled URL isn’t available), and for our smoke tests mentioned before
For the Nebula web app, if the queryError is an AxiosError⁵ with a 4XX status, we just pretend that the page is not found:
try {
  await extractAndFetchQueries(queryClient);
} catch (e) {
  // treat any 4XX from the API as "page not found"
  if (
    axios.isAxiosError(e) &&
    e.response &&
    e.response.status >= 400 &&
    e.response.status < 500
  ) {
    res.writeHead(404, { 'Content-Type': 'text/html' });
    res.end(html);
    return;
  }
  // everything else is handled as a server error further up
  throw e;
}
If the API responded with 5XX or if the SSR server failed to render the page, we just return a 500 status code with Internal Server Error as the body (and a stack trace when not in the production environment). But neither human nor bot visitors will see these three words because we’ll handle that on the CDN
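A rough sketch of that 500 branch. The function name buildServerErrorResponse, the nodeEnv parameter (standing in for process.env.NODE_ENV), and the shape of the returned object are all assumptions for illustration:

```javascript
// Sketch: build the response for an unexpected SSR failure.
// The stack trace is only included outside of production.
function buildServerErrorResponse(error, nodeEnv) {
  const isProduction = nodeEnv === 'production';
  const body = isProduction
    ? 'Internal Server Error'
    : `Internal Server Error\n\n${error.stack}`;
  return { status: 500, headers: { 'Content-Type': 'text/plain' }, body };
}
```

The server would then pass status and headers to res.writeHead and body to res.end.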
Other common HTTP codes are 3XX for redirects. For those, we need to pass a routerContext object to the <StaticRouter> component and check if routerContext.action === 'REPLACE' after the virtual DOM is rendered. If so, then react-router would set routerContext.url to the redirect destination
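As a sketch, the check after rendering could look like the following. The helper name sendRedirectIfNeeded is hypothetical, res is assumed to be a Node response object, and the 302 status is an assumption (a permanent 301 may suit some routes better):

```javascript
// Sketch: turn the static router context into an HTTP redirect.
// Returns true if a redirect response was sent.
function sendRedirectIfNeeded(routerContext, res) {
  if (routerContext.action !== 'REPLACE' || !routerContext.url) {
    return false;
  }
  res.writeHead(302, { Location: routerContext.url });
  res.end();
  return true;
}
```

The routerContext passed here is the same (initially empty) object that was given to <StaticRouter context={routerContext}> before rendering.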
User sessions
For several hundred words I’ve avoided mentioning an elephant in the room. Even on a streaming service without The Algorithm, there are personalized pages: Watch History, Watch Later, settings. Surely, SSR should deal with user sessions and authentication, right?
Not really
Since the app uses react-router, page navigation after initial load is done client-side and SSR wouldn’t be used. Search engine and social network crawlers won’t be authenticated
So why bother when rendering a session-specific <body> on a server would be beneficial only when a human arrives at the site⁶, while adding:
- costs: personal responses wouldn’t be cached as often as anonymous ones, so SSR would require more compute resources
- lag: responses aren’t as cacheable, so there’s almost zero chance that a local CDN cache would be used
- risks: having responses for multiple users in the same runtime attracts awful bugs. Just ask Steam, which had an issue with showing user X data cached for user Y
- and, obviously, complexity and more code
Updated goal
Waaaaait… If we don’t bother with the individual part of the page, we can also cut the rest of the <body>!
SSR in Nebula started as a project for search engines and link previews. Both can read the <head> tag for all the necessary metadata (and, in the case of Google, can execute JS to compute <body> for fuller descriptions and following links)
At the same time, cutting the <body> tag from the SSR response works around the need to hydrate HTML and CSS after the browser executes JS: if there’s nothing inside <div id="root"></div>, nothing would flash or jump around because the server thought that the visitor has a bigger/smaller screen than they actually do. Also, video streaming without JS is possible, but very limiting (both for visitors and developers), so “strict <noscript>” folks are kinda on the outside?..⁷
So we can safely update SSR’s goal from
Route requests to our server, render the React app as static HTML, and serve that
to
Route requests to our server, render the <head> tag of the React app to static HTML, and serve that
We still need to render the virtual DOM and query APIs for the <head> tag because it’s rendered with a <Helmet> component. But we can throw away the renderedHtml value and do this after we’ve done ReactDOMServer.renderToString:
const helmet = Helmet.renderStatic();
const renderedHtml = barebonesIndexHtml
  .toString()
  .replace(/<html/, '<html ' + helmet.htmlAttributes.toString())
  .replace(/<title>[^<]*<\/title>/, helmet.title.toString())
  .replace(/<\/head>\s*<body>/, [helmet.link, helmet.meta].join('') + '</head><body>');
This HTML will render <body> with an empty <div id="root"></div>, but with page-specific <title> and Open Graph <meta> tags. Perfect for bots, basically-the-same-as-before for humans
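To see what the replace chain produces, here is a self-contained sketch that swaps the real Helmet.renderStatic() result for a stand-in object. The names fakeHelmet and injectHead, the sample index.html, and all sample values are made up for illustration; the only thing the chain needs from each helmet field is a toString() method:

```javascript
// Stand-in for the object returned by Helmet.renderStatic()
const fakeHelmet = {
  htmlAttributes: { toString: () => 'lang="en"' },
  title: { toString: () => '<title>Jet Lag</title>' },
  link: { toString: () => '<link rel="canonical" href="https://example.test/jetlag">' },
  meta: { toString: () => '<meta property="og:title" content="Jet Lag">' },
};

// Sample barebones index.html, flattened to one line
const barebonesIndexHtml =
  '<html><head><title>App</title></head><body><div id="root"></div></body></html>';

// Same three replacements as above, wrapped in a function
function injectHead(indexHtml, helmet) {
  return indexHtml
    .replace(/<html/, '<html ' + helmet.htmlAttributes.toString())
    .replace(/<title>[^<]*<\/title>/, helmet.title.toString())
    .replace(/<\/head>\s*<body>/, [helmet.link, helmet.meta].join('') + '</head><body>');
}

const renderedHtml = injectHead(barebonesIndexHtml, fakeHelmet);
```

One thing to keep in mind with string replacements: sequences like $& in the replacement string have a special meaning for String.prototype.replace, so a title containing them would need a replacer function instead.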
Deployment
Now that we have an (almost) complete server, we need to deploy it. This topic is outside of the scope of these blog posts. I mean, it’s either “create a Dockerfile” or “do whatever your custom workflow requires you to do”. Docker is boring and written to death, and I have no idea about your custom workflows 🤷
But, if your workflow includes serving static assets with hashed filenames (like index.2c3.js) from S3 or some other object storage, make sure that both old and new assets are available during server deployment. You wouldn’t want to have a server respond with HTML that mentions index.2c3.js when it hasn’t been uploaded yet (or has just been removed from S3)
To avoid this problem, Nebula’s web deploy workflow keeps static assets on S3 for two calendar years (so, in 2022, there are assets from 2021 and 2022 in the S3 bucket)
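The “two calendar years” rule boils down to a one-line check. This is only a sketch of the rule as described, with a hypothetical function name and UTC chosen as an assumption; the actual cleanup in the deploy workflow may be implemented differently (for example, with storage lifecycle rules):

```javascript
// Sketch: keep assets uploaded in the current or the previous calendar year.
function shouldKeepAsset(uploadedAt, now = new Date()) {
  return uploadedAt.getUTCFullYear() >= now.getUTCFullYear() - 1;
}
```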
Also, if your JS bundle depends on the build .env/ENV (for example, create-react-app’s REACT_APP_* environment variables), you’ll need to make it the same when compiling the server and the client. A different order of environment variables won’t affect the runtime, but it might affect the hash of compiled files, leading to a server expecting an index.8ae.js file on S3 instead of the index.2c3.js that was just uploaded during the CI/CD pipeline. The approach that worked for us was:
- Generate .env during the CI/CD setup step with alphabetically ordered keys
- Don't use job-specific environment variables
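Generating the file with sorted keys can be sketched like this. The function name buildDotEnv is hypothetical, and filtering by the REACT_APP_ prefix is an assumption borrowed from create-react-app's convention:

```javascript
// Sketch: serialize environment variables into .env content with sorted keys,
// so the server and client builds get byte-identical input.
function buildDotEnv(env) {
  return Object.keys(env)
    .filter((key) => key.startsWith('REACT_APP_'))
    .sort()
    .map((key) => `${key}=${env[key]}`)
    .join('\n') + '\n';
}
```

In a CI setup step this could be written out once, e.g. with fs.writeFileSync('.env', buildDotEnv(process.env)), and then reused by both build jobs.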
Are we done now?
Kinda?.. The server works, bots get the metadata, developers can forget about the parallel repo and focus on the main one when implementing new pages
But have you noticed “(almost)” and multiple mentions of CDN in previous sections? There are more SSR-related things outside of a Node server, and I will talk about CDNs in another post
Thank you for your attention
Thanks to Sam Rose for help with writing this. Photo by Taylor Vick
1. Hooks are a way to update parts of a React app’s layout on a user or external signal, like “HTTP request is done”. Without them (or old-style “class components” with predefined methods), the layout would be either static or updated fully, after receiving and handling a signal at the root element
2. IMO, this is a code smell even without SSR, after dealing with a lot of bugs because the “it won’t ever change” constant did change: the user connected a new input device, changed some browser/system setting, etc.
3. Speaking as the one who’s rewritten the API layer from extremely boilerplate-y Redux
4. Even though the Nebula web app is written in TypeScript, code snippets here will be in JavaScript for brevity
5. Axios being our preferred HTTP client, which, before fetch became a part of Node, was crucial for sharing query loader functions across browsers and the server
6. Also, when they reload pages and open internal links in new tabs
7. Although some <noscript>-friendly layout (without any interactivity or video playback) might be useful. For example, to preview a video page for those who enable JS on a site-by-site basis?.. 🤔