Extract inline CSS into <link> tags for LLM crawlability#3359
Extract inline CSS into <link> tags for LLM crawlability#3359kennethkalmer merged 1 commit intomainfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughUpdated SSR configuration in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
gatsby-ssr.tsx (1)
52-63: Removeanyfrom the type guard and modelhrefas optional.At Lines 60-63, the lint suppression and
any-based guard reduce TS safety; also at Line 56,hrefis typed as required even though style nodes (e.g., Sandpack) may not have it.Proposed refactor
type StyleComponent = { type: 'style'; props?: { 'data-href'?: string; - href: string; + href?: string; }; }; -/* eslint-disable `@typescript-eslint/no-explicit-any` */ -const isStyleComponent = (node: any): node is StyleComponent => - typeof node === 'object' && node != null && (node as any).type === 'style'; -/* eslint-enable `@typescript-eslint/no-explicit-any` */ +const isStyleComponent = (node: React.ReactNode): node is StyleComponent => + typeof node === 'object' && + node != null && + 'type' in node && + (node as { type?: unknown }).type === 'style';As per coding guidelines,
**/*.{ts,tsx}: Avoid using 'any' type in TypeScript; use specific types instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gatsby-ssr.tsx` around lines 52 - 63, Change StyleComponent.props.href to be optional and replace the `any` in the type guard with a safer type (use unknown or a DOM/ReactNode union) so TypeScript can narrow correctly; update isStyleComponent signature from (node: any) to (node: unknown): node is StyleComponent, remove the eslint-disable comments, and adjust the runtime checks inside isStyleComponent to first ensure node is an object/non-null and then check (node as Partial<StyleComponent>).type === 'style' and optional props/href presence accordingly so no use of `any` remains and href is optional on StyleComponent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gatsby-ssr.tsx`:
- Around line 83-92: The comparator passed to Array.sort currently takes a
single parameter and ignores the second, breaking the comparator contract;
change the sort callback to compare two nodes (e.g., .sort((a, b) =>
Number(isGlobalStyleNode(b)) - Number(isGlobalStyleNode(a)))) and extract the
style-detection logic into a helper isGlobalStyleNode that uses the existing
isStyleComponent and the regex on data-href/href; also update the type guard
signature from (node: any) to (node: unknown) in isStyleComponent (or related
type-guard function) to remove the any usage and improve type safety.
- Line 98: The returned link element in gatsby-ssr.tsx currently includes
media="screen", which restricts the stylesheet to screen media; remove the
media="screen" attribute from the JSX return in the function that maps/returns
style links (the line returning <link key={styleHref} href={styleHref}
rel="stylesheet" type="text/css" ... />) so the linked stylesheet inherits the
default (all media) behavior and preserves original styling scope.
---
Nitpick comments:
In `@gatsby-ssr.tsx`:
- Around line 52-63: Change StyleComponent.props.href to be optional and replace
the `any` in the type guard with a safer type (use unknown or a DOM/ReactNode
union) so TypeScript can narrow correctly; update isStyleComponent signature
from (node: any) to (node: unknown): node is StyleComponent, remove the
eslint-disable comments, and adjust the runtime checks inside isStyleComponent
to first ensure node is an object/non-null and then check (node as
Partial<StyleComponent>).type === 'style' and optional props/href presence
accordingly so no use of `any` remains and href is optional on StyleComponent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Pull request overview
Adds a Gatsby SSR hook to reduce inline CSS in the rendered <head> by converting Gatsby-emitted <style data-href="…"> tags into <link rel="stylesheet" href="…"> in production, improving crawlability for LLM/AI crawlers.
Changes:
- Type the existing
onRenderBodyhook using Gatsby’sGatsbySSRAPI types. - Add
onPreRenderHTMLto replace Gatsby inline stylesheet<style>nodes with external<link rel="stylesheet">tags (prod-only). - Add helper type guards and ordering logic intended to keep global styles first.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ec9249e to
1d7af61
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return false; | ||
| } | ||
| const styleHref = node.props?.['data-href'] ?? node.props?.href ?? ''; | ||
| return /^\/styles\.[a-zA-Z0-9]+\.css$/.test(styleHref); |
There was a problem hiding this comment.
isGlobalStyleNode only matches hrefs that start with /styles.<hash>.css. This repo config supports assetPrefix (e.g. Heroku PR deploys), which typically causes Gatsby to emit absolute asset URLs; in that case the global stylesheet won’t be detected and won’t be sorted first as intended. Consider normalizing styleHref to a pathname (e.g. via new URL(styleHref, 'http://localhost').pathname) before applying the /styles.<hash>.css check, so it works for both absolute and relative URLs.
| return /^\/styles\.[a-zA-Z0-9]+\.css$/.test(styleHref); | |
| try { | |
| const stylePathname = new URL(styleHref, 'http://localhost').pathname; | |
| return /^\/styles\.[a-zA-Z0-9]+\.css$/.test(stylePathname); | |
| } catch { | |
| return false; | |
| } |
There was a problem hiding this comment.
Good catch — confirmed by inspecting the Heroku review app earlier in this PR (it served https://ably-docs-extract-css-f-gimcw9.herokuapp.com/styles.<hash>.css, which the previous regex did not match, so the global-first sort was silently a no-op there). Fixed in e692544 by normalizing through new URL(href, "http://localhost").pathname before matching, with a try/catch in case the value is not URL-parseable.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const transformedHeadComponents = headComponents.map((node) => { | ||
| const reorderedNode = isStyleComponent(node) ? sortedStyleComponents[sortedStyleIndex++] : node; | ||
|
|
||
| if (isStyleComponent(reorderedNode)) { | ||
| const styleHref = reorderedNode.props?.['data-href'] ?? reorderedNode.props?.href; |
There was a problem hiding this comment.
This reordering logic operates over all <style> nodes (filter(isStyleComponent)), including the Sandpack <style id="sandpack"> (which has no data-href). Because you then substitute style nodes from the sorted list, non-Gatsby inline styles can be swapped/reordered relative to Gatsby’s emitted CSS chunks, changing cascade and contradicting the “leave Sandpack untouched” goal.
Consider only sorting/replacing the subset of style nodes that have data-href/href (Gatsby-emitted stylesheet chunks), and leave other <style> nodes in-place (don’t include them in the sorted list / index).
There was a problem hiding this comment.
You are right — the previous version pulled Sandpack into the sorted list, and although the sort itself is stable, the substitution-into-original-positions step could swap Sandpack with a Gatsby data-href node if their relative order in the sorted list ended up different from the original.
Fixed in e692544 by introducing isExtractableStyleNode = isStyleComponent && getStyleHref !== undefined. Only nodes with a data-href/href are filtered, sorted, and used as substitution targets. Sandpack (and any other inline <style> without an href) is now passed through node unchanged in the .map, so its original position is guaranteed.
e692544 to
b24bdf8
Compare
Final review-app verificationCompared the freshly built review app against live Document-level metrics on
|
| Metric | Live (ably.com) |
Review app | Δ |
|---|---|---|---|
<style data-href> tags |
1 (large) | 0 | extracted |
| Inline CSS bytes (review app = Sandpack only) | 241,138 | 4,321 | −98.2% |
<head> size |
246,728 B | 9,022 B | −96.3% |
<body> byte offset |
246,734 | 9,028 | 27× sooner |
Gatsby <link rel="stylesheet"> |
0 | 1 | added |
Stylesheet media attribute |
n/a | none (default = all media) | print-safe |
<style id="sandpack"> position vs Gatsby link |
n/a | after (5443 > 963) | original order preserved |
A crawler now reaches the body within the first ~9 KB of the document instead of past 246 KB.
assetPrefix correctness
The review app emits its stylesheet as an absolute URL — https://ably-docs-extract-css-f-gimcw9.herokuapp.com/styles.6abe1dcd792cb717c21c.css — because Heroku PR deploys set assetPrefix. The URL-normalization step (new URL(href, "http://localhost").pathname) means the global-first sort still recognizes it as the global stylesheet on the review app, not just locally where the path is relative.
Sandpack untouched
isExtractableStyleNode excludes inline <style> tags without a data-href/href, so Sandpack is not pulled into the sorted list and is not used as a substitution target. Its position relative to the Gatsby <link> is the same as in the unmodified Gatsby output.
|
@sacOO7 I addressed Copilots feedback from the review you requested |
LLM/AI crawlers bail before reaching content when the document head contains a wall of inline CSS. Replace Gatsby's `<style data-href>` head components with `<link rel="stylesheet">` pointing at the same already-emitted CSS files. Production-only; dev keeps inline styles for HMR. Sandpack and any other inline `<style>` without an href are left in place. Mirrors the workaround in apps/voltaire (originally added there for Total Blocking Time, same mechanism applies here). Reference: gatsbyjs/gatsby#1526. The global stylesheet href is normalized through the URL constructor before matching, so the cascade-preserving sort works whether Gatsby emits relative paths or the absolute URLs produced when assetPrefix is set (Heroku review apps). On /docs/basics: head shrinks from ~247 KB to ~8 KB, body tag reachable 30x sooner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b24bdf8 to
54ad04a
Compare
|
Related to #3015 |
Description
Related: DX-537
LLM/AI crawlers consistently bail before reaching meaningful content when the document
<head>is dominated by inline CSS. Gatsby's default behaviour inlines every chunk of route CSS as<style data-href="…">…</style>— on/docs/basicsthat's ~241 KB before the<body>even starts.This PR adds an
onPreRenderHTMLhook ingatsby-ssr.tsxthat swaps each Gatsby-emitted<style data-href="…">for a<link rel="stylesheet" href="…">pointing at the same already-emitted CSS file inpublic/. Production-only — dev keeps inline styles for HMR. Sandpack's inline<style>(nodata-href) is left untouched.This mirrors the same workaround already running in
apps/voltaire, originally added there to lower Total Blocking Time. Same mechanism, different motivation here (crawlability rather than TBT). Reference: gatsbyjs/gatsby#1526.Impact (measured on
/docs/basics)<style data-href>tags<head>size<body>Visual rendering is unchanged — the same CSS still loads, just via a
<link>.Checklist
Test plan
yarn buildsucceedspublic/docs/basics/index.htmlcontains zero<style data-href>and a<link rel="stylesheet" href="/styles.<hash>.css"><style id="sandpack">block still presentyarn serve+ Chrome DevTools — page renders identically to liveably.com/docs/basicsyarn develop) — the hook is a no-op outside production🤖 Generated with Claude Code
Summary by CodeRabbit