Serve Markdown from Next.js Without Puppeteer (Lightweight Solution)
Learn how to serve Markdown from Next.js pages without Puppeteer's overhead. Use Accept headers for a lightweight, production-ready approach.
accept-md team
Puppeteer is often the go-to solution for converting web pages to Markdown, but it's heavy, slow, and unnecessary for Next.js applications. Since Next.js pages are server-rendered, you don't need a headless browser—you can serve Markdown directly using HTTP's Accept header.
This guide shows you how to serve Markdown from Next.js without Puppeteer, using a lightweight approach that's faster, more efficient, and easier to deploy.
Why Avoid Puppeteer for Next.js?
Puppeteer and similar headless browser tools have significant drawbacks:
Performance Issues
- Slow startup: Each request spawns a browser process (1-2 seconds overhead)
- Memory intensive: 50-100MB per request
- Limited concurrency: Can only handle 5-10 concurrent requests
- CPU heavy: Full browser rendering is expensive
Deployment Complexity
- Large dependencies: Requires Chromium binary (~170MB)
- System libraries: Needs additional OS packages
- Serverless limitations: Difficult on platforms like Vercel
- Cold start delays: Especially problematic in serverless functions
Unnecessary for Next.js
Since Next.js pages are already server-rendered, you don't need browser rendering:
- HTML is generated on the server
- No client-side JavaScript execution required for server-rendered content (data fetched only in the browser will not appear in the HTML)
- Content is available immediately
- No need to wait for React hydration
The Accept Header Approach
HTTP's Accept header is the standard way to request different content types. Instead of using Puppeteer, you can:
- Intercept requests with Accept: text/markdown
- Fetch your page as HTML (it's already rendered)
- Convert HTML to Markdown
- Return clean Markdown
How It Works
# Regular user (gets HTML)
curl https://your-site.com/page
# Markdown request (gets Markdown)
curl -H "Accept: text/markdown" https://your-site.com/page
Your Next.js app serves the same URL, but returns different content based on the Accept header.
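A plain substring check on the header also accepts a client that lists text/markdown with q=0 (meaning "not acceptable"). A minimal sketch of a stricter check follows. The helper name wantsMarkdown is hypothetical, not part of Next.js or accept-md, and treating a tie as a win for Markdown is an assumption.

```javascript
// Decide whether a client prefers Markdown, based on its Accept header.
// Hypothetical helper for illustration; not a Next.js or accept-md API.
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;

  let markdownQ = 0;
  let htmlQ = 0;

  for (const part of acceptHeader.split(',')) {
    const [type, ...params] = part.trim().split(';').map((s) => s.trim());

    // The quality value defaults to 1 when no q parameter is given
    let q = 1;
    for (const param of params) {
      const [key, value] = param.split('=').map((s) => s.trim());
      if (key.toLowerCase() === 'q') {
        const parsed = Number.parseFloat(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }

    const mediaType = type.toLowerCase();
    if (mediaType === 'text/markdown') markdownQ = Math.max(markdownQ, q);
    if (mediaType === 'text/html') htmlQ = Math.max(htmlQ, q);
  }

  // Markdown wins only if it is acceptable and at least as preferred as HTML
  return markdownQ > 0 && markdownQ >= htmlQ;
}
```

A handler could call wantsMarkdown(request.headers.get('accept')) and fall back to HTML when it returns false.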
Implementation: Next.js Rewrites
Next.js rewrites can intercept requests based on headers and route them to a handler.
Step 1: Configure Rewrites
Add to your next.config.js:
// next.config.js
module.exports = {
  async rewrites() {
    return [
      {
        source: '/:path*',
        has: [
          {
            type: 'header',
            key: 'accept',
            value: '(?<accept>.*text/markdown.*)',
          },
        ],
        destination: '/api/accept-md?path=:path*',
      },
    ];
  },
};
This configuration:
- Matches any path (/:path*)
- Checks for an Accept: text/markdown header
- Routes to the /api/accept-md handler with the original path
Step 2: Create Markdown Handler
Create the handler that converts HTML to Markdown:
// app/api/accept-md/route.js
import { NextResponse } from 'next/server';
import TurndownService from 'turndown';

const turndownService = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced',
});

export async function GET(request) {
  // The rewritten path may arrive without a leading slash (e.g. "docs/intro"),
  // so normalize it before building the URL
  const rawPath = request.nextUrl.searchParams.get('path') || '';
  const path = '/' + rawPath.replace(/^\/+/, '');
  const baseUrl = request.nextUrl.origin;

  // Fetch your own page as HTML
  const htmlResponse = await fetch(`${baseUrl}${path}`, {
    headers: {
      // Ask for HTML explicitly so this request is not rewritten again
      accept: 'text/html',
    },
  });

  if (!htmlResponse.ok) {
    // Pass the upstream status through instead of reporting every failure as 404
    return NextResponse.json(
      { error: 'Failed to fetch page' },
      { status: htmlResponse.status }
    );
  }

  const html = await htmlResponse.text();

  // Clean HTML (remove nav, footer, etc.)
  const cleaned = html
    .replace(/<nav[^>]*>[\s\S]*?<\/nav>/gi, '')
    .replace(/<footer[^>]*>[\s\S]*?<\/footer>/gi, '')
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');

  // Convert to Markdown
  const markdown = turndownService.turndown(cleaned);

  return new NextResponse(markdown, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate',
      // The same URL has two representations, so caches must key on Accept
      Vary: 'Accept',
    },
  });
}
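Because /api/accept-md can also be called directly, the path query parameter is untrusted input. Concatenating it onto the origin could point the fetch at another host (for example a value starting with @). The sketch below resolves the value against the site's own origin and rejects anything that escapes it. The helper name resolveSameOriginUrl is hypothetical.

```javascript
// Resolve the `path` query parameter against the site's own origin and
// refuse anything that would leave it. Hypothetical helper, not a Next.js API.
function resolveSameOriginUrl(rawPath, origin) {
  // Force a single leading slash so the value is always treated as a path
  const path = '/' + String(rawPath || '').replace(/^[/\\]+/, '');
  const url = new URL(path, origin);

  // Guard against inputs the URL parser still resolves to another host
  if (url.origin !== new URL(origin).origin) {
    throw new Error('Refusing to fetch a different origin');
  }
  return url.toString();
}
```

In the handler, this could replace the template string passed to fetch: resolveSameOriginUrl(request.nextUrl.searchParams.get('path'), request.nextUrl.origin).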
Step 3: Install Dependencies
npm install turndown
# or
pnpm add turndown
That's it! No Puppeteer, no Chromium, no heavy dependencies.
Advanced: Metadata Extraction
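For production use, you'll want to extract and preserve HTML metadata as YAML frontmatter. The handler below parses the page with linkedom, so install it alongside turndown (npm install linkedom).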
Enhanced Handler with Metadata
// app/api/accept-md/route.js
import { NextResponse } from 'next/server';
import TurndownService from 'turndown';
import { parseHTML } from 'linkedom';

function extractMetadata(html) {
  const { document } = parseHTML(html);
  const metadata = {};

  // Extract title
  const title = document.querySelector('title');
  if (title) metadata.title = title.textContent;

  // Extract meta tags
  document.querySelectorAll('meta').forEach((meta) => {
    const name = meta.getAttribute('name') || meta.getAttribute('property');
    const content = meta.getAttribute('content');
    if (name && content) {
      metadata[name.replace('og:', 'og_')] = content;
    }
  });

  return metadata;
}

function formatFrontmatter(metadata) {
  const yaml = Object.entries(metadata)
    // JSON.stringify escapes quotes and newlines; a JSON string is
    // also a valid YAML double-quoted scalar
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join('\n');
  return `---\n${yaml}\n---\n\n`;
}

const turndownService = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced',
});

export async function GET(request) {
  // Normalize the path: it may arrive without a leading slash
  const rawPath = request.nextUrl.searchParams.get('path') || '';
  const path = '/' + rawPath.replace(/^\/+/, '');
  const baseUrl = request.nextUrl.origin;

  const htmlResponse = await fetch(`${baseUrl}${path}`, {
    headers: { accept: 'text/html' },
  });

  if (!htmlResponse.ok) {
    return NextResponse.json({ error: 'Not found' }, { status: 404 });
  }

  const html = await htmlResponse.text();
  const metadata = extractMetadata(html);

  // Clean and convert
  const cleaned = html
    .replace(/<nav[^>]*>[\s\S]*?<\/nav>/gi, '')
    .replace(/<footer[^>]*>[\s\S]*?<\/footer>/gi, '')
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
  const markdown = turndownService.turndown(cleaned);

  // Prepend frontmatter
  const frontmatter = Object.keys(metadata).length > 0
    ? formatFrontmatter(metadata)
    : '';

  return new NextResponse(frontmatter + markdown, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate',
      // The same URL has two representations, so caches must key on Accept
      Vary: 'Accept',
    },
  });
}
Caching Strategy
To avoid regenerating Markdown on every request, cache the converted output in memory:
// app/api/accept-md/route.js
import { NextResponse } from 'next/server';

// In-memory cache: it lives per server instance and is lost on restart
const cache = new Map();

const headers = {
  'Content-Type': 'text/markdown; charset=utf-8',
  'Cache-Control': 'public, s-maxage=3600',
};

export async function GET(request) {
  const path = request.nextUrl.searchParams.get('path') || '/';

  // Check cache
  const cacheKey = path;
  if (cache.has(cacheKey)) {
    return new NextResponse(cache.get(cacheKey), { headers });
  }

  // Generate markdown: generateMarkdown() stands for the fetch, clean,
  // and convert steps from the handler shown earlier
  const markdown = await generateMarkdown(path);

  // Cache result
  cache.set(cacheKey, markdown);
  return new NextResponse(markdown, { headers });
}
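A bare Map grows without limit and never expires entries. The sketch below adds a time-to-live and a size cap. The name createMarkdownCache and its defaults are illustrative assumptions, and the injectable now function exists only to make the expiry logic easy to test.

```javascript
// A small in-memory cache with a time-to-live and a size cap.
// Hypothetical helper; the name and defaults are illustrative.
function createMarkdownCache({ ttlMs = 60000, maxEntries = 500, now = Date.now } = {}) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired, drop it
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      // Deleting first moves the key to the end of the Map's insertion order
      entries.delete(key);
      entries.set(key, { value, expiresAt: now() + ttlMs });
      // Evict the oldest entries once the cap is exceeded
      while (entries.size > maxEntries) {
        const oldestKey = entries.keys().next().value;
        entries.delete(oldestKey);
      }
    },
    clear() {
      entries.clear();
    },
    get size() {
      return entries.size;
    },
  };
}
```

The handler would then call cache.get(path) and cache.set(path, markdown) in place of the Map methods. Like the Map, this cache is local to one server instance, so a CDN-level Cache-Control header remains the main lever on serverless platforms.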
Cache Invalidation
Invalidate cache on build:
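// Module scope: remember which build filled the cache
let lastBuildId;

// Inside the handler: invalidate on BUILD_ID change
// (assumes your build or deploy pipeline sets one of these variables)
const buildId = process.env.BUILD_ID || process.env.NEXT_BUILD_ID;
if (buildId && lastBuildId !== buildId) {
  cache.clear();
  lastBuildId = buildId;
}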
Performance Comparison
Here's how the Accept header approach compares to Puppeteer:
| Metric | Puppeteer | Accept Header |
|---|---|---|
| Response Time | 1.5-2.5s | 50-300ms |
| Memory per Request | 50-100MB | <5MB |
| Concurrent Requests | 5-10 | 50-100+ |
| Dependencies Size | ~170MB | <1MB |
| Cold Start | 3-5s | <100ms |
| Serverless Friendly | ❌ | ✅ |
Using accept-md (Production Solution)
For a production-ready solution that handles all the complexity, use accept-md:
Quick Setup
npx accept-md init
This automatically:
- Detects your Next.js router (App or Pages)
- Adds rewrites to next.config.js
- Creates the markdown handler with metadata extraction
- Sets up intelligent caching
- Configures clean selectors
Features
- Zero page changes: Your components stay HTML-only
- Metadata extraction: HTML meta tags → YAML frontmatter
- JSON-LD preservation: Structured data maintained
- Intelligent caching: Respects Next.js build cycles
- Works with all rendering: SSG, SSR, ISR supported
Configuration
// accept-md.config.js
module.exports = {
  include: ['/**'],
  exclude: ['/api/**', '/_next/**'],
  cleanSelectors: ['nav', 'footer', '.no-markdown'],
  cache: true,
};
Usage
curl -H "Accept: text/markdown" https://your-site.com/page
Best Practices
1. Clean HTML Selectors
Remove navigation and UI chrome:
cleanSelectors: [
  'nav',
  'footer',
  '.sidebar',
  '.ads',
  '[data-no-markdown]',
],
2. Handle Images
Ensure image URLs are absolute:
transformers: [
  (md) => md.replace(/\]\((\/[^)]+)\)/g, (match, path) => {
    return `](https://your-site.com${path})`;
  }),
],
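The pattern above also matches protocol-relative targets such as //cdn.example.net/x.png and would prefix them with the site origin. A slightly stricter sketch follows, written as a standalone function. The name absolutizeLinks and the origin argument are assumptions for illustration.

```javascript
// Rewrite root-relative Markdown link and image targets to absolute URLs.
// Hypothetical helper; pass your site's real origin as `origin`.
function absolutizeLinks(markdown, origin) {
  const base = origin.replace(/\/+$/, ''); // drop trailing slashes

  // Match "](/path" but skip "](//host/..." (protocol-relative) targets.
  // Only the URL part is rewritten, so link titles and the closing
  // parenthesis are left untouched.
  return markdown.replace(
    /\]\((\/(?!\/)[^)\s]*)/g,
    (match, path) => `](${base}${path}`
  );
}
```

Used as a transformer, this would look like (md) => absolutizeLinks(md, 'https://your-site.com').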
3. Preserve Structure
Maintain heading hierarchy:
const turndownService = new TurndownService({
  headingStyle: 'atx', // Use # for headings
});
4. Exclude Routes
Don't convert API routes or internal paths:
exclude: ['/api/**', '/_next/**', '/admin/**'],
5. Set Cache Headers
Use appropriate cache control:
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
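Since one URL now returns either HTML or Markdown, shared caches also need a Vary: Accept header, otherwise a CDN may hand cached Markdown to a browser. The sketch below bundles the response headers in one place. The helper name markdownCacheHeaders and its options are hypothetical.

```javascript
// Build response headers for a Markdown response.
// Hypothetical helper; option names and defaults are illustrative.
function markdownCacheHeaders({ sMaxAge = 3600, isPrivate = false } = {}) {
  return {
    'Content-Type': 'text/markdown; charset=utf-8',
    // Never let shared caches store responses that depend on credentials
    'Cache-Control': isPrivate
      ? 'private, no-store'
      : `public, s-maxage=${sMaxAge}, stale-while-revalidate`,
    // Tell caches that the representation depends on the Accept header
    Vary: 'Accept',
  };
}
```

A handler could then return new NextResponse(markdown, { headers: markdownCacheHeaders() }).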
Common Issues and Solutions
Issue: Relative URLs Break
Solution: Use transformers to make URLs absolute:
transformers: [
  (md) => md.replace(/\]\((\/[^)]+)\)/g, `](https://your-site.com$1)`),
],
Issue: Code Blocks Not Fenced
Solution: Configure Turndown:
const turndownService = new TurndownService({
  codeBlockStyle: 'fenced',
});
Issue: Tables Not Converting
Solution: Turndown does not convert tables on its own. Add the tables (or gfm) plugin from turndown-plugin-gfm with turndownService.use(...); complex nested structures may still need custom rules.
Issue: Metadata Not Preserved
Solution: Extract metadata before conversion and add as frontmatter:
const metadata = extractMetadata(html);
const frontmatter = formatFrontmatter(metadata);
const markdown = frontmatter + turndownService.turndown(cleaned);
Testing
Test your implementation:
# Request Markdown
curl -H "Accept: text/markdown" http://localhost:3000/
# Should return Markdown, not HTML
Verify:
- ✅ Returns text/markdown content type
- ✅ Contains clean Markdown (no HTML tags)
- ✅ Includes metadata in frontmatter
- ✅ Navigation and footer removed
- ✅ Images have absolute URLs
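The checklist can be turned into a small automated check. The sketch below inspects a content type and body and returns a list of problems. The function name checkMarkdownResponse and the specific heuristics (frontmatter at the start, no leftover nav, footer, script, or style tags, no root-relative image URLs) are assumptions rather than an exhaustive validator.

```javascript
// Check a Markdown response against the checklist above.
// Hypothetical helper; the heuristics are deliberately simple.
function checkMarkdownResponse({ contentType, body }) {
  const problems = [];

  if (!/^text\/markdown\b/i.test(contentType || '')) {
    problems.push('content type is not text/markdown');
  }
  if (!body.startsWith('---\n')) {
    problems.push('frontmatter is missing');
  }
  if (/<\/?(nav|footer|script|style)\b/i.test(body)) {
    problems.push('HTML chrome was not removed');
  }
  // Root-relative image target such as ![alt](/logo.png)
  if (/!\[[^\]]*\]\(\/(?!\/)/.test(body)) {
    problems.push('image URL is not absolute');
  }
  return problems;
}

// Example usage against a local dev server (not executed here):
//   const res = await fetch('http://localhost:3000/', {
//     headers: { accept: 'text/markdown' },
//   });
//   console.log(checkMarkdownResponse({
//     contentType: res.headers.get('content-type'),
//     body: await res.text(),
//   }));
```

An empty array means every check passed.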
Conclusion
Serving Markdown from Next.js without Puppeteer is not only possible but recommended. The Accept header approach provides:
- Better performance: roughly 5-50x faster than Puppeteer, based on the response times in the comparison above
- Lower resource usage: <5MB vs 50-100MB per request
- Easier deployment: No Chromium dependencies
- Serverless friendly: Works on Vercel, Netlify, etc.
- Standards compliant: Uses HTTP Accept header
Since Next.js pages are server-rendered, you don't need browser rendering. Fetch the HTML, convert to Markdown, and serve it—simple, fast, and efficient.
Ready to implement? Use accept-md for a production-ready solution, or build your own using the patterns in this guide.
FAQ
Why not use Puppeteer?
Puppeteer is unnecessary for Next.js since pages are already server-rendered. It adds 1-2 seconds of overhead and 50-100MB of memory per request.
Does this work with SSR and ISR?
Yes! The Accept header approach works with all Next.js rendering strategies: SSG, SSR, and ISR.
Will this affect regular users?
No. Regular users requesting HTML see no difference. Only requests with Accept: text/markdown receive Markdown.
How do I handle authentication?
The handler can forward authentication headers to your page fetch, so protected routes work correctly.
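A minimal sketch of that forwarding, assuming cookie and authorization are the headers your app uses for authentication. The helper name buildForwardHeaders is hypothetical, and it relies on the global Headers class available in Node 18 and later.

```javascript
// Copy authentication headers from the incoming request onto the
// internal page fetch. Hypothetical helper; adjust the header list
// to whatever your app actually uses.
function buildForwardHeaders(incoming) {
  // Always ask the page for HTML so the request is not rewritten again
  const forwarded = new Headers({ accept: 'text/html' });

  for (const name of ['cookie', 'authorization']) {
    const value = incoming.get(name);
    if (value) forwarded.set(name, value);
  }
  return forwarded;
}
```

In the handler this would be passed as fetch(url, { headers: buildForwardHeaders(request.headers) }). Responses produced this way depend on the user's credentials, so they should not be sent with a public Cache-Control value or stored in a shared in-memory cache.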
Can I customize the Markdown output?
Yes, you can use Turndown options and custom transformers to customize the output format.