markdown-for-agents

Getting Started

Install markdown-for-agents and learn common usage patterns.

This guide walks you through installing markdown-for-agents and using it in common scenarios. Want to try it first? Open the playground to convert HTML to Markdown right in your browser.

Installation

npm install markdown-for-agents

The library has a single runtime dependency (htmlparser2) and works in any JavaScript environment that supports ES2022.

Python

markdown-for-agents is also available as a pure Python package with zero dependencies. See the Python package docs for installation and usage.

Your First Conversion

import { convert } from 'markdown-for-agents';

const { markdown } = convert('<h1>Hello</h1><p>World</p>');
console.log(markdown);
// # Hello
//
// World

The convert function takes an HTML string and returns an object with:

  • markdown - the converted Markdown string
  • tokenEstimate - an object with rough token, character, and word counts (tokens, characters, words)

Converting a Web Page

To convert a fetched web page and strip away navigation, ads, and boilerplate:

import { convert } from 'markdown-for-agents';

const response = await fetch('https://example.com/article');
const html = await response.text();

const { markdown, tokenEstimate } = convert(html, {
    extract: true,
    baseUrl: 'https://example.com'
});

console.log(markdown);
console.log(`~${tokenEstimate.tokens} tokens`);

The extract: true option strips non-content elements (nav, footer, ads, etc.), and baseUrl resolves relative link and image URLs to absolute ones.
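To make the effect of baseUrl concrete, here is a minimal sketch of relative URL resolution using the standard WHATWG URL constructor. The resolveUrl helper is hypothetical and not part of markdown-for-agents; the library's own resolution logic may differ in edge cases.

```typescript
// Illustrative helper (an assumption, not the library's API): resolve a link
// against a base URL with the standard URL constructor.
function resolveUrl(href: string, baseUrl: string): string {
    try {
        return new URL(href, baseUrl).href;
    } catch {
        // Leave unparseable values untouched rather than throwing
        return href;
    }
}

console.log(resolveUrl('/images/cat.png', 'https://example.com'));
// https://example.com/images/cat.png
console.log(resolveUrl('../about', 'https://example.com/blog/post'));
// https://example.com/about
console.log(resolveUrl('https://other.example/page', 'https://example.com'));
// https://other.example/page
```

Links that are already absolute pass through unchanged, which is why a single baseUrl can safely be applied to a whole page.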

Common Patterns

Converting an HTML Fragment

For HTML fragments without a full page structure, no extraction is needed:

const { markdown } = convert(`
  <h2>Features</h2>
  <ul>
    <li>Fast</li>
    <li>Lightweight</li>
    <li>Universal</li>
  </ul>
`);
// ## Features
//
// - Fast
// - Lightweight
// - Universal

Customizing Markdown Output

Control the output style with options:

const { markdown } = convert(html, {
    headingStyle: 'setext', // Title\n=====
    bulletChar: '*', // * list items
    fenceChar: '~', // ~~~ code blocks
    strongDelimiter: '__', // __bold__
    emDelimiter: '_' // _italic_
});

Frontmatter

By default, metadata from the HTML <head> is extracted and prepended as YAML frontmatter:

const { markdown } = convert('<html><head><title>My Page</title></head><body><p>Hello</p></body></html>');
// ---
// title: My Page
// ---
//
// Hello

Disable it with frontmatter: false, or merge custom fields with frontmatter: { author: 'Jane' }. See the Frontmatter guide for details.
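As a rough illustration of what merging custom fields means, the sketch below combines extracted metadata with custom fields and prepends the result as a YAML block. The withFrontmatter helper is hypothetical and is not the library's implementation. It also quotes every value to stay valid YAML, so its output looks slightly different from the unquoted example above.

```typescript
// Hypothetical helper (an assumption for illustration): prepend flat string
// fields to a Markdown body as YAML frontmatter.
function withFrontmatter(fields: Record<string, string>, body: string): string {
    const lines = Object.entries(fields).map(
        // JSON strings are valid YAML double-quoted scalars
        ([key, value]) => `${key}: ${JSON.stringify(value)}`
    );
    if (lines.length === 0) return body;
    return `---\n${lines.join('\n')}\n---\n\n${body}`;
}

const extracted = { title: 'My Page' }; // e.g. taken from <title>
const custom = { author: 'Jane' };      // fields supplied by the caller

console.log(withFrontmatter({ ...extracted, ...custom }, 'Hello'));
// ---
// title: "My Page"
// author: "Jane"
// ---
//
// Hello
```

Because the custom object is spread last, a custom field with the same key as an extracted one wins in this sketch. Check the Frontmatter guide for the precedence the library actually applies.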

Using Custom Rules

Override how specific elements are converted:

import { convert, createRule } from 'markdown-for-agents';

const { markdown } = convert(html, {
    rules: [
        // Convert <details> to a blockquote
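        createRule('details', ({ convertChildren, node }) => {
            const content = convertChildren(node).trim();
            // Prefix every line so multi-line content stays inside the quote
            const quoted = content.replace(/^/gm, '> ');
            return `\n\n${quoted}\n\n`;
        })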
    ]
});

See the Custom Rules guide for the full rule API.

Serving Markdown via Middleware

If you're building a web server, you can automatically respond with Markdown when AI agents request it. Each middleware is a separate package - install only what you need:

// Express
import express from 'express';
import { markdown } from '@markdown-for-agents/express';

const app = express();
app.use(markdown({ extract: true }));
app.get('/article', (req, res) => {
    res.send('<h1>Title</h1><p>Content...</p>');
});

// Fastify
import Fastify from 'fastify';
import { markdown } from '@markdown-for-agents/fastify';

const fastify = Fastify();
fastify.register(markdown({ extract: true }));

// Hono
import { Hono } from 'hono';
import { markdown } from '@markdown-for-agents/hono';

const app = new Hono();
app.use(markdown({ extract: true }));

// Next.js (route handler)
import { withMarkdown } from '@markdown-for-agents/nextjs';

function handler() {
    return new Response('<h1>Title</h1><p>Content...</p>', {
        headers: { 'content-type': 'text/html' }
    });
}

export const GET = withMarkdown(handler, { extract: true });

When a client sends Accept: text/markdown, the response is automatically converted. Normal requests pass through untouched. See the Middleware guide for all framework integrations, or the Next.js example for a complete working app with the proxy pattern.
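The negotiation step can be pictured with a small check on the Accept header. This is a minimal sketch under the assumption that only an explicit text/markdown entry with a non-zero quality value triggers conversion; acceptsMarkdown is a hypothetical helper, and the middleware packages may parse the header differently.

```typescript
// Hypothetical helper (not the middleware's actual code): returns true when
// an Accept header lists text/markdown with a non-zero q value.
function acceptsMarkdown(acceptHeader: string | undefined): boolean {
    if (!acceptHeader) return false;
    return acceptHeader.split(',').some((entry) => {
        const [mediaType, ...params] = entry
            .split(';')
            .map((part) => part.trim().toLowerCase());
        if (mediaType !== 'text/markdown') return false;
        const q = params.find((p) => p.startsWith('q='));
        // No q parameter means the default quality of 1
        return q === undefined || Number(q.slice(2)) > 0;
    });
}

console.log(acceptsMarkdown('text/markdown'));                  // true
console.log(acceptsMarkdown('text/html, text/markdown;q=0.9')); // true
console.log(acceptsMarkdown('text/html,*/*;q=0.8'));            // false
console.log(acceptsMarkdown('text/markdown;q=0'));              // false
```

In this sketch a wildcard such as */* deliberately does not count as a Markdown request, so ordinary browser traffic keeps receiving HTML.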

Token Estimation

Every conversion returns a token estimate for LLM cost planning. The built-in heuristic works out of the box, and you can plug in an exact tokenizer instead (see Advanced Options). The default result looks like this:

const { tokenEstimate } = convert(html);
console.log(tokenEstimate);
// { tokens: 12, characters: 46, words: 8 }
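For intuition about what a heuristic estimate involves, here is a minimal sketch based on the common rule of thumb of roughly four characters per token for English text. The roughEstimate function is an assumption for illustration only. The library's built-in heuristic is not documented on this page and may use a different formula.

```typescript
interface RoughEstimate {
    tokens: number;
    characters: number;
    words: number;
}

// Illustrative heuristic (an assumption, not the library's algorithm)
function roughEstimate(text: string): RoughEstimate {
    const characters = text.length;
    const trimmed = text.trim();
    // Count whitespace-separated runs as words
    const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
    // Rule of thumb: about 4 characters per token
    const tokens = Math.ceil(characters / 4);
    return { tokens, characters, words };
}

console.log(roughEstimate('# Hello\n\nWorld'));
// { tokens: 4, characters: 14, words: 3 }
```

A character-based estimate like this is cheap and needs no model-specific vocabulary, but it can drift noticeably for code, URLs, or non-English text. That is the case for plugging in an exact tokenizer when costs matter.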

Deduplication

Real-world pages often repeat content (nav links, CTAs, footers). Enable deduplication to remove repeated blocks:

const { markdown } = convert(html, { deduplicate: true });
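The idea behind block deduplication can be sketched in a few lines: split the Markdown on blank lines, normalize each block, and keep only the first occurrence. The dedupeBlocks function below is a hypothetical simplification, not the library's algorithm, which may use different block boundaries or thresholds.

```typescript
// Simplified sketch (an assumption): drop blocks that repeat an earlier block,
// comparing case-insensitively with collapsed whitespace.
function dedupeBlocks(markdown: string): string {
    const seen = new Set<string>();
    const kept: string[] = [];
    for (const block of markdown.split(/\n{2,}/)) {
        const key = block.trim().replace(/\s+/g, ' ').toLowerCase();
        if (key === '' || seen.has(key)) continue;
        seen.add(key);
        kept.push(block.trim());
    }
    return kept.join('\n\n');
}

const page = [
    '[Sign up](/signup)',
    '# Title',
    'Body text.',
    '[Sign up](/signup)'
].join('\n\n');

console.log(dedupeBlocks(page));
// [Sign up](/signup)
//
// # Title
//
// Body text.
```

Splitting on blank lines is naive: it would also split fenced code blocks that contain empty lines, so a production version needs to treat those as single units.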

Server Timing

Measure conversion performance with the serverTiming option. Middleware adapters surface this as a Server-Timing header:

const { markdown, convertDuration } = convert(html, { serverTiming: true });
console.log(`Took ${convertDuration}ms`);
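If you are not using one of the middleware adapters, a measured duration can be turned into a Server-Timing header value by hand. The sketch below assumes a metric name of convert, which is hypothetical; the adapters may use a different name. Both helpers are illustrative and not part of the library.

```typescript
// Time a synchronous function with the standard performance clock
function timed<T>(fn: () => T): { result: T; durationMs: number } {
    const start = performance.now();
    const result = fn();
    return { result, durationMs: performance.now() - start };
}

// Format one Server-Timing entry: <name>;dur=<milliseconds>
function serverTimingValue(name: string, durationMs: number): string {
    return `${name};dur=${durationMs.toFixed(1)}`;
}

// Stand-in workload; in practice this would wrap the conversion call
const sample = timed(() => 'x'.repeat(1000).toUpperCase());
console.log(serverTimingValue('convert', sample.durationMs));
```

The resulting string can be set as the value of a Server-Timing response header, where browser developer tools and other clients can read it.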

Content-Signal Header

Signal publisher consent for AI usage via an HTTP response header. The header is set only when you configure it explicitly:

app.use(markdown({
    contentSignal: { aiTrain: true, search: true, aiInput: true }
}));
// Sets: content-signal: ai-train=yes, search=yes, ai-input=yes
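The mapping from the options object to the header value shown above can be sketched as follows. The contentSignalValue helper is hypothetical, and rendering false as no is an assumption, since only the all-true case appears in this guide.

```typescript
// Illustrative helper (not the library's code): turn camelCase boolean flags
// into a comma-separated list of kebab-case key=yes|no pairs.
function contentSignalValue(signal: Record<string, boolean>): string {
    return Object.entries(signal)
        .map(([key, allowed]) => {
            const name = key.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);
            // Assumption: false is rendered as "no"
            return `${name}=${allowed ? 'yes' : 'no'}`;
        })
        .join(', ');
}

console.log(contentSignalValue({ aiTrain: true, search: true, aiInput: true }));
// ai-train=yes, search=yes, ai-input=yes
```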

See Advanced Options for full details on all of the above.

What's Next

  • Playground - try the converter interactively with any URL or HTML
  • Content Extraction - fine-tune what gets stripped from web pages
  • Frontmatter - control the YAML metadata prepended to output
  • Custom Rules - extend the converter with your own element handlers
  • Middleware - integrate with Express, Fastify, Hono, Next.js, or any Web Standard server
  • Supported Elements - full reference of HTML-to-Markdown mappings
  • Advanced Options - custom token counters, deduplication, server timing, and content-signal headers
  • API Reference - complete API documentation
