LLM.js — Universal LLM Interface

LLM.js

LLM.js is a zero-dependency library to hundreds of Large Language Models.

It works in Node.js and the browser and supports all the important features for production-ready LLM apps.

await LLM("the color of the sky is"); // blue

Same interface for hundreds of LLMs (OpenAI, Google, Anthropic, Groq, Ollama, xAI, DeepSeek)
Chat using message history
Stream responses instantly (including with thinking, tools, parsers)
Thinking with reasoning models
Tools to call custom functions
Attachments to send images, documents, and other files
Parsers including JSON, XML, codeBlock
Token Usage input and output tokens on every request
Model List for dynamic up-to-date list of latest models
Cost Usage on every request
Options for controlling temperature, max_tokens, …
Abort requests mid-response
TypeScript with clean code
Tests with good coverage
Node.js and Browser supported
Zero-dependencies
MIT license

Why use LLM.js?

Why not just use the OpenAI compability API and switch out baseUrl?

The compability API is not compatible — there are many differences between services
The best features aren’t on the compability API
There’s no support for cost tracking or model features

LLM.js solves all of these and more, letting you focus on building great AI apps.

Install

Install LLM.js from NPM:

npm install @themaximalist/llm.js

Setting up LLM.js is easy.

In Node.js api keys can be detected automatically from the environment.

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GOOGLE_API_KEY=...
export GROQ_API_KEY=...
export DEEPSEEK_API_KEY=...
export XAI_API_KEY=...

They can also be included as an option {apiKey: "sk-123"}.

For the browser, keys should be included as an option.

For local models like Ollama, no API key is needed, just ensure an instance is running.

Getting Started

The simplest way to call LLM.js is as an async function that returns a string.

import LLM from "@themaximalist/llm.js"
await LLM("hello"); // Response: hi

This fires a one-off request, and doesn’t store any history.

Chat

Initialize an LLM instance to build up message history for chat.

const llm = new LLM();
await llm.chat("what's the color of the sky in hex value?"); // #87CEEB
await llm.chat("what about at night time?"); // #222d5a

Assistant responses are added automatically.

You can also enable the extended option to return more information about the request.

const response = await LLM("what are the primary colors?", { extended: true });
console.log(response.content);   // "The primary colors are red, blue, and yellow"
console.log(response.usage);     // { input_tokens: 6, output_tokens: 12, total_cost: 0.0001 }
console.log(response.service);   // "ollama" 
console.log(response.messages);  // Full conversation history

Streaming

Streaming provides a better user experience by returning results immediately, and it’s as simple as passing {stream: true} as an option.

const stream = await LLM("the color of the sky is", { stream: true });
for await (const message of stream) {
    process.stdout.write(message);
}

You can also stream with message history.

const llm = new LLM({ stream: false });
llm.system("You are a friendly AI assistant");
const stream = await llm.chat("hello, how are you?", { stream: true });
for await (const chunk of stream) {
  process.stdout.write(chunk);
}

Just like with chat, assistant responses are automatically added with streaming. The extended option works as expected too.

const response = await LLM("tell me a story", { 
  stream: true, 
  extended: true 
});

// Stream the response in real-time
for await (const chunk of response.stream) {
  if (chunk.type === "content") {
    process.stdout.write(chunk.content);
  }
}

After a stream is complete, you can call complete() to get the complete response with metadata, including the final result, token usage, cost, etc…

// Get the complete response with metadata
const complete = await response.complete();
console.log(complete.usage.total_cost); // 0.0023

Thinking

Enable thinking mode for models that can reason through problems step-by-step:

const response = await LLM("solve this math problem: 2x + 5 = 13", { 
  think: true,
});

// thinking automatically enables extended mode

// {
//   thinking: "I need to solve for x. First, I'll subtract 5 from both sides...",
//   content: "x = 4",
//   ...
// }

Thinking also works with streaming for real-time reasoning:

const response = await LLM("explain quantum physics", { 
  think: true,
  stream: true 
});

let thinking = "", content = "";
for await (const chunk of response.stream) {
  if (chunk.type === "thinking") {
    thinking += chunk.content;
    updateThinkingUI(thinking)
  } else if (chunk.type === "content") {
    content += chunk.content;
    updateContentUI(content)
  }
}

// thinking and content are done, can ask for completed response
const complete = await response.complete();
// {
//   thinking: "I need to solve for x. First, I'll subtract 5 from both sides...",
//   content: "x = 4",
//   ...
// }

Tools

Enable LLMs to call custom functions with tool support:

const getCurrentWeather = {
  name: "get_current_weather",
  description: "Get the current weather for a city",
  input_schema: {
    type: "object",
    properties: {
      city: { type: "string", description: "The name of the city" }
    },
    required: ["city"]
  }
};

const response = await LLM("What's the weather in Tokyo?", {
  tools: [getCurrentWeather],
});

// tool use automatically enables extended mode

console.log(response.tool_calls); 
// [{ id: "call_123", name: "get_current_weather", input: { city: "Tokyo" } }]

Tools work with streaming for real-time function calling:

const response = await LLM("What's the weather in Tokyo?", {
  tools: [getCurrentWeather],
  stream: true
});

for await (const chunk of response.stream) {
  if (chunk.type === "tool_calls") {
    console.log("🔧 Tool called:", chunk.content);
  } else if (chunk.type === "content") {
    process.stdout.write(chunk.content); // sometimes LLMs will return content with tool calls
  } else if (chunk.type === "thinking") {
    process.stdout.write(chunk.content); // with `think: true` they'll also return thinking
  }
}

const completed = await response.complete();
console.log("Final result:", completed.tool_calls);

Tool calls are automatically added to message history, making multi-turn tool conversations seamless.

Parsers

LLM.js ships with helpful parsers that work with every LLM:

// JSON Parsing
const colors = await LLM("Please return the primary colors in a JSON array", {
  parser: LLM.parsers.json
});
// ["red", "green", "blue"]

// JSON Mode (automatic JSON formatting + parsing)
const data = await LLM("Return the primary colors as a JSON object", {
  json: true
});
// { colors: ["red", "green", "blue"] }

// Markdown Code Block Parsing  
const story = await LLM("Please return a story wrapped in a Markdown story code block", {
  parser: LLM.parsers.codeBlock("story")
});
// A long time ago...

// XML Parsing
const code = await LLM("Please write HTML and put it inside <WEBSITE></WEBSITE> tags", {
  parser: LLM.parsers.xml("WEBSITE")                       
});
// <html>...

Parsers work seamlessly with streaming, thinking and extended responses.

const response = await LLM("return a JSON object in the form of {color: '...'} containing the color of the sky in english. no other text", {
  stream: true,
  think: true,
  json: true,
  extended: true, // implied automatically from `think: true`
});

for await (const chunk of response.stream) {
  if (chunk.type === "content") {
    process.stdout.write(chunk.content);
  } else if (chunk.type === "thinking") {
    process.stdout.write(chunk.content);
  }
}

const completed = await response.complete();
// { content: { color: "blue" } }

Attachments

Send images, documents, and other files alongside your prompts using attachments:

// Image from base64 data
const data = fs.readFileSync("file.jpg", "base64");
const image = LLM.Attachment.fromJPEG(data);

const response = await LLM("What's in this image?", { attachments: [image] });

Create attachments from different sources:

// From base64 data
const jpeg = LLM.Attachment.fromJPEG(base64Data);
const pdf = LLM.Attachment.fromPDF(base64Data);

// From image URL
const image = LLM.Attachment.fromImageURL("https://example.com/image.jpg");

// Use with chat
const llm = new LLM();
await llm.chat("Describe this image", { attachments: [jpeg] });
await llm.chat("What color is the main object?"); // References previous image

Attachments work seamlessly with streaming:

const response = await LLM("Analyze this document", { 
  attachments: [pdf],
  stream: true 
});

for await (const chunk of response) {
  process.stdout.write(chunk);
}

Note: Attachment support varies by service. Images are widely supported, Documents (PDF) and Images from URLs are supported by some.

Token Usage

Every extended request automatically tracks input and output tokens:

const response = await LLM("explain quantum physics", { extended: true });
console.log(response.usage.input_tokens);  // 3
console.log(response.usage.output_tokens); // 127
console.log(response.usage.total_tokens);  // 130

Token counting works with all features including streaming, thinking, and tools.

const response = await LLM("explain quantum physics", { 
  stream: true,
  extended: true,
});

for await (const chunk of response.stream) {
  // ...
}

const complete = await response.complete();
// {
//   usage: {
//     input_tokens: 3,
//     output_tokens: 127,
//     total_tokens: 130,
//     ...
//   }
// }

Cost Usage

Every extended request automatically tracks cost based on current model pricing:

const response = await LLM("write a haiku", { 
  service: "openai",
  model: "gpt-4o-mini",
  extended: true 
});
// {
//   usage: {
//     input_cost: 0.000045,
//     output_cost: 0.000234,
//     total_cost: 0.000279,
//     ...
//   }
// }

Cost usage works with all features including streaming, thinking, and tools.

const response = await LLM("explain quantum physics", { 
  stream: true,
  extended: true,
});

for await (const chunk of response.stream) {
  // ...
}

const complete = await response.complete();
// {
//   usage: {
//     input_cost: 0.000045,
//     output_cost: 0.000234,
//     total_cost: 0.000279,
//     ...
//   }
// }

Local models (like Ollama) show $0 cost and are marked as local: true.

System Prompts

Tell models to specialize at specific tasks using llm.system(input).

const llm = new LLM();
llm.system("You are a friendly chat bot.");
await llm.chat("what's the color of the sky in hex value?"); // Response: sky blue
await llm.chat("what about at night time?"); // Response: darker value (uses previous context)

Message History

LLM.js supports simple string prompts, but also full message history:

await LLM("hello"); // hi

await LLM([
    { role: "user", content: "remember the secret codeword is blue" },
    { role: "assistant", content: "OK I will remember" },
    { role: "user", content: "what is the secret codeword I just told you?" },
]); // blue

Options

LLM.js provides comprehensive configuration options for all scenarios:

const llm = new LLM(input, {
  service: "openai",        // LLM service provider
  apiKey: "sk-123"          // apiKey
  model: "gpt-4o",          // Specific model
  max_tokens: 1000,         // Maximum response length
  temperature: 0.7,         // "Creativity" (0-2)
  stream: true,             // Enable streaming
  extended: true,           // Extended responses with metadata
  messages: [],             // message history
  think: true,              // Enable thinking mode
  parser: LLM.parsers.json, // Content parser
  tools: [...],             // Available tools
  max_thinking_tokens: 500, // Max tokens for thinking
});

Key Options:

service: Provider (openai, anthropic, google, xai, groq, deepseek, ollama)
apiKey: API key for service, if not specified attempts to read from environment
model: Specific model name (auto-detected from service if not provided)
stream: Enable real-time streaming responses
extended: Return detailed response with usage, costs, and metadata
think: Enable reasoning mode for supported models
temperature: Controls randomness (0 = deterministic, 2 = very creative)
max_tokens: Maximum response length
parser: Transform response content (JSON, XML, codeBlock, etc.)
tools: Functions the model can call

Models

LLM.js handles everything needed to quickly switch between and manage models from difference LLM services.

A single interface to every model
Fetching the latest models
Fetching the latest features and cost data
Quality filtering to return best models

Switch Models

LLM.js supports most popular Large Language Models across both local and remote providers:

// Defaults to Ollama (local)
await LLM("the color of the sky is");

// OpenAI
await LLM("the color of the sky is", { model: "gpt-4o-mini", service: "openai" });

// Anthropic
await LLM("the color of the sky is", { model: "claude-3-5-sonnet-latest", service: "anthropic" });

// Google
await LLM("the color of the sky is", { model: "gemini-1.5-pro", service: "google" });

// xAI
await LLM("the color of the sky is", { service: "xai", model: "grok-beta" });

// DeepSeek with thinking
await LLM("solve this puzzle", { service: "deepseek", model: "deepseek-reasoner", think: true });

// Ollama (local)
await LLM("the color of the sky is", { model: "llama3.2:3b", service: "ollama" });

All features work the same whether local or remote, with automatic token and cost tracking. Local models track token usage, but cost is always $0.

Fetch Latest Models

Get the latest available models directly from providers:

const llm = new LLM({ service: "openai" });
const models = await llm.fetchModels();

console.log(models.length); // 50+ models
console.log(models[0]);     // { name: "gpt-4o", created: Date, service: "openai", ... }

Here’s an example of the models available with the Quality Filter.

anthropic/claude-opus-4-20250514

anthropic/claude-sonnet-4-20250514

anthropic/claude-3-7-sonnet-20250219

anthropic/claude-3-5-sonnet-20241022

anthropic/claude-3-5-haiku-20241022

anthropic/claude-3-5-sonnet-20240620

anthropic/claude-3-haiku-20240307

anthropic/claude-3-opus-20240229

anthropic/claude-3-sonnet-20240229

ollama/deepseek-r1:8b

ollama/mistral-small:latest

ollama/phi4:latest

ollama/llama3.2-vision:latest

ollama/llava-llama3:latest

ollama/bakllava:latest

ollama/minicpm-v:latest

ollama/llava:latest

ollama/llava-phi3:latest

ollama/moondream:latest

ollama/granite3.2-vision:latest

ollama/phi4-mini:latest

ollama/gemma3:12b

ollama/gemma3:4b

ollama/llama3.3:latest

ollama/llama3.2:latest

ollama/llama3.2:1b

ollama/llama3-gradient:latest

ollama/llama3:latest

ollama/llama2:7b

ollama/llama2:latest

openai/gpt-4-0613

openai/gpt-4

openai/gpt-3.5-turbo

openai/gpt-4.1-nano

openai/gpt-4-1106-preview

openai/gpt-3.5-turbo-1106

openai/gpt-4-0125-preview

openai/gpt-4-turbo-preview

openai/gpt-3.5-turbo-0125

openai/gpt-4-turbo

openai/gpt-4-turbo-2024-04-09

openai/gpt-4o

openai/gpt-4o-2024-05-13

openai/gpt-4o-mini-2024-07-18

openai/gpt-4o-mini

openai/gpt-4o-2024-08-06

openai/chatgpt-4o-latest

openai/o1-preview-2024-09-12

openai/o1-preview

openai/o1-mini-2024-09-12

openai/o1-mini

openai/o1-2024-12-17

openai/o1

openai/o3-mini

openai/o3-mini-2025-01-31

openai/gpt-4o-2024-11-20

openai/gpt-4.5-preview

openai/gpt-4.5-preview-2025-02-27

openai/gpt-4o-search-preview-2025-03-11

openai/gpt-4o-search-preview

openai/gpt-4o-mini-search-preview-2025-03-11

openai/gpt-4o-mini-search-preview

openai/o1-pro-2025-03-19

openai/o1-pro

openai/o3-2025-04-16

openai/o4-mini-2025-04-16

openai/o3

openai/o4-mini

openai/gpt-4.1-2025-04-14

openai/gpt-4.1

openai/gpt-4.1-mini-2025-04-14

openai/gpt-4.1-mini

openai/gpt-4.1-nano-2025-04-14

openai/gpt-3.5-turbo-16k

google/gemini-2.5-pro-exp-03-25

google/gemini-2.5-pro-preview-03-25

google/gemini-2.5-flash-preview-04-17

google/gemini-2.5-flash-preview-05-20

google/gemini-2.5-flash-preview-04-17-thinking

google/gemini-2.5-pro-preview-05-06

google/gemini-2.5-pro-preview-06-05

google/gemini-2.0-flash-exp

google/gemini-2.0-flash

google/gemini-2.0-flash-001

google/gemini-2.0-flash-lite-001

google/gemini-2.0-flash-lite

google/gemini-2.0-flash-lite-preview-02-05

google/gemini-2.0-flash-lite-preview

google/gemini-2.0-pro-exp

google/gemini-2.0-pro-exp-02-05

google/gemini-exp-1206

google/gemini-2.0-flash-thinking-exp-01-21

google/gemini-2.0-flash-thinking-exp

google/gemini-2.0-flash-thinking-exp-1219

google/gemini-2.5-flash-preview-tts

google/gemini-2.5-pro-preview-tts

xai/grok-2-1212

xai/grok-3

xai/grok-3-fast

xai/grok-3-mini

xai/grok-3-mini-fast

groq/llama3-8b-8192

groq/compound-beta

groq/llama-3.1-8b-instant

groq/qwen-qwq-32b

groq/gemma2-9b-it

groq/deepseek-r1-distill-llama-70b

groq/allam-2-7b

groq/meta-llama/llama-4-maverick-17b-128e-instruct

groq/meta-llama/llama-prompt-guard-2-86m

groq/llama3-70b-8192

groq/llama-guard-3-8b

groq/meta-llama/llama-prompt-guard-2-22m

groq/compound-beta-mini

groq/meta-llama/llama-guard-4-12b

groq/meta-llama/llama-4-scout-17b-16e-instruct

groq/mistral-saba-24b

groq/llama-3.3-70b-versatile

deepseek/deepseek-chat

deepseek/deepseek-reasoner

Model Features and Cost

LLM.js combines the fetched models from each provider, with the feature and cost list from LiteLLM.

This provides real-time cost per input/output token, and model features like context window, tool support, thinking support, and more!

import { ModelUsage } from "@themaximalist/llm.js";

// Get all cached models
const allModels = ModelUsage.getAll();
console.log(allModels.length); // 100+

// Refresh from latest sources  
const refreshedModels = await ModelUsage.refresh();
console.log(refreshedModels.length); // Even more models

// Get specific model info
const gpt4 = ModelUsage.get("openai", "gpt-4o");
console.log(gpt4.input_cost_per_token);  // 0.0000025
console.log(gpt4.max_input_tokens);      // 128000

When using the extended option — token usage and cost are automatically added to responses.

Quality Models

The model APIs return every model supported by the platform. If you need to present these to users — it’s a mess.

The Quality Models filter out things like embeddings, tts, instruct, audio, image, etc… models to only present the best LLM models.

const llm = new LLM({ service: "anthropic" });
const qualityModels = await llm.getQualityModels();

for (const model of qualityModels) {
  console.log(model.model);                 // "claude-3-5-sonnet-latest"
  console.log(model.input_cost_per_token);  // 0.000003
  console.log(model.output_cost_per_token); // 0.000015
  console.log(model.max_tokens);            // 8192
  console.log(model.created);               // 2024-10-22T00:00:00.000Z
}

Custom Models

If the refreshed model list doesn’t have a model you need, or you have a custom model — you can add custom token and pricing information.

import { ModelUsage } from "@themaximalist/llm.js";

ModelUsage.addCustom({
  model: "my-custom-gpt",
  service: "openai", 
  input_cost_per_token: 0.00001,
  output_cost_per_token: 0.00003,
  max_tokens: 4096
});

// Now use it like any other model
const response = await LLM("hello", { 
  service: "openai", 
  model: "my-custom-gpt",
  extended: true 
});
console.log(response.usage.total_cost); // Uses your custom pricing

Custom Services

You can add custom services to LLM.js by passing a custom object:

const llm = new LLM({
    service: "together",
    baseUrl: "https://api.together.xyz/v1",
    model: "meta-llama/Llama-3-70b-chat-hf",
    apiKey,
});

You can also create a custom service by extending the LLM.APIv1 class:

class Together extends LLM.APIv1 {
    static readonly service: ServiceName = "together";
    static DEFAULT_BASE_URL: string = "https://api.together.xyz/v1";
    static DEFAULT_MODEL: string = "meta-llama/Llama-3-70b-chat-hf";
}

const llm = new Together();

You can even register the custom services with LLM.js to make them available globally:


LLM.register(Together);
const llm = new LLM({ service: "together" });

To implement a fully custom model, subclass LLM and implement the parse methods:

class Custom extends LLM {
    static readonly service: ServiceName = "secretAGI";
    static DEFAULT_BASE_URL: string = "http://localhost:9876";
    static DEFAULT_MODEL: string = "gpt-999";
    static isLocal: boolean = false; // don't track pricing
    static isBearerAuth: boolean = false;

    get chatUrl() { return `${this.baseUrl}/chat` }
    get modelsUrl() { return `${this.baseUrl}/models` }

    parseContent(data: any): string { ... }
    parseTools(data: any): ToolCall[] { ... }
    parseThinking(data: any): string { ... }
    parseModel(model: any): Model { ... }
    parseOptions(options: Options): Options { ... }
    parseTokenUsage(usage: any): InputOutputTokens | null { ... }
    parseUsage(tokenUsage: InputOutputTokens): Usage { ... }

    // streaming methods
    parseToolsChunk(chunk: any): ToolCall[] { return this.parseTools(chunk) }
    parseContentChunk(chunk: any): string { return this.parseContent(chunk) }
    parseThinkingChunk(chunk: any): string { return this.parseThinking(chunk) }
}

Connection Verification

Test your setup and API keys with built-in connection verification:

const llm = new LLM({ service: "openai" });
const isConnected = await llm.verifyConnection();
console.log(isConnected); // true if API key and service work

This is a light check that doesn’t perform a LLM chat response. For non-local services it detects if it can fetch models. For local services it detects if an instance is up and running.

Examples

The test suite contains comprehensive examples of all features.

API Reference

See the full API reference.

Debug

LLM.js uses a debug like logging system with the llm.js namespace.

View debug logs by setting the DEBUG environment variable:

> DEBUG=llm.js* node your-script.js
# debug logs
blue

Projects

LLM.js is currently used in production by:

Infinity Arcade — play any text adventure game
News Score — score and sort the news
AI Image Explorer — image explorer
Think Machine — AI research assistant
Thinkable Type — Information Architecture Language
Minds App — AI chat in your menubar

Changelog

06/22/2025 — v1.0.1 — Attachment support (images and PDF), Better model features support and tags
06/13/2025 — v1.0.0 — Added thinking mode, extended responses, token/cost usage, model management, TypeScript. Removed Together, Perplexity, Llamafile
01/27/2025 — v0.8.0 — Added DeepSeek
12/19/2024 — v0.7.1 — Fixed Anthropic streaming bug
10/25/2024 — v0.7.0 — Added Perplexity, upgraded all models to latest
04/24/2024 — v0.6.6 — Added browser support
04/18/2024 — v0.6.5 — Added Llama 3 and Together
03/25/2024 — v0.6.4 — Added Groq and abort()
03/17/2024 — v0.6.3 — Added JSON/XML/Markdown parsers and a stream handler
03/15/2024 — v0.6.2 — Fix bug with Google streaming
03/15/2024 — v0.6.1 — Fix bug to not add empty responses
03/04/2024 — v0.6.0 — Added Anthropic Claude 3
03/02/2024 — v0.5.9 — Added Ollama
02/15/2024 — v0.5.4 — Added Google Gemini
02/13/2024 — v0.5.3 — Added Mistral
01/15/2024 — v0.5.0 — Created website
01/12/2024 — v0.4.7 — OpenAI Tools, JSON stream
01/07/2024 — v0.3.5 — Added ModelDeployer
01/05/2024 — v0.3.2 — Added Llamafile
04/26/2023 — v0.2.5 — Added Anthropic, CLI
04/24/2023 — v0.2.4 — Chat options
04/23/2023 — v0.2.2 — Unified LLM() interface, streaming
04/22/2023 — v0.1.2 — Docs, system prompt
04/21/2023 — v0.0.1 — Created LLM.js with OpenAI support

License

MIT

Author

Created by Brad Jasper, a product developer working on AI-powered apps and tools.

Need help with your LLM project? I’m available for consulting on web, desktop, mobile, and AI development. Get in touch →