
LLM.js — Universal LLM Interface



LLM.js is a zero-dependency library for calling hundreds of Large Language Models.

It works in Node.js and the browser and supports all the important features for production-ready LLM apps.

await LLM("the color of the sky is"); // blue

Why use LLM.js?

Why not just use the OpenAI compatibility API and swap out the baseUrl?

  1. The compatibility API is not actually compatible; there are many differences between services
  2. The best features aren’t available through the compatibility API
  3. There’s no support for cost tracking or model features

LLM.js solves all of these and more, letting you focus on building great AI apps.

Install

Install LLM.js from NPM:

npm install @themaximalist/llm.js

Setting up LLM.js is easy.

In Node.js, API keys are detected automatically from the environment.

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GOOGLE_API_KEY=...
export GROQ_API_KEY=...
export DEEPSEEK_API_KEY=...
export XAI_API_KEY=...

They can also be passed as an option: { apiKey: "sk-123" }.

For the browser, keys should be included as an option.

For local models like Ollama, no API key is needed, just ensure an instance is running.
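
For example, here’s a minimal sketch of passing a key explicitly (the key value is a placeholder):

import LLM from "@themaximalist/llm.js";

// Pass the key as an option instead of relying on environment variables
// (this is the approach to use in the browser).
const response = await LLM("the color of the sky is", {
  service: "anthropic",
  model: "claude-3-5-sonnet-latest",
  apiKey: "sk-ant-...", // placeholder: use your own key
});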

Getting Started

The simplest way to call LLM.js is as an async function that returns a string.

import LLM from "@themaximalist/llm.js"
await LLM("hello"); // Response: hi

This fires a one-off request, and doesn’t store any history.

Chat

Initialize an LLM instance to build up message history for chat.

const llm = new LLM();
await llm.chat("what's the color of the sky in hex value?"); // #87CEEB
await llm.chat("what about at night time?"); // #222d5a

Assistant responses are added automatically.

You can also enable the extended option to return more information about the request.

const response = await LLM("what are the primary colors?", { extended: true });
console.log(response.content);   // "The primary colors are red, blue, and yellow"
console.log(response.usage);     // { input_tokens: 6, output_tokens: 12, total_cost: 0.0001 }
console.log(response.service);   // "ollama" 
console.log(response.messages);  // Full conversation history
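
Since assistant responses accumulate on the instance, response.messages grows with each chat call. A minimal sketch, assuming chat() accepts the same extended option as LLM():

const llm = new LLM();
await llm.chat("what's the color of the sky in hex value?");

// The follow-up request automatically includes the previous turns
const response = await llm.chat("what about at night time?", { extended: true });
console.log(response.messages.length); // 4: two user and two assistant turns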

Streaming

Streaming provides a better user experience by returning results immediately, and it’s as simple as passing {stream: true} as an option.

const stream = await LLM("the color of the sky is", { stream: true });
for await (const message of stream) {
    process.stdout.write(message);
}

You can also stream with message history.

const llm = new LLM({ stream: false }); // the per-call option below overrides this default
llm.system("You are a friendly AI assistant");
const stream = await llm.chat("hello, how are you?", { stream: true });
for await (const chunk of stream) {
  process.stdout.write(chunk);
}

Just like with chat, assistant responses are automatically added with streaming. The extended option works as expected too.

const response = await LLM("tell me a story", { 
  stream: true, 
  extended: true 
});

// Stream the response in real-time
for await (const chunk of response.stream) {
  if (chunk.type === "content") {
    process.stdout.write(chunk.content);
  }
}

After the stream finishes, you can call complete() to get the full response with metadata, including the final content, token usage, and cost.

// Get the complete response with metadata
const complete = await response.complete();
console.log(complete.usage.total_cost); // 0.0023

Thinking

Enable thinking mode for models that can reason through problems step-by-step:

const response = await LLM("solve this math problem: 2x + 5 = 13", { 
  think: true,
});

// thinking automatically enables extended mode

// {
//   thinking: "I need to solve for x. First, I'll subtract 5 from both sides...",
//   content: "x = 4",
//   ...
// }

Thinking also works with streaming for real-time reasoning:

const response = await LLM("explain quantum physics", { 
  think: true,
  stream: true 
});

let thinking = "", content = "";
for await (const chunk of response.stream) {
  if (chunk.type === "thinking") {
    thinking += chunk.content;
    updateThinkingUI(thinking)
  } else if (chunk.type === "content") {
    content += chunk.content;
    updateContentUI(content)
  }
}

// thinking and content are done, can ask for completed response
const complete = await response.complete();
// {
//   thinking: "I need to solve for x. First, I'll subtract 5 from both sides...",
//   content: "x = 4",
//   ...
// }

Tools

Enable LLMs to call custom functions with tool support:

const getCurrentWeather = {
  name: "get_current_weather",
  description: "Get the current weather for a city",
  input_schema: {
    type: "object",
    properties: {
      city: { type: "string", description: "The name of the city" }
    },
    required: ["city"]
  }
};

const response = await LLM("What's the weather in Tokyo?", {
  tools: [getCurrentWeather],
});

// tool use automatically enables extended mode

console.log(response.tool_calls); 
// [{ id: "call_123", name: "get_current_weather", input: { city: "Tokyo" } }]

Tools work with streaming for real-time function calling:

const response = await LLM("What's the weather in Tokyo?", {
  tools: [getCurrentWeather],
  stream: true
});

for await (const chunk of response.stream) {
  if (chunk.type === "tool_calls") {
    console.log("🔧 Tool called:", chunk.content);
  } else if (chunk.type === "content") {
    process.stdout.write(chunk.content); // sometimes LLMs will return content with tool calls
  } else if (chunk.type === "thinking") {
    process.stdout.write(chunk.content); // with `think: true` they'll also return thinking
  }
}

const completed = await response.complete();
console.log("Final result:", completed.tool_calls);

Tool calls are automatically added to message history, making multi-turn tool conversations seamless.
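
Here’s a rough sketch of one multi-turn flow. It assumes you execute the tool yourself (lookupWeather is a hypothetical helper) and send the result back as the next chat turn; the exact format LLM.js expects for tool results may differ:

const llm = new LLM({ tools: [getCurrentWeather] }); // tool use enables extended responses

const first = await llm.chat("What's the weather in Tokyo?");
for (const call of first.tool_calls) {
  const result = await lookupWeather(call.input.city); // hypothetical helper calling your weather API
  // Feed the result back as the next turn so the model can answer in plain language
  await llm.chat(`Result of ${call.name}: ${JSON.stringify(result)}`);
}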

Parsers

LLM.js ships with helpful parsers that work with every LLM:

// JSON Parsing
const colors = await LLM("Please return the primary colors in a JSON array", {
  parser: LLM.parsers.json
});
// ["red", "green", "blue"]

// JSON Mode (automatic JSON formatting + parsing)
const data = await LLM("Return the primary colors as a JSON object", {
  json: true
});
// { colors: ["red", "green", "blue"] }

// Markdown Code Block Parsing  
const story = await LLM("Please return a story wrapped in a Markdown story code block", {
  parser: LLM.parsers.codeBlock("story")
});
// A long time ago...

// XML Parsing
const code = await LLM("Please write HTML and put it inside <WEBSITE></WEBSITE> tags", {
  parser: LLM.parsers.xml("WEBSITE")                       
});
// <html>...

Parsers work seamlessly with streaming, thinking and extended responses.

const response = await LLM("return a JSON object in the form of {color: '...'} containing the color of the sky in english. no other text", {
  stream: true,
  think: true,
  json: true,
  extended: true, // implied automatically from `think: true`
});

for await (const chunk of response.stream) {
  if (chunk.type === "content") {
    process.stdout.write(chunk.content);
  } else if (chunk.type === "thinking") {
    process.stdout.write(chunk.content);
  }
}

const completed = await response.complete();
// { content: { color: "blue" } }

Attachments

Send images, documents, and other files alongside your prompts using attachments:

// Image from base64 data
const data = fs.readFileSync("file.jpg", "base64");
const image = LLM.Attachment.fromJPEG(data);

const response = await LLM("What's in this image?", { attachments: [image] });

Create attachments from different sources:

// From base64 data
const jpeg = LLM.Attachment.fromJPEG(base64Data);
const pdf = LLM.Attachment.fromPDF(base64Data);

// From image URL
const image = LLM.Attachment.fromImageURL("https://example.com/image.jpg");

// Use with chat
const llm = new LLM();
await llm.chat("Describe this image", { attachments: [jpeg] });
await llm.chat("What color is the main object?"); // References previous image

Attachments work seamlessly with streaming:

const response = await LLM("Analyze this document", { 
  attachments: [pdf],
  stream: true 
});

for await (const chunk of response) {
  process.stdout.write(chunk);
}

Note: attachment support varies by service. Images are widely supported; PDF documents and images from URLs are supported by only some services.

Token Usage

Every extended request automatically tracks input and output tokens:

const response = await LLM("explain quantum physics", { extended: true });
console.log(response.usage.input_tokens);  // 3
console.log(response.usage.output_tokens); // 127
console.log(response.usage.total_tokens);  // 130

Token counting works with all features including streaming, thinking, and tools.

const response = await LLM("explain quantum physics", { 
  stream: true,
  extended: true,
});

for await (const chunk of response.stream) {
  // ...
}

const complete = await response.complete();
// {
//   usage: {
//     input_tokens: 3,
//     output_tokens: 127,
//     total_tokens: 130,
//     ...
//   }
// }

Cost Usage

Every extended request automatically tracks cost based on current model pricing:

const response = await LLM("write a haiku", { 
  service: "openai",
  model: "gpt-4o-mini",
  extended: true 
});
// {
//   usage: {
//     input_cost: 0.000045,
//     output_cost: 0.000234,
//     total_cost: 0.000279,
//     ...
//   }
// }

Cost usage works with all features including streaming, thinking, and tools.

const response = await LLM("explain quantum physics", { 
  stream: true,
  extended: true,
});

for await (const chunk of response.stream) {
  // ...
}

const complete = await response.complete();
// {
//   usage: {
//     input_cost: 0.000045,
//     output_cost: 0.000234,
//     total_cost: 0.000279,
//     ...
//   }
// }

Local models (like Ollama) show $0 cost and are marked as local: true.
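
For example, a minimal sketch with Ollama (where exactly the local flag appears is an assumption here):

const response = await LLM("the color of the sky is", {
  service: "ollama",
  model: "llama3.2:3b",
  extended: true,
});
console.log(response.usage.total_cost); // 0
console.log(response.usage.local);      // true (assumed location of the local flag)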

System Prompts

Tell models to specialize in specific tasks using llm.system(input).

const llm = new LLM();
llm.system("You are a friendly chat bot.");
await llm.chat("what's the color of the sky in hex value?"); // Response: sky blue
await llm.chat("what about at night time?"); // Response: darker value (uses previous context)

Message History

LLM.js supports simple string prompts, but also full message history:

await LLM("hello"); // hi

await LLM([
    { role: "user", content: "remember the secret codeword is blue" },
    { role: "assistant", content: "OK I will remember" },
    { role: "user", content: "what is the secret codeword I just told you?" },
]); // blue

Options

LLM.js provides comprehensive configuration options for all scenarios:

const llm = new LLM(input, {
  service: "openai",        // LLM service provider
  apiKey: "sk-123",         // API key (optional if set in the environment)
  model: "gpt-4o",          // Specific model
  max_tokens: 1000,         // Maximum response length
  temperature: 0.7,         // "Creativity" (0-2)
  stream: true,             // Enable streaming
  extended: true,           // Extended responses with metadata
  messages: [],             // message history
  think: true,              // Enable thinking mode
  parser: LLM.parsers.json, // Content parser
  tools: [...],             // Available tools
  max_thinking_tokens: 500, // Max tokens for thinking
});


Models

LLM.js handles everything needed to quickly switch between and manage models from different LLM services.

Switch Models

LLM.js supports most popular Large Language Models across both local and remote providers:

// Defaults to Ollama (local)
await LLM("the color of the sky is");

// OpenAI
await LLM("the color of the sky is", { model: "gpt-4o-mini", service: "openai" });

// Anthropic
await LLM("the color of the sky is", { model: "claude-3-5-sonnet-latest", service: "anthropic" });

// Google
await LLM("the color of the sky is", { model: "gemini-1.5-pro", service: "google" });

// xAI
await LLM("the color of the sky is", { service: "xai", model: "grok-beta" });

// DeepSeek with thinking
await LLM("solve this puzzle", { service: "deepseek", model: "deepseek-reasoner", think: true });

// Ollama (local)
await LLM("the color of the sky is", { model: "llama3.2:3b", service: "ollama" });

All features work the same whether local or remote, with automatic token and cost tracking. Local models track token usage, but cost is always $0.

Fetch Latest Models

Get the latest available models directly from providers:

const llm = new LLM({ service: "openai" });
const models = await llm.fetchModels();

console.log(models.length); // 50+ models
console.log(models[0]);     // { name: "gpt-4o", created: Date, service: "openai", ... }

Here’s an example of the models available after applying the quality filter (see Quality Models below).

anthropic/claude-opus-4-20250514
anthropic/claude-sonnet-4-20250514
anthropic/claude-3-7-sonnet-20250219
anthropic/claude-3-5-sonnet-20241022
anthropic/claude-3-5-haiku-20241022
anthropic/claude-3-5-sonnet-20240620
anthropic/claude-3-haiku-20240307
anthropic/claude-3-opus-20240229
anthropic/claude-3-sonnet-20240229
ollama/deepseek-r1:8b
ollama/mistral-small:latest
ollama/phi4:latest
ollama/llama3.2-vision:latest
ollama/llava-llama3:latest
ollama/bakllava:latest
ollama/minicpm-v:latest
ollama/llava:latest
ollama/llava-phi3:latest
ollama/moondream:latest
ollama/granite3.2-vision:latest
ollama/phi4-mini:latest
ollama/gemma3:12b
ollama/gemma3:4b
ollama/llama3.3:latest
ollama/llama3.2:latest
ollama/llama3.2:1b
ollama/llama3-gradient:latest
ollama/llama3:latest
ollama/llama2:7b
ollama/llama2:latest
openai/gpt-4-0613
openai/gpt-4
openai/gpt-3.5-turbo
openai/gpt-4.1-nano
openai/gpt-4-1106-preview
openai/gpt-3.5-turbo-1106
openai/gpt-4-0125-preview
openai/gpt-4-turbo-preview
openai/gpt-3.5-turbo-0125
openai/gpt-4-turbo
openai/gpt-4-turbo-2024-04-09
openai/gpt-4o
openai/gpt-4o-2024-05-13
openai/gpt-4o-mini-2024-07-18
openai/gpt-4o-mini
openai/gpt-4o-2024-08-06
openai/chatgpt-4o-latest
openai/o1-preview-2024-09-12
openai/o1-preview
openai/o1-mini-2024-09-12
openai/o1-mini
openai/o1-2024-12-17
openai/o1
openai/o3-mini
openai/o3-mini-2025-01-31
openai/gpt-4o-2024-11-20
openai/gpt-4.5-preview
openai/gpt-4.5-preview-2025-02-27
openai/gpt-4o-search-preview-2025-03-11
openai/gpt-4o-search-preview
openai/gpt-4o-mini-search-preview-2025-03-11
openai/gpt-4o-mini-search-preview
openai/o1-pro-2025-03-19
openai/o1-pro
openai/o3-2025-04-16
openai/o4-mini-2025-04-16
openai/o3
openai/o4-mini
openai/gpt-4.1-2025-04-14
openai/gpt-4.1
openai/gpt-4.1-mini-2025-04-14
openai/gpt-4.1-mini
openai/gpt-4.1-nano-2025-04-14
openai/gpt-3.5-turbo-16k
google/gemini-2.5-pro-exp-03-25
google/gemini-2.5-pro-preview-03-25
google/gemini-2.5-flash-preview-04-17
google/gemini-2.5-flash-preview-05-20
google/gemini-2.5-flash-preview-04-17-thinking
google/gemini-2.5-pro-preview-05-06
google/gemini-2.5-pro-preview-06-05
google/gemini-2.0-flash-exp
google/gemini-2.0-flash
google/gemini-2.0-flash-001
google/gemini-2.0-flash-lite-001
google/gemini-2.0-flash-lite
google/gemini-2.0-flash-lite-preview-02-05
google/gemini-2.0-flash-lite-preview
google/gemini-2.0-pro-exp
google/gemini-2.0-pro-exp-02-05
google/gemini-exp-1206
google/gemini-2.0-flash-thinking-exp-01-21
google/gemini-2.0-flash-thinking-exp
google/gemini-2.0-flash-thinking-exp-1219
google/gemini-2.5-flash-preview-tts
google/gemini-2.5-pro-preview-tts
xai/grok-2-1212
xai/grok-3
xai/grok-3-fast
xai/grok-3-mini
xai/grok-3-mini-fast
groq/llama3-8b-8192
groq/compound-beta
groq/llama-3.1-8b-instant
groq/qwen-qwq-32b
groq/gemma2-9b-it
groq/deepseek-r1-distill-llama-70b
groq/allam-2-7b
groq/meta-llama/llama-4-maverick-17b-128e-instruct
groq/meta-llama/llama-prompt-guard-2-86m
groq/llama3-70b-8192
groq/llama-guard-3-8b
groq/meta-llama/llama-prompt-guard-2-22m
groq/compound-beta-mini
groq/meta-llama/llama-guard-4-12b
groq/meta-llama/llama-4-scout-17b-16e-instruct
groq/mistral-saba-24b
groq/llama-3.3-70b-versatile
deepseek/deepseek-chat
deepseek/deepseek-reasoner

Model Features and Cost

LLM.js combines the fetched models from each provider with the feature and cost list from LiteLLM.

This provides real-time cost per input/output token, and model features like context window, tool support, thinking support, and more!

import { ModelUsage } from "@themaximalist/llm.js";

// Get all cached models
const allModels = ModelUsage.getAll();
console.log(allModels.length); // 100+

// Refresh from latest sources  
const refreshedModels = await ModelUsage.refresh();
console.log(refreshedModels.length); // Even more models

// Get specific model info
const gpt4 = ModelUsage.get("openai", "gpt-4o");
console.log(gpt4.input_cost_per_token);  // 0.0000025
console.log(gpt4.max_input_tokens);      // 128000

When using the extended option, token usage and cost are automatically added to responses.
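
As a rough cross-check, costs should roughly equal token counts multiplied by the per-token prices from ModelUsage (a sketch, assuming that is how costs are derived):

import LLM, { ModelUsage } from "@themaximalist/llm.js";

const pricing = ModelUsage.get("openai", "gpt-4o-mini");
const response = await LLM("write a haiku", {
  service: "openai",
  model: "gpt-4o-mini",
  extended: true,
});

console.log(response.usage.input_tokens * pricing.input_cost_per_token); // should roughly equal the line below
console.log(response.usage.input_cost);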

Quality Models

The model APIs return every model supported by the platform, which is a mess if you need to present them to users.

The quality models filter removes embedding, TTS, instruct, audio, and image models, leaving only the best chat LLMs.

const llm = new LLM({ service: "anthropic" });
const qualityModels = await llm.getQualityModels();

for (const model of qualityModels) {
  console.log(model.model);                 // "claude-3-5-sonnet-latest"
  console.log(model.input_cost_per_token);  // 0.000003
  console.log(model.output_cost_per_token); // 0.000015
  console.log(model.max_tokens);            // 8192
  console.log(model.created);               // 2024-10-22T00:00:00.000Z
}

Custom Models

If the refreshed model list doesn’t have a model you need, or you’re using a custom model, you can add your own token and pricing information.

import { ModelUsage } from "@themaximalist/llm.js";

ModelUsage.addCustom({
  model: "my-custom-gpt",
  service: "openai", 
  input_cost_per_token: 0.00001,
  output_cost_per_token: 0.00003,
  max_tokens: 4096
});

// Now use it like any other model
const response = await LLM("hello", { 
  service: "openai", 
  model: "my-custom-gpt",
  extended: true 
});
console.log(response.usage.total_cost); // Uses your custom pricing

Custom Services

You can add custom services to LLM.js by passing a custom object:

const llm = new LLM({
    service: "together",
    baseUrl: "https://api.together.xyz/v1",
    model: "meta-llama/Llama-3-70b-chat-hf",
    apiKey,
});

You can also create a custom service by extending the LLM.APIv1 class:

class Together extends LLM.APIv1 {
    static readonly service: ServiceName = "together";
    static DEFAULT_BASE_URL: string = "https://api.together.xyz/v1";
    static DEFAULT_MODEL: string = "meta-llama/Llama-3-70b-chat-hf";
}

const llm = new Together();

You can even register custom services with LLM.js to make them available globally:

LLM.register(Together);
const llm = new LLM({ service: "together" });

To implement a fully custom model, subclass LLM and implement the parse methods:

class Custom extends LLM {
    static readonly service: ServiceName = "secretAGI";
    static DEFAULT_BASE_URL: string = "http://localhost:9876";
    static DEFAULT_MODEL: string = "gpt-999";
    static isLocal: boolean = false; // don't track pricing
    static isBearerAuth: boolean = false;

    get chatUrl() { return `${this.baseUrl}/chat` }
    get modelsUrl() { return `${this.baseUrl}/models` }

    parseContent(data: any): string { ... }
    parseTools(data: any): ToolCall[] { ... }
    parseThinking(data: any): string { ... }
    parseModel(model: any): Model { ... }
    parseOptions(options: Options): Options { ... }
    parseTokenUsage(usage: any): InputOutputTokens | null { ... }
    parseUsage(tokenUsage: InputOutputTokens): Usage { ... }

    // streaming methods
    parseToolsChunk(chunk: any): ToolCall[] { return this.parseTools(chunk) }
    parseContentChunk(chunk: any): string { return this.parseContent(chunk) }
    parseThinkingChunk(chunk: any): string { return this.parseThinking(chunk) }
}

Connection Verification

Test your setup and API keys with built-in connection verification:

const llm = new LLM({ service: "openai" });
const isConnected = await llm.verifyConnection();
console.log(isConnected); // true if API key and service work

This is a lightweight check that doesn’t perform an actual chat request. For remote services it verifies that models can be fetched; for local services it verifies that an instance is up and running.
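
For example, you can loop over the services you plan to use and report which ones are reachable:

const services = ["openai", "anthropic", "ollama"];
for (const service of services) {
  const llm = new LLM({ service });
  const ok = await llm.verifyConnection();
  console.log(`${service}: ${ok ? "connected" : "not available"}`);
}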

Examples

The test suite contains comprehensive examples of all features.

API Reference

See the full API reference.

Debug

LLM.js uses debug-style logging under the llm.js namespace.

View debug logs by setting the DEBUG environment variable:

> DEBUG=llm.js* node your-script.js
# debug logs
blue

Projects

LLM.js is currently used in production by:

Changelog

License

MIT

Author

Created by Brad Jasper, a product developer working on AI-powered apps and tools.

Need help with your LLM project? I’m available for consulting on web, desktop, mobile, and AI development. Get in touch →