New: MCP server for Claude, Cursor & agents

The video runtime
for agents.

A GPU-rendered video runtime that turns a JSON timeline into a finished MP4. Built for LLMs, agents, and the apps that ship them. Powered by the open Clipkit Protocol.

Open source · Powered by the Clipkit Protocol · Pay only for hosted renders
project.json
{
  "width": 1080,
  "height": 1920,
  "duration": 7.2,
  "elements": [
    { "type": "video", "source": "bg.mp4" },
    { "type": "text", "text": "Built for the AI era." },
    { "type": "caption", "words": [
      { "text": "Built", "start": 2.16 },
      { "text": "for", "start": 2.88 },
      { "text": "the", "start": 3.46 },
      { "text": "AI", "start": 4.03 },
      { "text": "era.", "start": 4.90 }
    ] }
  ]
}
Timeline preview, 0 to 7 s: video track (bg.mp4), text track (title), logo, and captions ("Built", "for", "the", "AI", "era.").
positioning

Built for a different problem than Remotion.

Remotion is React for video. You write components, manage frame math, deploy headless Chromium, and end up with a beautiful pipeline — if you're a React developer authoring every video yourself.

Clipkit is JSON for video. You describe a timeline. The runtime renders it on the GPU. There's nothing to bundle, no Chromium to deploy, and no useCurrentFrame to debug.

If your video is authored by a person writing React, Remotion is great. If it's authored by an agent, by an end user inside your app, or generated from a JSON template, that's what Clipkit is for.

three principles

One schema. One runtime. Three reasons it works.

01

AI-native by design

LLMs reliably emit structured data and far less reliably emit working code. Clipkit's schema is a constrained, well-typed JSON timeline: exactly the kind of output models are best at producing.

No frame math · no sequence wrapping · no compile step
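To make the "constrained schema" point concrete, here is a minimal sketch of the check a host app might run on model output before rendering. It assumes only the three element types that appear on this page (`video`, `text`, `caption`); the full CKP/1.0 type list is longer, so `KNOWN_TYPES` and `checkModelOutput` are illustrative and hypothetical, not part of the Clipkit SDK.

```javascript
// Hypothetical, incomplete allow-list: only the element types shown on this page.
const KNOWN_TYPES = new Set(["video", "text", "caption"]);

// Parse model output and collect problems instead of throwing, so the list
// can be sent back to the model as feedback for a retry.
function checkModelOutput(raw) {
  let project;
  try {
    project = JSON.parse(raw);
  } catch (err) {
    return { ok: false, errors: [`not valid JSON: ${err.message}`] };
  }
  if (project === null || typeof project !== "object" || Array.isArray(project)) {
    return { ok: false, errors: ["top level must be a JSON object"] };
  }
  const errors = [];
  for (const key of ["width", "height", "duration"]) {
    if (typeof project[key] !== "number" || project[key] <= 0) {
      errors.push(`"${key}" must be a positive number`);
    }
  }
  if (!Array.isArray(project.elements)) {
    errors.push(`"elements" must be an array`);
  } else {
    project.elements.forEach((el, i) => {
      const type = el && el.type;
      if (!KNOWN_TYPES.has(type)) {
        errors.push(`elements[${i}]: unknown type ${JSON.stringify(type)}`);
      }
    });
  }
  return { ok: errors.length === 0, project, errors };
}
```

Because the output is data rather than code, a failed check yields short, specific messages (for example `elements[0]: unknown type "sparkle"`) instead of a stack trace from a half-compiled component.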
02

GPU-rendered, real-time

Clipkit composites in WebGL/WebGPU. Preview is instant scrubbing — not "wait while Chromium screenshots 1,800 frames." The same engine powers preview and final render.

WebGL/WebGPU · 60fps preview · same engine for render
03

Framework-agnostic

JSON works everywhere. React, Vue, Svelte, Angular, vanilla JS, Python backends, mobile, your CI. There is no SDK lock-in to a particular frontend framework.

React · Vue · Svelte · Angular · Python · cURL
how it works

Describe the video.
Render the video.

A Clipkit project is a JSON document that conforms to the open Clipkit Protocol. Elements live on tracks, positioned in space and time. Animations, captions, transitions, and effects are first-class.
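A minimal sketch of what "positioned in time" means for a renderer: given the flat `elements` array, find which elements are visible at time t. It assumes an element without `time` starts at 0 and one without `duration` runs to the end of the project; those defaults and the `activeAt` helper are illustrative assumptions, not documented CKP/1.0 behavior. Tracks are left out because the examples on this page don't show how an element is assigned to one.

```javascript
// Return the ids of elements visible at time t (seconds).
// Assumed defaults: missing `time` -> 0, missing `duration` -> until project end.
function activeAt(project, t) {
  return project.elements
    .filter((el) => {
      const start = el.time ?? 0;
      const end = el.duration != null ? start + el.duration : project.duration;
      return t >= start && t < end; // half-open interval [start, end)
    })
    .map((el) => el.id);
}

const demo = {
  duration: 12,
  elements: [
    { id: "bg", type: "video", time: 0, duration: 12 },
    { id: "title", type: "text", time: 0.5, duration: 3 },
    { id: "outro", type: "text", time: 10 },
  ],
};

console.log(activeAt(demo, 1));  // [ 'bg', 'title' ]
console.log(activeAt(demo, 11)); // [ 'bg', 'outro' ]
```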

No build step. No Lambda. No Chromium. Render in the browser with the SDK, or POST the JSON to our hosted render API and get back an MP4.

project.json
{
  "output_format": "mp4",
  "width": 1080,
  "height": 1920,
  "duration": 12,
  "elements": [
    {
      "id": "bg",
      "type": "video",
      "source": "https://cdn.example.com/clip.mp4",
      "time": 0,
      "duration": 12,
      "fit": "cover"
    },
    {
      "id": "title",
      "type": "text",
      "text": "Built for the AI era.",
      "font_family": "Inter",
      "font_weight": 700,
      "x": "50%",
      "y": "20%",
      "animations": [{ "type": "slide-up-in", "duration": 0.6 }]
    },
    {
      "id": "captions",
      "type": "caption",
      "style": "tiktok_bounce",
      "words": [
        { "text": "Built", "start": 0.2, "end": 0.6 },
        { "text": "for",   "start": 0.6, "end": 0.85 },
        { "text": "the",   "start": 0.85, "end": 1.1 },
        { "text": "AI",    "start": 1.1, "end": 1.4 },
        { "text": "era.",  "start": 1.4, "end": 1.9 }
      ]
    }
  ]
}
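Because the timeline is plain data, consistency checks are ordinary code. The sketch below, assuming `time` and `duration` are in seconds as in the example above, flags elements that overrun the project and caption words whose timestamps are reversed or overlapping. `checkTiming` is a hypothetical helper, not part of the Clipkit SDK.

```javascript
// Hypothetical consistency check for a CKP-style project (times in seconds).
function checkTiming(project) {
  const problems = [];
  for (const el of project.elements) {
    const start = el.time ?? 0;
    // Element must end no later than the project itself.
    if (el.duration != null && start + el.duration > project.duration) {
      problems.push(`${el.id}: ends at ${start + el.duration}s, after project end ${project.duration}s`);
    }
    // Caption words must have start < end and must not overlap the previous word.
    if (el.type === "caption") {
      let prevEnd = 0;
      for (const w of el.words ?? []) {
        if (w.end <= w.start) problems.push(`${el.id}: "${w.text}" ends before it starts`);
        if (w.start < prevEnd) problems.push(`${el.id}: "${w.text}" overlaps the previous word`);
        prevEnd = w.end;
      }
    }
  }
  return problems;
}
```

Running it on the project above returns an empty list; the same check can run before a hosted render so bad timelines fail fast and cheaply.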
mcp · agents

Plug Clipkit into Claude, Cursor & any MCP agent.

Clipkit ships an official MCP server. Once it's installed, your agent gets a video toolbox: five schema-typed primitives that agents can call natively.

create_project · add_element · edit_element · transcribe_to_captions · render_video
See the MCP docs
claude_desktop_config.json
# install once
$ npx -y @clipkit/mcp-server

# in your MCP host config:
{
  "mcpServers": {
    "clipkit": {
      "command": "npx",
      "args": ["-y", "@clipkit/mcp-server"]
    }
  }
}

# Claude can now produce videos
# by describing them.
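Under the hood, an MCP host invokes a server tool with a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and its `arguments`. The sketch below builds such a request for `add_element`; the argument names (`project_id`, `element`) are hypothetical placeholders, since the server's actual input schemas are documented in the MCP docs rather than on this page.

```javascript
// Build an MCP `tools/call` request (JSON-RPC 2.0). The tool name comes from the
// list above; the argument shape is a hypothetical placeholder.
let nextId = 1;

function toolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = toolCall("add_element", {
  project_id: "proj_123", // hypothetical argument
  element: { id: "title", type: "text", text: "Built for the AI era." },
});
console.log(JSON.stringify(req));
```

In practice the host (Claude Desktop, Cursor) sends these messages for you; the point is that each tool call is a small typed JSON object, the same kind of data as the project itself.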
features

What's in the box.

Word-level animated captions

Caption elements take an array of timestamped words and render them with kinetic styles — TikTok bounce, fade reveal, kinetic typewriter, and more. Hand Whisper output straight to the schema.

tiktok_bounce
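A minimal sketch of that hand-off, assuming the result shape produced by the open-source `openai-whisper` package with `word_timestamps=True`: a `segments` array whose entries carry `words` with `word`, `start`, and `end`. Whisper usually keeps a leading space on each word, so the converter trims it and ignores any other fields (such as `probability`). `whisperToCaption` is a hypothetical helper; the sample timings are taken from the project example on this page.

```javascript
// Convert an openai-whisper result (word_timestamps=True) into a caption element.
function whisperToCaption(result, { id = "captions", style = "tiktok_bounce" } = {}) {
  const words = result.segments
    .flatMap((seg) => seg.words ?? [])                                // all words, in order
    .map((w) => ({ text: w.word.trim(), start: w.start, end: w.end })) // rename + trim
    .filter((w) => w.text.length > 0);                                 // skip empty tokens
  return { id, type: "caption", style, words };
}

// Sample input in Whisper's word-timestamp shape.
const whisperResult = {
  segments: [
    { words: [
      { word: " Built", start: 0.2, end: 0.6 },
      { word: " for", start: 0.6, end: 0.85 },
    ] },
    { words: [{ word: " the", start: 0.85, end: 1.1 }] },
  ],
};

console.log(JSON.stringify(whisperToCaption(whisperResult)));
```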

Animation presets

A curated library of named motions — fade-in, slide-up-in, scale-in, bounce-in, rotate-in. Each takes a duration, easing, and delay. AI picks the preset; humans tune the knobs.

slide-up-in · 0.6s
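Here is one way a preset such as `slide-up-in` could be evaluated per frame from its `duration`, `delay`, and easing. The ease-out-cubic curve, the 40 px travel distance, and the fade are assumptions for illustration; the runtime's actual curves and distances are defined by the protocol, not by this sketch.

```javascript
// Ease-out cubic: fast start, gentle settle. Input and output are in [0, 1].
const easeOutCubic = (p) => 1 - Math.pow(1 - p, 3);

// Eased progress at time t (seconds since the element appears), clamped to [0, 1].
function progress(t, { duration, delay = 0, easing = easeOutCubic }) {
  const raw = (t - delay) / duration;
  return easing(Math.min(1, Math.max(0, raw)));
}

// Hypothetical slide-up-in: rise from 40px below the resting position while fading in.
function slideUpIn(t, opts) {
  const p = progress(t, opts);
  return { offsetY: 40 * (1 - p), opacity: p };
}

console.log(slideUpIn(0, { duration: 0.6 }));   // { offsetY: 40, opacity: 0 }
console.log(slideUpIn(0.6, { duration: 0.6 })); // { offsetY: 0, opacity: 1 }
```

Keeping the preset a pure function of time is what lets the same definition drive both scrubbing and the final render.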

Stock media built in

Bring your own API keys for Shutterstock, Unsplash, and Pexels, then reference media by search query instead of URL. The runtime resolves each query to an asset and caches the result.

unsplash · shutterstock · pexels
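The resolve-and-cache step can be sketched as a memoized lookup keyed by provider and query. `searchProvider` stands in for whatever calls Pexels, Unsplash, or Shutterstock with your key; it and the `{ provider, query }` reference shape are hypothetical, since this page doesn't show how a query-based source is written in the schema. Caching the promise rather than the result means concurrent lookups for the same query share one request.

```javascript
// Memoize stock-media lookups by provider + normalized query.
function createMediaResolver(searchProvider) {
  const cache = new Map();
  return function resolveMedia({ provider, query }) {
    const key = `${provider}:${query.trim().toLowerCase()}`;
    if (!cache.has(key)) {
      const pending = Promise.resolve(searchProvider(provider, query)).catch((err) => {
        cache.delete(key); // don't cache failures, so a later call can retry
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Usage with a fake provider that counts how often it is called.
let calls = 0;
const resolveMedia = createMediaResolver(async (provider, query) => {
  calls++;
  return `https://cdn.example.com/${provider}/${encodeURIComponent(query)}.mp4`; // fake URL
});
const a = resolveMedia({ provider: "pexels", query: "ocean waves" });
const b = resolveMedia({ provider: "pexels", query: "Ocean Waves " });
console.log(a === b, calls); // true 1
```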

Open-source runtime

The full rendering engine is open source under Apache-2.0. Powered by the open Clipkit Protocol (CKP/1.0) — self-host, audit, contribute. The format your videos are written in is documented and versioned, not owned.

apache 2.0 · github.com/clipkit

Pay only for hosted renders

Use the open-source runtime for free. Pay per second of output for hosted rendering. No editor session fees, no monthly minimums.

$/sec · no minimums
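Billing follows the length of the rendered output, so an estimate is just total output seconds times the rate. The sketch below uses a placeholder `RATE_PER_SECOND` because the actual price is on the pricing page; for example, the 12-second project in the how-it-works example would cost 12 × rate.

```javascript
// Estimate hosted-render cost from output duration.
// RATE_PER_SECOND is a placeholder, not Clipkit's actual price.
const RATE_PER_SECOND = 0.01; // hypothetical, in USD

function estimateCost(projects, rate = RATE_PER_SECOND) {
  const seconds = projects.reduce((sum, p) => sum + p.duration, 0);
  return { seconds, cost: seconds * rate };
}

console.log(estimateCost([{ duration: 12 }, { duration: 7.2 }, { duration: 30 }]));
```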
embeddable editor

Need a video editor inside your app?
Add one in a line.

Embedded editor preview (autosaved): media panel, timeline, and an inspector showing the title element (x 50%, y 20%, font Inter, weight 700, preset slide-up-in).
App.jsx
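import Clipkit from '@clipkit/sdk'

// `project` is a CKP/1.0 JSON document, such as project.json above.
// `save` is your own persistence function (API call, localStorage, etc.).
const editor = new Clipkit({ apiKey: 'YOUR_API_KEY' })
editor.init('#editor-container')        // mount into an existing DOM element
editor.loadProject(project)
editor.on('change', json => save(json)) // receives the edited project JSON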

The same JSON that AI authors is what powers our embeddable editor.
One schema. Two surfaces. AI drives. Humans tune. Both write to the same file.
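If `change` fires on every edit (a drag, a keystroke, a nudge), a host app will usually want to debounce persistence. A minimal sketch, assuming only what the snippet above shows, a handler that receives the project JSON: `debounce` is a generic helper written here for illustration, and `save` remains your own function.

```javascript
// Generic trailing-edge debounce: run `fn` once, `waitMs` after the last call.
function debounce(fn, waitMs) {
  let timer = null;
  let lastArgs = null;
  const debounced = (...args) => {
    lastArgs = args;
    clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...lastArgs);
    }, waitMs);
  };
  // Run a pending call immediately (e.g. before the page unloads).
  debounced.flush = () => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
      fn(...lastArgs);
    }
  };
  return debounced;
}

// Usage with the embedded editor from the snippet above:
// const saveSoon = debounce((json) => save(json), 1000);
// editor.on('change', saveSoon);
```

Only the latest JSON is saved, which is safe here because each `change` payload is the whole project rather than a diff.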

React · Vue · Svelte · Angular · Vanilla JS
honest comparison

When Clipkit fits. When it doesn't.

use clipkit when

Your video isn't being typed by a human, frame-by-frame.

  • You're generating videos from AI output, transcripts, templates, or any structured source.
  • You're embedding a video editor inside your application.
  • Your stack isn't React, or you don't want a React app in the pipeline.
  • You want real-time preview and fast renders without managing Chromium or Lambda.
  • You want an open, portable format that isn't tied to one company's API.

use remotion when

Your video is being typed by a human, frame-by-frame.

  • You're hand-authoring cinematic, bespoke videos in code where every frame is unique.
  • You need the full expressive ceiling of arbitrary React, CSS, SVG, and npm packages.
  • You're a React shop with no need for an end-user editor and no plans to drive renders from AI.
  • You need a behavior our schema doesn't model yet, and you're not willing to extend it.

Both are good tools. They optimize for different authors.

open source

Open runtime. Hosted convenience.

The Clipkit runtime is open source under Apache-2.0. Powered by the open Clipkit Protocol — the format your videos are written in is documented and versioned, not owned. You can self-host renders, fork the engine, and build on it without ever paying us a cent.

The hosted render service is how we keep the lights on. POST a project, get an MP4 URL back, no GPUs to manage. Same model that worked for Vercel + Next.js, Supabase + Postgres, Tailwind + Tailwind UI. The runtime is the standard; the hosting is the product.
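"POST a project, get an MP4 URL back" looks roughly like the sketch below. The endpoint URL, the bearer-token header, and the `url` field in the response are hypothetical placeholders; the real route and response shape are in the API docs. It uses the standard `fetch` available in browsers and Node 18+.

```javascript
// Hypothetical endpoint; replace with the documented hosted-render URL.
const RENDER_ENDPOINT = "https://api.example.com/v1/renders";

// Build the HTTP request for a render job without sending it.
function buildRenderRequest(project, apiKey) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify(project),
  };
}

// Send the project and return the MP4 URL from the (assumed) response field `url`.
async function renderProject(project, apiKey) {
  const res = await fetch(RENDER_ENDPOINT, buildRenderRequest(project, apiKey));
  if (!res.ok) throw new Error(`render failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.url;
}
```

The request body is the same project JSON the browser runtime reads, so switching between local and hosted rendering doesn't change the document.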

THE STACK
Schema (CKP/1.0) · Open protocol, documented and versioned · free
Runtime · Open source; self-host or embed · free
SDK · Free up to N renders / month · free
Hosted Render · Pay per second of output · $/sec
faq

Questions, answered straight.

Is Clipkit really open source?
The runtime that renders JSON to video is open source under Apache-2.0. The Clipkit Protocol (CKP/1.0) — the schema your videos are written in — is also Apache-2.0 and versioned. You can self-host the runtime, fork it, and audit it. The hosted render API is a paid managed service on top.
Can I render locally without your servers?
Yes. The runtime renders in the browser via WebGL/WebGPU, and we provide a server-side renderer you can run on your own infrastructure. Hosted is a convenience, not a requirement.
How is this different from Remotion?
Remotion is a React framework for writing videos as code. Clipkit is a runtime for describing videos as JSON. Remotion is for developers authoring videos by hand; Clipkit is for AI agents, embedded editors, and any pipeline that produces structured timelines.
What is the Clipkit Protocol?
It's the documented JSON format the runtime reads. Versioned (CKP/1.0), open under Apache-2.0, fully specified in PROTOCOL.md. It's how the runtime, MCP server, editor, and render service stay aligned on one contract — and how you can trust the format your videos live in.
Does the AI authoring actually work?
Yes. The schema is small enough for current models to emit correctly. We ship an MCP server so Claude, Cursor, and any MCP-compatible agent can author videos directly. Drop transcripts in, get rendered videos out.
What can I render?
Video, image, text, shapes, audio, animated captions, animation presets, gradients, particles, SVG with stroke evolution — composed on any number of tracks, with effects and transitions. The full schema is documented in CKP/1.0.
What about complex one-off videos?
Honest answer: if your video needs arbitrary React, CSS animations, or a specific npm package's rendering, Remotion is a better fit. Clipkit's expressive ceiling is bounded by its schema. The schema covers the vast majority of programmatic video — but not all of it.
What does it cost?
The runtime: free, forever. Hosted rendering: billed per second of output (see the pricing page). The embeddable editor SDK: free up to a generous monthly tier, with paid plans above that.

Ship video at the
speed of JSON.

Open source runtime. Hosted rendering when you want it. An MCP server for your agents. A schema that's actually nice to write.

Get started free · Read the docs · ★ Star on GitHub