The video runtime for agents.
A GPU-rendered video runtime that turns a JSON timeline into a finished MP4. Built for LLMs, agents, and the apps that ship them. Powered by the open Clipkit Protocol.
Built for a different problem than Remotion.
Remotion is React for video. You write components, manage frame math, deploy headless Chromium, and end up with a beautiful pipeline — if you're a React developer authoring every video yourself.
Clipkit is JSON for video. You describe a timeline. The runtime renders it on the GPU. There's nothing to bundle, no Chromium to deploy, and no useCurrentFrame to debug.
If your video is being authored by a person typing React, Remotion is great. If it's being authored by an agent, by an end user in your app, or from a JSON template, that's what Clipkit is for.
One schema. One runtime. Three reasons it works.
Describe the video.
Render the video.
A Clipkit project is a JSON document that conforms to the open Clipkit Protocol. Elements live on tracks, positioned in space and time. Animations, captions, transitions, and effects are first-class.
No build step. No Lambda. No Chromium. Render in the browser with the SDK, or POST the JSON to our hosted render API and get back an MP4.
```json
{
  "output_format": "mp4",
  "width": 1080,
  "height": 1920,
  "duration": 12,
  "elements": [
    {
      "id": "bg",
      "type": "video",
      "source": "https://cdn.example.com/clip.mp4",
      "time": 0,
      "duration": 12,
      "fit": "cover"
    },
    {
      "id": "title",
      "type": "text",
      "text": "Built for the AI era.",
      "font_family": "Inter",
      "font_weight": 700,
      "x": "50%",
      "y": "20%",
      "animations": [{ "type": "slide-up-in", "duration": 0.6 }]
    },
    {
      "id": "captions",
      "type": "caption",
      "style": "tiktok_bounce",
      "words": [
        { "text": "Built", "start": 0.2, "end": 0.6 },
        { "text": "for", "start": 0.6, "end": 0.85 },
        { "text": "the", "start": 0.85, "end": 1.1 },
        { "text": "AI", "start": 1.1, "end": 1.4 },
        { "text": "era.", "start": 1.4, "end": 1.9 }
      ]
    }
  ]
}
```
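Because a project is plain JSON, a template is just a function that returns one. The sketch below rebuilds the example timeline from structured input. It uses only the fields shown in that example, and `buildPromoProject` and its parameters are hypothetical names for illustration, not part of the Clipkit SDK or protocol.

```javascript
// Hypothetical template: turns structured input into a Clipkit project document.
// Field names mirror the example timeline; nothing beyond them is assumed.
function buildPromoProject({ videoUrl, headline, words, duration = 12 }) {
  return {
    output_format: 'mp4',
    width: 1080,
    height: 1920,
    duration,
    elements: [
      // Background video spanning the whole project, with fit: 'cover' as in the example
      { id: 'bg', type: 'video', source: videoUrl, time: 0, duration, fit: 'cover' },
      // Headline positioned by percentage, animated in with a named preset
      {
        id: 'title',
        type: 'text',
        text: headline,
        font_family: 'Inter',
        font_weight: 700,
        x: '50%',
        y: '20%',
        animations: [{ type: 'slide-up-in', duration: 0.6 }],
      },
      // Word-level captions; each word carries its own start/end in seconds
      { id: 'captions', type: 'caption', style: 'tiktok_bounce', words },
    ],
  }
}

const project = buildPromoProject({
  videoUrl: 'https://cdn.example.com/clip.mp4',
  headline: 'Built for the AI era.',
  words: [
    { text: 'Built', start: 0.2, end: 0.6 },
    { text: 'for', start: 0.6, end: 0.85 },
  ],
})
console.log(JSON.stringify(project, null, 2))
```

Swap the inputs for a transcript, a CMS record, or an agent's tool call and the output is still the same kind of JSON document.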
Plug Clipkit into Claude, Cursor & any MCP agent.
Clipkit ships an official MCP server. Once it's installed, your agent gets a video toolbox: five schema-typed primitives it can call natively.
```shell
# install once
$ npx -y @clipkit/mcp-server
```

In your MCP host config:

```json
{
  "mcpServers": {
    "clipkit": {
      "command": "npx",
      "args": ["-y", "@clipkit/mcp-server"]
    }
  }
}
```

Claude can now produce videos by describing them.
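If your MCP host config already lists other servers, the Clipkit entry needs to be merged in rather than pasted over the file. A minimal sketch, assuming only the `mcpServers` shape shown above; `addClipkitServer` is a hypothetical helper, and where each host keeps its config file varies by host.

```javascript
// Hypothetical helper: add the Clipkit MCP server to an existing host config
// without touching servers that are already registered.
function addClipkitServer(config = {}) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      // Same command/args as the documented entry: launch the server via npx
      clipkit: { command: 'npx', args: ['-y', '@clipkit/mcp-server'] },
    },
  }
}

// Placeholder for a server you already have configured
const existing = {
  mcpServers: { other: { command: 'npx', args: ['-y', 'other-mcp-server'] } },
}
console.log(JSON.stringify(addClipkitServer(existing), null, 2))
```

The function returns a new object, so you can diff it against the original before writing it back to disk.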
What's in the box.
Word-level animated captions
Caption elements take an array of timestamped words and render them with kinetic styles — TikTok bounce, fade reveal, kinetic typewriter, and more. Hand Whisper output straight to the schema.
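As a sketch of that hand-off, the function below assumes Whisper's word-timestamp JSON (the shape produced by the `openai-whisper` Python package with `word_timestamps=True`): a list of `segments`, each with a `words` array whose entries carry `word`, `start`, and `end` in seconds, usually with a leading space on the word. It flattens that into the `{ text, start, end }` shape used by the caption element in the example timeline. `whisperToCaptionWords` is a hypothetical name, not a Clipkit API.

```javascript
// Flatten Whisper-style segments into Clipkit caption words.
// Assumed input: { segments: [{ words: [{ word, start, end }] }] }, times in seconds.
function whisperToCaptionWords(transcript) {
  const words = []
  for (const segment of transcript.segments ?? []) {
    for (const w of segment.words ?? []) {
      const text = w.word.trim() // Whisper usually prefixes words with a space
      if (text === '') continue  // skip empty tokens
      words.push({ text, start: w.start, end: w.end })
    }
  }
  return words
}

const whisperResult = {
  segments: [
    { words: [{ word: ' Built', start: 0.2, end: 0.6 }, { word: ' for', start: 0.6, end: 0.85 }] },
    { words: [{ word: ' the', start: 0.85, end: 1.1 }] },
  ],
}

// Drop the converted words straight into a caption element
const captionElement = {
  id: 'captions',
  type: 'caption',
  style: 'tiktok_bounce',
  words: whisperToCaptionWords(whisperResult),
}
console.log(captionElement.words)
```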
Animation presets
A curated library of named motions — fade-in, slide-up-in, scale-in, bounce-in, rotate-in. Each takes a duration, easing, and delay. AI picks the preset; humans tune the knobs.
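To make the three knobs concrete, here is one plausible model of how `duration`, `delay`, and `easing` combine for a `fade-in`: the animation waits `delay` seconds, maps elapsed time to progress in [0, 1], and runs that through an easing curve that drives opacity. This is an illustrative sketch, not the runtime's implementation; the easing names, their quadratic curves, and measuring time from when the element appears are all assumptions.

```javascript
// Illustrative easing curves (assumed names; quadratic for simplicity)
const EASINGS = {
  linear: p => p,
  'ease-in': p => p * p,
  'ease-out': p => 1 - (1 - p) * (1 - p),
}

// Progress of a preset at local time t (seconds since the element appears)
function presetProgress(t, { duration, delay = 0, easing = 'linear' }) {
  const raw = (t - delay) / duration
  const clamped = Math.min(1, Math.max(0, raw)) // hold at 0 before start, 1 after end
  return EASINGS[easing](clamped)
}

// For a fade-in, opacity simply follows the eased progress
const fadeIn = { type: 'fade-in', duration: 0.5, delay: 0.2, easing: 'ease-out' }
for (const t of [0, 0.2, 0.45, 0.7, 1]) {
  console.log(`t=${t}s opacity=${presetProgress(t, fadeIn).toFixed(3)}`)
}
```

The split mirrors the copy above: an agent only has to choose `fade-in`, while a person can adjust timing and feel without touching the preset itself.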
Stock media built in
Bring your own API keys for Shutterstock, Unsplash, and Pexels, then reference media by search query instead of URL. The runtime resolves each query to a media file and caches the result.
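The resolve-and-cache idea can be sketched as a memoized lookup keyed by provider and query. This is not Clipkit's implementation: the `provider`/`query` fields, `createMediaResolver`, and the stub `search` function are hypothetical, and a real `search` would call the stock provider's API with your key.

```javascript
// Illustrative resolve-and-cache layer for query-based media references
function createMediaResolver(search) {
  const cache = new Map() // key: "provider:normalized query" -> Promise<url>
  return function resolve({ provider, query }) {
    // Normalize whitespace and case so trivially different queries share an entry
    const key = `${provider}:${query.trim().toLowerCase()}`
    if (!cache.has(key)) {
      // Cache the promise so concurrent lookups share one request;
      // evict on failure so a transient error is not cached forever
      cache.set(key, search(provider, query).catch(err => { cache.delete(key); throw err }))
    }
    return cache.get(key)
  }
}

let searchCalls = 0
// Stub search that returns a placeholder URL instead of calling a real API
const resolveMedia = createMediaResolver(async (provider, query) => {
  searchCalls += 1
  return `https://media.example.com/${provider}/${encodeURIComponent(query)}.jpg`
})

const a = resolveMedia({ provider: 'pexels', query: 'city at night' })
const b = resolveMedia({ provider: 'pexels', query: '  City at Night ' })
a.then(url => console.log(url, 'same lookup:', a === b, 'searches:', searchCalls))
```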
Open-source runtime
The full rendering engine is open source under Apache-2.0. Powered by the open Clipkit Protocol (CKP/1.0) — self-host, audit, contribute. The format your videos are written in is documented and versioned, not owned.
Pay only for hosted renders
Use the open-source runtime for free. Pay per second of output for hosted rendering. No editor session fees, no monthly minimums.
Need a video editor inside your app?
Add one in a line.
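```javascript
import Clipkit from '@clipkit/sdk'

// Any Clipkit project document, e.g. the timeline JSON shown earlier
const project = { output_format: 'mp4', width: 1080, height: 1920, duration: 12, elements: [] }
// Replace with your own persistence (database, API call, local storage, ...)
const save = json => console.log('project updated', json)

const editor = new Clipkit({ apiKey: 'YOUR_API_KEY' })
editor.init('#editor-container')         // mount into an existing DOM element
editor.loadProject(project)              // hand the editor the JSON timeline
editor.on('change', json => save(json))  // receive the edited JSON after each change
```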
The same JSON that AI authors is what powers our embeddable editor.
One schema. Two surfaces. AI drives. Humans tune. Both write to the same file.
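Because both surfaces write the same JSON, an edit from either side is just a change to an element addressed by its `id`. The helper below is a hypothetical sketch, not an SDK method; in the embeddable editor, the human side would arrive through the `change` event instead.

```javascript
// Hypothetical helper: apply a patch to one element, addressed by its id,
// returning a new project so the original JSON is left untouched.
function patchElement(project, id, patch) {
  let found = false
  const elements = project.elements.map(el => {
    if (el.id !== id) return el
    found = true
    return { ...el, ...patch }
  })
  if (!found) throw new Error(`No element with id "${id}"`)
  return { ...project, elements }
}

const draft = {
  output_format: 'mp4', width: 1080, height: 1920, duration: 12,
  elements: [{ id: 'title', type: 'text', text: 'Built for the AI era.', y: '20%' }],
}
// An agent rewrites the copy...
const aiEdit = patchElement(draft, 'title', { text: 'Video at the speed of JSON.' })
// ...and a human nudges the position; both edits land in the same document
const humanEdit = patchElement(aiEdit, 'title', { y: '25%' })
console.log(humanEdit.elements[0])
```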
When Clipkit fits. When it doesn't.
use clipkit when
Your video isn't being typed by a human, frame-by-frame.
- You're generating videos from AI output, transcripts, templates, or any structured source.
- You're embedding a video editor inside your application.
- Your stack isn't React, or you don't want a React app in the pipeline.
- You want real-time preview and fast renders without managing Chromium or Lambda.
- You want an open, portable format that isn't tied to one company's API.
use remotion when
Your video is being typed by a human, frame-by-frame.
- You're hand-authoring cinematic, bespoke videos in code where every frame is unique.
- You need the full expressive ceiling of arbitrary React, CSS, SVG, and npm packages.
- You're a React shop with no need for an end-user editor and no plans to drive renders from AI.
- You need a behavior our schema doesn't model yet, and you're not willing to extend it.
Both are good tools. They optimize for different authors.
Open runtime. Hosted convenience.
The Clipkit runtime is open source under Apache-2.0. Powered by the open Clipkit Protocol — the format your videos are written in is documented and versioned, not owned. You can self-host renders, fork the engine, and build on it without ever paying us a cent.
The hosted render service is how we keep the lights on. POST a project, get an MP4 URL back, no GPUs to manage. Same model that worked for Vercel + Next.js, Supabase + Postgres, Tailwind + Tailwind UI. The runtime is the standard; the hosting is the product.
Questions, answered straight.
Is Clipkit really open source?
Can I render locally without your servers?
How is this different from Remotion?
What is the Clipkit Protocol?
Does the AI authoring actually work?
What can I render?
What about complex one-off videos?
What does it cost?
Ship video at the speed of JSON.
Open source runtime. Hosted rendering when you want it. An MCP server for your agents. A schema that's actually nice to write.