VO to Images API
REST API reference for the VO to Images pipeline. For the product overview and UI walkthrough, see VO to Images.
VO to Images is available via the Flokan public API. Because the pipeline is multi-step and long-running, the API is split into a small set of endpoints you call in sequence — there is no single "do it all" call that returns finished images.
Authentication
All requests require a workspace API key minted in Workspace Settings → API Keys. Pass the key in the Authorization header:
```
Authorization: Bearer flk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
The key must include the vo_to_images scope, and the user who created the key must have the Access VO to Images permission in the workspace.
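A minimal sketch of a reusable request helper, in the same shell-and-curl style as the worked example. The function name flokan_request and the FLOKAN_BASE variable are hypothetical conveniences, not part of the API; the key is read from FLOKAN_KEY, as in the worked example.

```shell
# Hypothetical helper: send one authenticated request to the VO to Images API.
# FLOKAN_KEY must hold a workspace key that has the vo_to_images scope.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_request METHOD PATH [JSON_BODY]
flokan_request() {
  if [ -z "${FLOKAN_KEY:-}" ]; then
    echo "FLOKAN_KEY is not set" >&2
    return 1
  fi
  if [ $# -ge 3 ]; then
    # A body is present, so declare it as JSON
    curl -s -X "$1" "$FLOKAN_BASE$2" \
      -H "Authorization: Bearer $FLOKAN_KEY" \
      -H "Content-Type: application/json" \
      -d "$3"
  else
    curl -s -X "$1" "$FLOKAN_BASE$2" \
      -H "Authorization: Bearer $FLOKAN_KEY"
  fi
}
```

For example, flokan_request GET /projects lists projects. Building the Authorization header in one place means rotating the key only touches FLOKAN_KEY.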
Base URL
```
https://app.flokan.com/api/v1/vo-to-images
```
End-to-end flow
The shortest path from "I have an audio URL" to "I have generated images" is:
1. GET /presets (optional): read your saved settings presets and pick one.
2. POST /projects: create a project (optionally with a preset_id).
3. POST /projects/{projectId}/audio: point the project at an audio URL the server can fetch.
4. POST /projects/{projectId}/auto-run: kick off transcription and segmentation. Returns a list of segment_ids.
5. For each segment_id, POST /projects/{projectId}/segments/{segmentId}/prompts: generate prompts for that segment. Safe to fan out in parallel (10 at a time is a good ceiling).
6. POST /projects/{projectId}/finalize: once all segments have prompts, enqueue image generation jobs.
7. GET /projects/{projectId}/auto-status: poll every 5–10 seconds until is_done is true.
8. GET /projects/{projectId}: read the finished sentences with their image_url values.
If generate_videos: true is set on the project, video jobs are enqueued automatically by the same worker that processes images — no separate finalize call is needed.
Endpoints
POST /projects — Create a project
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
title | string | yes | 1–200 characters |
workflow_mode | string | no | automatic (default) or manual. Auto-run only works on automatic projects. |
preset_id | uuid | no | Apply a saved settings preset at creation time |
art_style_id | uuid | no | Inline override (wins over preset) |
char_gen_style_id | uuid | no | Inline override |
detect_characters | boolean | no | When false, auto-run skips the character-detect step |
generate_characters | boolean | no | When true, character portraits are auto-generated |
generate_videos | boolean | no | When true, video jobs are enqueued after images complete |
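One way to create a project from a script, assuming jq is installed. The hypothetical flokan_create_project function checks the 1–200 character title limit locally, pins workflow_mode to automatic (the mode auto-run requires), and builds the body with jq so that quotes in the title are escaped. It reads the new id from .data.id, as the worked example does.

```shell
# Hypothetical helper: create an automatic-mode project and print its id.
# Requires curl and jq; FLOKAN_KEY holds the API key.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_create_project TITLE [PRESET_ID]
flokan_create_project() {
  title=$1
  preset=${2:-}
  # Mirror the documented 1-200 character limit before calling the API
  if [ ${#title} -lt 1 ] || [ ${#title} -gt 200 ]; then
    echo "title must be 1-200 characters" >&2
    return 1
  fi
  # jq escapes quotes in the title; preset_id is sent only when given
  body=$(jq -n --arg title "$title" --arg preset "$preset" '
    {title: $title, workflow_mode: "automatic"}
    + (if $preset == "" then {} else {preset_id: $preset} end)') || return 1
  curl -s -X POST "$FLOKAN_BASE/projects" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" | jq -e -r '.data.id'
}
```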
GET /projects — List projects
Query params
| Param | Type | Notes |
|---|---|---|
page | integer | Default 1 |
per_page | integer | Default 50, max 100 |
status | string | draft, transcribing, characters, prompts, generating, complete |
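A sketch of building the GET /projects query string with the documented limits enforced on the client. flokan_list_projects is a hypothetical name, and the lower bound of 1 for page and per_page is an assumption; the response body is printed unparsed because its shape is not documented in this section.

```shell
# Hypothetical helper: list projects with client-side checks on the query params.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_list_projects [PAGE] [PER_PAGE] [STATUS]
flokan_list_projects() {
  page=${1:-1}
  per_page=${2:-50}
  status_filter=${3:-}
  case "$page$per_page" in
    *[!0-9]*) echo "page and per_page must be integers" >&2; return 1 ;;
  esac
  # per_page is capped at 100 by the API, so fail early instead of sending it
  if [ "$page" -lt 1 ] || [ "$per_page" -lt 1 ] || [ "$per_page" -gt 100 ]; then
    echo "page must be >= 1 and per_page must be 1-100" >&2
    return 1
  fi
  case $status_filter in
    ''|draft|transcribing|characters|prompts|generating|complete) ;;
    *) echo "unknown status: $status_filter" >&2; return 1 ;;
  esac
  query="page=$page&per_page=$per_page"
  if [ -n "$status_filter" ]; then query="$query&status=$status_filter"; fi
  curl -s "$FLOKAN_BASE/projects?$query" \
    -H "Authorization: Bearer $FLOKAN_KEY"
}
```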
GET /projects/{projectId} — Fetch project with sentences/images/videos
Query params
| Param | Type | Notes |
|---|---|---|
sentences_offset | integer | Default 0 |
sentences_limit | integer | Default 150, max 300 |
Each sentence includes its images[] (with public image_url for completed images) and video_jobs[]. Use has_more_sentences + sentences_offset to page through long projects.
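Projects with more sentences than one page holds need a loop over sentences_offset. The sketch below is hypothetical: it assumes has_more_sentences sits under .data next to .data.sentences (this reference does not show the full response), requests the maximum page size of 300, and prints only non-null image_url values, since only completed images carry one. It requires jq.

```shell
# Hypothetical helper: print every completed image_url in a project, paging
# through sentences. Assumes .data.has_more_sentences exists; verify against
# a real response before relying on that path.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_all_image_urls PROJECT_ID
flokan_all_image_urls() {
  offset=0
  limit=300  # the documented maximum page size
  while :; do
    resp=$(curl -s "$FLOKAN_BASE/projects/$1?sentences_offset=$offset&sentences_limit=$limit" \
      -H "Authorization: Bearer $FLOKAN_KEY") || return 1
    # Stop on an error envelope instead of looping on it
    [ "$(printf '%s' "$resp" | jq -r '.success')" = "true" ] || return 1
    # Images that are not complete have no public URL, so drop null values
    printf '%s' "$resp" | jq -r '.data.sentences[].images[].image_url // empty' || return 1
    more=$(printf '%s' "$resp" | jq -r '.data.has_more_sentences')
    [ "$more" = "true" ] || break
    offset=$((offset + limit))
  done
}
```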
DELETE /projects/{projectId} — Delete a project
Deletes the project, all child rows, and the underlying audio / image / video files in storage.
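Deletion is irreversible and also removes stored media, so a script may want to refuse an empty project id and check the status code rather than assume success. A minimal sketch with a hypothetical flokan_delete_project function; it treats any 2xx status as success because the exact success code is not documented here.

```shell
# Hypothetical helper: delete a project and report whether the API accepted it.
# This is irreversible: stored audio, images, and videos are removed too.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_delete_project PROJECT_ID
flokan_delete_project() {
  if [ -z "${1:-}" ]; then
    echo "usage: flokan_delete_project PROJECT_ID" >&2
    return 1
  fi
  # Discard the body and keep only the HTTP status code
  code=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    "$FLOKAN_BASE/projects/$1" \
    -H "Authorization: Bearer $FLOKAN_KEY")
  case $code in
    2??) return 0 ;;
    *) echo "delete failed with HTTP $code" >&2; return 1 ;;
  esac
}
```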
POST /projects/{projectId}/audio — Set audio from URL
The server fetches the URL and uploads the file to your workspace's voiceover bucket, so the file must be hosted at a public URL the Flokan server can reach. The fetched file is subject to a hard cap of 200 MB; larger files are rejected with 413.
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
audio_url | string | yes | HTTP(S) URL the Flokan server can fetch (max 2000 chars) |
Supported formats: MP3, WAV, M4A, AAC, OGG, FLAC, WebM.
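A hedged sketch of attaching audio with some local checks first: the URL must be HTTP(S) and at most 2000 characters, and a path that does not end in one of the listed extensions only triggers a warning, since many hosted URLs have no extension. flokan_set_audio is a hypothetical name, and jq is assumed to be installed. The 200 MB cap can only be enforced by the server, which is the side that fetches the file.

```shell
# Hypothetical helper: sanity-check an audio URL locally, then attach it.
# Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_set_audio PROJECT_ID AUDIO_URL
flokan_set_audio() {
  url=${2:-}
  case $url in
    http://*|https://*) ;;
    *) echo "audio_url must be an HTTP(S) URL" >&2; return 1 ;;
  esac
  if [ ${#url} -gt 2000 ]; then
    echo "audio_url is longer than 2000 characters" >&2
    return 1
  fi
  # Warn, but do not fail, when the path does not end in a listed format:
  # signed or API-style URLs often have no file extension at all
  ext=$(printf '%s' "${url%%[?]*}" | sed 's/.*\.//' | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|m4a|aac|ogg|flac|webm) ;;
    *) echo "warning: extension '$ext' is not in the supported list" >&2 ;;
  esac
  body=$(jq -c -n --arg url "$url" '{audio_url: $url}') || return 1
  curl -s -X POST "$FLOKAN_BASE/projects/$1/audio" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```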
POST /projects/{projectId}/auto-run — Run prep pipeline
Transcribes the audio, optionally detects characters, recomputes initial segments, and flips auto_step to generating_prompts.
Returns 202 Accepted. The response includes segment_ids[] — call the prompts endpoint for each segment next.
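A minimal sketch of starting the prep pipeline and collecting the segment ids, assuming jq is available. The hypothetical flokan_auto_run function prints the error field of the envelope when success is not true, and otherwise prints one segment_id per line, ready to feed into the per-segment prompts calls.

```shell
# Hypothetical helper: start the prep pipeline and print one segment_id per line.
# Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: SEGMENTS=$(flokan_auto_run PROJECT_ID)
flokan_auto_run() {
  resp=$(curl -s -X POST "$FLOKAN_BASE/projects/$1/auto-run" \
    -H "Authorization: Bearer $FLOKAN_KEY") || return 1
  # On failure the body is the error envelope, so surface its message
  if [ "$(printf '%s' "$resp" | jq -r '.success' 2>/dev/null)" != "true" ]; then
    printf '%s' "$resp" | jq -r '.error // "unexpected response"' >&2
    return 1
  fi
  printf '%s' "$resp" | jq -r '.data.segment_ids[]'
}
```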
Response 202
```json
{
  "success": true,
  "data": {
    "project_id": "...",
    "auto_step": "generating_prompts",
    "sentence_count": 87,
    "duration_ms": 312450,
    "transcript": "Welcome to the channel...",
    "characters": [],
    "skipped_character_detection": true,
    "segment_ids": ["seg-1-uuid", "seg-2-uuid", "..."],
    "next_actions": [
      "For each segment_id, call POST /api/v1/vo-to-images/projects/{projectId}/segments/{segmentId}/prompts to generate prompts (parallelizable, up to 10 at a time).",
      "Once all segments have prompts, call POST /api/v1/vo-to-images/projects/{projectId}/finalize to enqueue image generation jobs.",
      "Poll GET /api/v1/vo-to-images/projects/{projectId}/auto-status to track progress."
    ]
  }
}
```
POST /projects/{projectId}/segments/{segmentId}/prompts — Generate prompts for one segment
Safe to call in parallel for many segments. Each call has its own 300-second budget so long projects don't hit a shared timeout.
Request body (all fields optional)
| Field | Type | Notes |
|---|---|---|
smart_camera_angles | boolean | Enables AI-chosen camera directions on prompts |
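A sketch of fanning out the prompts requests while honoring the 10-at-a-time ceiling. The hypothetical flokan_generate_prompts function reads segment ids from stdin, runs requests in batches of 10, forwards the optional smart_camera_angles flag, and uses curl's -f flag so HTTP errors produce a non-zero exit; failed segments are listed on stderr so they can be retried before finalize.

```shell
# Hypothetical helper: generate prompts for every segment id read from stdin,
# running at most 10 requests at a time. Requires curl.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: printf '%s\n' "$SEGMENTS" | flokan_generate_prompts PROJECT_ID [true|false]
flokan_generate_prompts() {
  project=$1
  angles=${2:-false}   # value sent as smart_camera_angles
  case $angles in
    true|false) ;;
    *) echo "smart_camera_angles must be true or false" >&2; return 1 ;;
  esac
  failed=$(mktemp) || return 1
  n=0
  while IFS= read -r seg; do
    [ -n "$seg" ] || continue
    # -f makes curl exit non-zero on HTTP errors; record the segment if so
    {
      curl -s -f -o /dev/null -X POST \
        "$FLOKAN_BASE/projects/$project/segments/$seg/prompts" \
        -H "Authorization: Bearer $FLOKAN_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"smart_camera_angles\": $angles}" \
        || printf '%s\n' "$seg" >> "$failed"
    } &
    n=$((n + 1))
    # Let each batch of 10 finish before starting the next one
    if [ $((n % 10)) -eq 0 ]; then wait; fi
  done
  wait
  if [ -s "$failed" ]; then
    echo "prompt generation failed for these segments:" >&2
    cat "$failed" >&2
    rm -f "$failed"
    return 1
  fi
  rm -f "$failed"
}
```

Fixed batches are simpler than a sliding window in POSIX sh, at the cost of one slow segment holding up the next batch; since each call has its own 300-second budget, that delay is bounded.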
POST /projects/{projectId}/finalize — Enqueue image generation
Call once every segment has prompts. Refuses if no sentences are in prompt_ready or failed status.
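Because finalize is meant to run only after every segment has prompts, a script can check auto-status first. This hypothetical flokan_finalize_when_ready function compares progress.prompts_ready with progress.sentences_total and returns 2 while they differ; treating equality as "ready" is an assumption about how those counters behave, not a documented rule. It requires jq.

```shell
# Hypothetical helper: call finalize only once auto-status reports that every
# sentence has a prompt. The readiness rule is a client-side heuristic.
# Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_finalize_when_ready PROJECT_ID   (returns 2 if not ready yet)
flokan_finalize_when_ready() {
  resp=$(curl -s "$FLOKAN_BASE/projects/$1/auto-status" \
    -H "Authorization: Bearer $FLOKAN_KEY") || return 1
  ready=$(printf '%s' "$resp" | jq -r '.data.progress.prompts_ready')
  total=$(printf '%s' "$resp" | jq -r '.data.progress.sentences_total')
  # Both counters must be plain integers (jq prints "null" when missing)
  for v in "$ready" "$total"; do
    case $v in
      ''|*[!0-9]*) echo "unexpected auto-status response" >&2; return 1 ;;
    esac
  done
  if [ "$ready" -lt "$total" ]; then
    echo "only $ready of $total sentences have prompts" >&2
    return 2
  fi
  curl -s -X POST "$FLOKAN_BASE/projects/$1/finalize" \
    -H "Authorization: Bearer $FLOKAN_KEY"
}
```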
GET /projects/{projectId}/auto-status — Poll progress
Cheap status endpoint — returns just the project's automatic-mode state and counts.
Response 200
```json
{
  "success": true,
  "data": {
    "project_id": "...",
    "status": "generating",
    "auto_step": "generating_images",
    "workflow_mode": "automatic",
    "is_done": false,
    "progress": {
      "segments_total": 12,
      "sentences_total": 87,
      "prompts_ready": 87,
      "sentences_complete": 41,
      "sentences_failed": 0,
      "image_jobs_active": 16,
      "image_jobs_failed": 0,
      "video_jobs_total": 0,
      "video_jobs_complete": 0,
      "video_jobs_failed": 0
    }
  }
}
```
Recommended poll cadence: 5–10 seconds. is_done flips to true once every sentence is in a terminal state (and, when generate_videos is on, every video job too).
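A polling sketch with a time budget, so a stalled project cannot hang a script indefinitely. The function name, the 8-second default interval (inside the recommended 5–10 seconds), and the 450-poll limit (about one hour) are hypothetical choices. Progress goes to stderr using the sentences_complete and sentences_total counters from the response; jq is required.

```shell
# Hypothetical helper: poll auto-status until is_done is true or the poll
# budget runs out. Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_wait_done PROJECT_ID [INTERVAL_SECONDS] [MAX_POLLS]
flokan_wait_done() {
  project=$1
  interval=${2:-8}     # within the recommended 5-10 second cadence
  max_polls=${3:-450}  # 450 polls x 8 s = about one hour
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    resp=$(curl -s "$FLOKAN_BASE/projects/$project/auto-status" \
      -H "Authorization: Bearer $FLOKAN_KEY")
    done_flag=$(printf '%s' "$resp" | jq -r '.data.is_done' 2>/dev/null)
    # Report progress on stderr so stdout stays clean for callers
    printf '%s' "$resp" \
      | jq -r '.data.progress | "\(.sentences_complete)/\(.sentences_total) sentences complete"' >&2
    if [ "$done_flag" = "true" ]; then return 0; fi
    i=$((i + 1))
    # Skip the sleep after the final poll
    if [ "$i" -lt "$max_polls" ]; then sleep "$interval"; fi
  done
  echo "gave up after $max_polls polls" >&2
  return 1
}
```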
GET /estimate-credits — Estimate cost before running a phase
Query params
| Param | Type | Required | Notes |
|---|---|---|---|
project_id | uuid | yes | |
action | string | yes | generate_images, generate_prompts, create_characters, or detect_characters |
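A sketch of gating a phase on the estimate, assuming jq: the hypothetical flokan_can_afford function validates the action name, logs the estimate, and succeeds only when can_proceed is true. In the sample response that follows, estimated_credits equals count × per_item_credits (87 × 49 = 4263), and is_exact: false suggests the figure may still change, so treat it as a guide rather than a quote.

```shell
# Hypothetical helper: succeed only if the workspace can afford an action.
# Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_can_afford PROJECT_ID ACTION && <run the phase>
flokan_can_afford() {
  case ${2:-} in
    generate_images|generate_prompts|create_characters|detect_characters) ;;
    *) echo "unknown action: ${2:-}" >&2; return 1 ;;
  esac
  est=$(curl -s "$FLOKAN_BASE/estimate-credits?project_id=$1&action=$2" \
    -H "Authorization: Bearer $FLOKAN_KEY") || return 1
  # Log the estimate together with the is_exact flag
  printf '%s' "$est" \
    | jq -r '.data | "estimate: \(.estimated_credits) of \(.available_credits) credits (exact: \(.is_exact))"' >&2
  [ "$(printf '%s' "$est" | jq -r '.data.can_proceed')" = "true" ]
}
```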
Response 200
```json
{
  "success": true,
  "data": {
    "action": "generate_images",
    "count": 87,
    "estimated_credits": 4263.0,
    "per_item_credits": 49.0,
    "is_exact": false,
    "available_credits": 12500.0,
    "can_proceed": true
  }
}
```
GET /presets — List settings presets
Returns every saved preset in the workspace. Pass any preset's id to POST /projects (preset_id) at creation, or to POST /projects/{projectId}/apply-preset to load it onto an existing project.
POST /projects/{projectId}/apply-preset — Apply a preset to an existing project
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
preset_id | uuid | yes | Must belong to the same workspace |
workflow_mode is intentionally not merged — changing it mid-project would invalidate pipeline state.
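A minimal sketch of applying a preset, with a hypothetical flokan_apply_preset function. It does a cheap local check that preset_id looks like a UUID (36 hex-digit and hyphen characters), which also makes it safe to build the small JSON body with printf; whether the preset belongs to the workspace is still checked only by the server, which answers 404 if it does not.

```shell
# Hypothetical helper: load a saved preset onto an existing project.
# The server keeps the project's current workflow_mode.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_apply_preset PROJECT_ID PRESET_ID
flokan_apply_preset() {
  preset=${2:-}
  # Cheap shape check for a UUID: only hex digits and hyphens, 36 characters
  case $preset in
    ''|*[!0-9a-fA-F-]*) echo "preset_id must be a UUID" >&2; return 1 ;;
  esac
  if [ ${#preset} -ne 36 ]; then
    echo "preset_id must be a UUID" >&2
    return 1
  fi
  # A validated UUID contains no quotes, so printf can build the JSON safely
  curl -s -X POST "$FLOKAN_BASE/projects/$1/apply-preset" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -H "Content-Type: application/json" \
    -d "$(printf '{"preset_id":"%s"}' "$preset")"
}
```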
Errors
All error responses use the envelope { "success": false, "error": "..." }.
| Code | Meaning |
|---|---|
400 | Validation error, missing prerequisite (no audio, no art style, wrong auto_step, …) |
401 | Missing or invalid API key |
402 | Insufficient AI credits or storage limit exceeded |
403 | Missing scope, missing permission, or feature not enabled on the workspace plan |
404 | Project, segment, or preset not found in this workspace |
413 | Audio file exceeds the 200 MB upload cap |
429 | Rate limit exceeded — see Retry-After header and X-RateLimit-* headers |
500 | Unexpected server or provider error |
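Because every error shares the { "success": false, "error": "..." } envelope, one wrapper can handle them all. A sketch, assuming jq and a hypothetical flokan_call function: curl's -w '\n%{http_code}' appends the status code after the body, 2xx bodies go to stdout, and anything else prints the code and error message to stderr. Returning 2 for 429 lets a caller back off separately from other failures.

```shell
# Hypothetical helper: send a request and split the result into body and status.
# Returns 0 on 2xx, 2 on 429, and 1 on any other error. Requires curl and jq.
FLOKAN_BASE="${FLOKAN_BASE:-https://app.flokan.com/api/v1/vo-to-images}"

# Usage: flokan_call METHOD PATH [extra curl args, e.g. -H ... -d ...]
flokan_call() {
  method=$1
  api_path=$2
  shift 2
  # -w appends the status code on its own line after the body
  out=$(curl -s -X "$method" "$FLOKAN_BASE$api_path" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -w '\n%{http_code}' "$@") || return 1
  code=$(printf '%s\n' "$out" | tail -n 1)
  body=$(printf '%s\n' "$out" | sed '$d')
  case $code in
    2??) printf '%s\n' "$body"; return 0 ;;
  esac
  msg=$(printf '%s' "$body" | jq -r '.error // empty' 2>/dev/null)
  echo "HTTP $code: ${msg:-no error message}" >&2
  if [ "$code" = 429 ]; then return 2; fi
  return 1
}
```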
Rate limits
Every response carries X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. The per-minute limit is determined by the workspace's billing plan.
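To react to rate limiting, a script needs the response headers, which curl can save with -D FILE. The hypothetical flokan_header function below reads one header from such a dump, matching names case-insensitively (HTTP/2 sends them lowercase). It returns raw strings: this reference does not say whether X-RateLimit-Reset is an epoch timestamp or a number of seconds, and Retry-After may in general be either a number of seconds or an HTTP date.

```shell
# Hypothetical helper: print the value of one header from a `curl -D FILE` dump.
# Example:
#   curl -s -D headers.txt -o body.json \
#     https://app.flokan.com/api/v1/vo-to-images/projects \
#     -H "Authorization: Bearer $FLOKAN_KEY"
#   flokan_header headers.txt X-RateLimit-Remaining
flokan_header() {
  # Strip CRs from CRLF line endings, then match the name case-insensitively
  tr -d '\r' < "$1" | awk -v name="$2" '
    BEGIN { name = tolower(name) }
    {
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == name) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }'
}
```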
Worked example
```shell
# Requires curl and jq; export FLOKAN_KEY before running.

# 1. Create the project, applying a saved preset
PROJECT=$(curl -s https://app.flokan.com/api/v1/vo-to-images/projects \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"title":"My Narrated Video","preset_id":"PRESET_UUID"}' \
  | jq -r '.data.id')

# 2. Attach audio
curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/audio \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"audio_url":"https://example.com/voiceover.mp3"}'

# 3. Run the prep phase (returns segment_ids)
SEGMENTS=$(curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/auto-run \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  | jq -r '.data.segment_ids[]')

# 4. Generate prompts for each segment in parallel, at most 10 requests at a time
N=0
for SEG in $SEGMENTS; do
  curl -s -X POST "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/segments/$SEG/prompts" \
    -H "Authorization: Bearer $FLOKAN_KEY" &
  N=$((N + 1))
  # Wait for the current batch of 10 to finish before starting the next
  if [ $((N % 10)) -eq 0 ]; then wait; fi
done
wait

# 5. Enqueue image generation
curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/finalize \
  -H "Authorization: Bearer $FLOKAN_KEY"

# 6. Poll every 8 seconds (within the recommended 5-10 s cadence) until done
while :; do
  DONE=$(curl -s "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/auto-status" \
    -H "Authorization: Bearer $FLOKAN_KEY" | jq -r '.data.is_done')
  [ "$DONE" = "true" ] && break
  sleep 8
done

# 7. Read the finished images (first 150 sentences by default;
#    page with sentences_offset for longer projects)
curl -s "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT" \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  | jq '.data.sentences[] | {sequence, image_urls: [.images[].image_url]}'
```