VO to Images API

REST API reference for the VO to Images pipeline. For the product overview and UI walkthrough, see VO to Images.

VO to Images is available via the Flokan public API. Because the pipeline is multi-step and long-running, the API is split into a small set of endpoints you call in sequence — there is no single "do it all" call that returns finished images.

Authentication

All requests require a workspace API key minted in Workspace Settings → API Keys. Pass the key in the Authorization header:

Authorization: Bearer flk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The key must include the vo_to_images scope, and the user who created the key must have the Access VO to Images permission in the workspace.

Base URL

https://app.flokan.com/api/v1/vo-to-images
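
The header and base URL above can be kept in one place with a small wrapper. This is only a sketch: flokan_api is a hypothetical helper name, not part of the API, and FLOKAN_KEY is assumed to be an environment variable holding a workspace key with the vo_to_images scope.

```shell
# Hypothetical helper: flokan_api METHOD PATH [JSON_BODY]
# Assumes FLOKAN_KEY holds a workspace API key (flk_...).
FLOKAN_BASE="https://app.flokan.com/api/v1/vo-to-images"

flokan_api() {
  local method="$1"
  local path="$2"
  local body="${3-}"
  if [ -n "$body" ]; then
    # Requests with a JSON body also need a Content-Type header.
    curl -s -X "$method" "$FLOKAN_BASE$path" \
      -H "Authorization: Bearer $FLOKAN_KEY" \
      -H "Content-Type: application/json" \
      -d "$body"
  else
    curl -s -X "$method" "$FLOKAN_BASE$path" \
      -H "Authorization: Bearer $FLOKAN_KEY"
  fi
}
```

With this in scope, `flokan_api GET /presets` lists presets and `flokan_api POST /projects '{"title":"My Narrated Video"}'` creates a project.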

End-to-end flow

The shortest path from "I have an audio URL" to "I have generated images" is:

  1. GET /presets (optional) — read your saved settings presets and pick one.
  2. POST /projects — create a project (optionally with a preset_id).
  3. POST /projects/{projectId}/audio — point the project at an audio URL the server can fetch.
  4. POST /projects/{projectId}/auto-run — kicks off transcription + segmentation. Returns a list of segment_ids.
  5. For each segment_id, POST /projects/{projectId}/segments/{segmentId}/prompts — generates prompts for that segment. Safe to fan out in parallel (10 at a time is a good ceiling).
  6. POST /projects/{projectId}/finalize — once all segments have prompts, this enqueues image generation jobs.
  7. GET /projects/{projectId}/auto-status — poll every 5–10 seconds until is_done: true.
  8. GET /projects/{projectId} — read the finished sentences with their image_urls.

If generate_videos: true is set on the project, video jobs are enqueued automatically by the same worker that processes images — no separate finalize call is needed.

Endpoints

POST /projects — Create a project

Request body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| title | string | yes | 1–200 characters |
| workflow_mode | string | no | automatic (default) or manual. Auto-run only works on automatic projects. |
| preset_id | uuid | no | Apply a saved settings preset at creation time |
| art_style_id | uuid | no | Inline override (wins over preset) |
| char_gen_style_id | uuid | no | Inline override |
| detect_characters | boolean | no | When false, auto-run skips the character-detect step |
| generate_characters | boolean | no | When true, character portraits are auto-generated |
| generate_videos | boolean | no | When true, video jobs are enqueued after images complete |
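
As an illustration of the request body, the sketch below checks the title length locally and then creates a project, optionally with a preset. The helper names valid_title and create_project are hypothetical, FLOKAN_KEY is assumed to hold the API key, and the new project's id is assumed to be returned at .data.id, as in the worked example at the end of this page.

```shell
# Client-side check mirroring the documented 1-200 character limit on title.
# Note: ${#1} counts characters according to the current locale.
valid_title() {
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 200 ]
}

# Hypothetical helper: create_project TITLE [PRESET_ID]; prints the new project id.
create_project() {
  local body
  valid_title "$1" || { echo "title must be 1-200 characters" >&2; return 1; }
  # jq builds the JSON so quotes in the title are escaped correctly;
  # preset_id is only included when a second argument is given.
  body=$(jq -n --arg title "$1" --arg preset "${2-}" \
    '{title: $title} + (if $preset == "" then {} else {preset_id: $preset} end)')
  curl -s -X POST "https://app.flokan.com/api/v1/vo-to-images/projects" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" | jq -r '.data.id'
}
```

Validating locally only saves a round trip; the server remains the authority and may still answer 400.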

GET /projects — List projects

Query params

| Param | Type | Notes |
| --- | --- | --- |
| page | integer | Default 1 |
| per_page | integer | Default 50, max 100 |
| status | string | draft, transcribing, characters, prompts, generating, complete |
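
The query parameters can be assembled as in the sketch below. projects_list_url is a hypothetical helper; it applies the documented defaults, clamps per_page to the documented maximum, and rejects status values that are not in the list above.

```shell
# Hypothetical helper: projects_list_url [PAGE] [PER_PAGE] [STATUS]
projects_list_url() {
  local page="${1:-1}"
  local per_page="${2:-50}"
  local status="${3-}"
  local url
  # The documented maximum for per_page is 100, so clamp locally.
  if [ "$per_page" -gt 100 ]; then per_page=100; fi
  url="https://app.flokan.com/api/v1/vo-to-images/projects?page=$page&per_page=$per_page"
  case "$status" in
    "") ;;  # no status filter
    draft|transcribing|characters|prompts|generating|complete)
      url="$url&status=$status" ;;
    *) echo "unknown status: $status" >&2; return 1 ;;
  esac
  printf '%s\n' "$url"
}
```

For example, `curl -s "$(projects_list_url 1 100 complete)" -H "Authorization: Bearer $FLOKAN_KEY"` would request the first 100 completed projects.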

GET /projects/{projectId} — Fetch project with sentences/images/videos

Query params

| Param | Type | Notes |
| --- | --- | --- |
| sentences_offset | integer | Default 0 |
| sentences_limit | integer | Default 150, max 300 |

Each sentence includes its images[] (with public image_url for completed images) and video_jobs[]. Use has_more_sentences + sentences_offset to page through long projects.
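
Paging through a long project might look like the sketch below. The helper names are hypothetical, and two response details are assumptions rather than documented facts: that has_more_sentences is a boolean under .data (next to sentences[]), and that advancing sentences_offset by the page size yields the next page.

```shell
# Build the URL for one page of sentences: PROJECT_ID OFFSET LIMIT
project_page_url() {
  printf '%s/projects/%s?sentences_offset=%s&sentences_limit=%s\n' \
    "https://app.flokan.com/api/v1/vo-to-images" "$1" "$2" "$3"
}

# Print every available image_url in a project, one per line.
# Assumes .data.has_more_sentences exists and FLOKAN_KEY holds the API key.
all_image_urls() {
  local project="$1"
  local limit="${2:-150}"
  local offset=0
  local page more
  while :; do
    page=$(curl -s "$(project_page_url "$project" "$offset" "$limit")" \
      -H "Authorization: Bearer $FLOKAN_KEY") || return 1
    # Images without a URL yet (not completed) are skipped.
    printf '%s' "$page" | jq -r '.data.sentences[].images[].image_url // empty'
    more=$(printf '%s' "$page" | jq -r '.data.has_more_sentences')
    [ "$more" = "true" ] || break
    offset=$((offset + limit))
  done
}
```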

DELETE /projects/{projectId} — Delete a project

Deletes the project, all child rows, and the underlying audio / image / video files in storage.

POST /projects/{projectId}/audio — Set audio from URL

The server fetches the URL and uploads the file to your workspace's voiceover bucket. Hard cap of 200 MB. For larger files, host them somewhere reachable and pass the public URL.

Request body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| audio_url | string | yes | HTTP(S) URL the Flokan server can fetch (max 2000 chars) |

Supported formats: MP3, WAV, M4A, AAC, OGG, FLAC, WebM.
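
A quick local sanity check before submitting a URL could look like the following sketch. audio_url_ok is a hypothetical helper, and the file-extension test is only a heuristic: how the server actually identifies the format is not documented here, so a URL without a recognizable extension may still be accepted.

```shell
# Hypothetical pre-flight check for audio_url.
# Returns 0 if the URL is HTTP(S), at most 2000 characters long, and ends
# in one of the documented format extensions.
audio_url_ok() {
  local url="$1"
  local path ext
  [ "${#url}" -le 2000 ] || return 1
  case "$url" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  path="${url%%[?#]*}"   # drop any query string or fragment
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|aac|ogg|flac|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```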

POST /projects/{projectId}/auto-run — Run prep pipeline

Transcribes the audio, optionally detects characters, recomputes initial segments, and flips auto_step to generating_prompts.

Returns 202 Accepted. The response includes segment_ids[] — call the prompts endpoint for each segment next.

Response 202

{
  "success": true,
  "data": {
    "project_id": "...",
    "auto_step": "generating_prompts",
    "sentence_count": 87,
    "duration_ms": 312450,
    "transcript": "Welcome to the channel...",
    "characters": [],
    "skipped_character_detection": true,
    "segment_ids": ["seg-1-uuid", "seg-2-uuid", "..."],
    "next_actions": [
      "For each segment_id, call POST /api/v1/vo-to-images/projects/{projectId}/segments/{segmentId}/prompts to generate prompts (parallelizable, up to 10 at a time).",
      "Once all segments have prompts, call POST /api/v1/vo-to-images/projects/{projectId}/finalize to enqueue image generation jobs.",
      "Poll GET /api/v1/vo-to-images/projects/{projectId}/auto-status to track progress."
    ]
  }
}

POST /projects/{projectId}/segments/{segmentId}/prompts — Generate prompts for one segment

Safe to call in parallel for many segments. Each call has its own 300-second budget so long projects don't hit a shared timeout.

Request body (all fields optional)

| Field | Type | Notes |
| --- | --- | --- |
| smart_camera_angles | boolean | Enables AI-chosen camera directions on prompts |
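
When fanning out in parallel it helps to record which segments failed. The sketch below sends the optional body and prints the HTTP status next to the segment id. generate_prompts is a hypothetical helper name, and the 310-second client timeout is an assumption chosen to sit just above the documented 300-second server budget.

```shell
# Hypothetical helper: generate_prompts PROJECT_ID SEGMENT_ID [true|false]
# Prints "SEGMENT_ID HTTP_STATUS" so failed segments are easy to spot and retry.
# Assumes FLOKAN_KEY holds the API key.
generate_prompts() {
  local smart="${3:-false}"
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 310 -X POST \
    "https://app.flokan.com/api/v1/vo-to-images/projects/$1/segments/$2/prompts" \
    -H "Authorization: Bearer $FLOKAN_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"smart_camera_angles\": $smart}")
  printf '%s %s\n' "$2" "$code"
}
```

Collecting the output of many calls and filtering for non-2xx codes gives the list of segments to retry before calling finalize.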

POST /projects/{projectId}/finalize — Enqueue image generation

Call this endpoint after every segment has prompts. The request is rejected if no sentences are in prompt_ready or failed status.

GET /projects/{projectId}/auto-status — Poll progress

Cheap status endpoint — returns just the project's automatic-mode state and counts.

Response 200

{
  "success": true,
  "data": {
    "project_id": "...",
    "status": "generating",
    "auto_step": "generating_images",
    "workflow_mode": "automatic",
    "is_done": false,
    "progress": {
      "segments_total": 12,
      "sentences_total": 87,
      "prompts_ready": 87,
      "sentences_complete": 41,
      "sentences_failed": 0,
      "image_jobs_active": 16,
      "image_jobs_failed": 0,
      "video_jobs_total": 0,
      "video_jobs_complete": 0,
      "video_jobs_failed": 0
    }
  }
}

Recommended poll cadence: 5–10 seconds. is_done flips to true once every sentence is in a terminal state (and, when generate_videos is on, every video job too).
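
A polling loop with a progress readout and an upper bound might be sketched as follows. The helper names are hypothetical, the default of 450 polls (about an hour at 8 seconds) is an arbitrary ceiling for illustration, and FLOKAN_KEY is assumed to hold the API key.

```shell
# Integer percentage: progress_pct COMPLETE TOTAL (0 when TOTAL is 0).
progress_pct() {
  if [ "$2" -gt 0 ]; then
    echo $(( $1 * 100 / $2 ))
  else
    echo 0
  fi
}

# Hypothetical helper: wait_for_project PROJECT_ID [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 once is_done is true, 1 if MAX_POLLS is reached first.
wait_for_project() {
  local project="$1"
  local interval="${2:-8}"
  local max_polls="${3:-450}"
  local n=0
  local status is_done complete total
  while [ "$n" -lt "$max_polls" ]; do
    status=$(curl -s "https://app.flokan.com/api/v1/vo-to-images/projects/$project/auto-status" \
      -H "Authorization: Bearer $FLOKAN_KEY")
    is_done=$(printf '%s' "$status" | jq -r '.data.is_done')
    # "// 0" guards against missing counters in an error response.
    complete=$(printf '%s' "$status" | jq -r '.data.progress.sentences_complete // 0')
    total=$(printf '%s' "$status" | jq -r '.data.progress.sentences_total // 0')
    echo "progress: $(progress_pct "$complete" "$total")% ($complete/$total)" >&2
    if [ "$is_done" = "true" ]; then return 0; fi
    sleep "$interval"
    n=$((n + 1))
  done
  echo "timed out waiting for project $project" >&2
  return 1
}
```

For the sample response above (41 of 87 sentences complete), the readout would show 47%.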

GET /estimate-credits — Estimate cost before running a phase

Query params

| Param | Type | Required | Notes |
| --- | --- | --- | --- |
| project_id | uuid | yes | |
| action | string | yes | generate_images, generate_prompts, create_characters, or detect_characters |

Response 200

{
  "success": true,
  "data": {
    "action": "generate_images",
    "count": 87,
    "estimated_credits": 4263.0,
    "per_item_credits": 49.0,
    "is_exact": false,
    "available_credits": 12500.0,
    "can_proceed": true
  }
}
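
In the sample response, estimated_credits equals count × per_item_credits (87 × 49 = 4263). A script can use can_proceed as a gate before starting a phase; the sketch below shows one way to do that. estimate_url and can_proceed are hypothetical helper names, and FLOKAN_KEY is assumed to hold the API key.

```shell
# Build the estimate URL: estimate_url PROJECT_ID ACTION
# Rejects actions outside the documented set.
estimate_url() {
  case "$2" in
    generate_images|generate_prompts|create_characters|detect_characters) ;;
    *) echo "unsupported action: $2" >&2; return 1 ;;
  esac
  printf '%s/estimate-credits?project_id=%s&action=%s\n' \
    "https://app.flokan.com/api/v1/vo-to-images" "$1" "$2"
}

# Succeeds only if the API reports can_proceed: true for the action.
can_proceed() {
  local url ok
  url=$(estimate_url "$1" "$2") || return 1
  ok=$(curl -s "$url" -H "Authorization: Bearer $FLOKAN_KEY" | jq -r '.data.can_proceed')
  [ "$ok" = "true" ]
}
```

A caller could then write `can_proceed "$PROJECT" generate_images || exit 1` before the finalize call.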

GET /presets — List settings presets

Returns every saved preset in the workspace. Pass any preset's id to POST /projects (preset_id) at creation, or to POST /projects/{projectId}/apply-preset to load it onto an existing project.

POST /projects/{projectId}/apply-preset — Apply a preset to an existing project

Request body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| preset_id | uuid | yes | Must belong to the same workspace |

workflow_mode is intentionally not merged — changing it mid-project would invalidate pipeline state.

Errors

All error responses use the envelope { "success": false, "error": "..." }.

| Code | Meaning |
| --- | --- |
| 400 | Validation error, missing prerequisite (no audio, no art style, wrong auto_step, …) |
| 401 | Missing or invalid API key |
| 402 | Insufficient AI credits or storage limit exceeded |
| 403 | Missing scope, missing permission, or feature not enabled on the workspace plan |
| 404 | Project, segment, or preset not found in this workspace |
| 413 | Audio file exceeds the 200 MB upload cap |
| 429 | Rate limit exceeded; see the Retry-After header and X-RateLimit-* headers |
| 500 | Unexpected server or provider error |

Rate limits

Every response carries X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. The per-minute limit is determined by the workspace's billing plan.
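
One way to respect a 429 is to read Retry-After and wait before retrying, as in the sketch below. Both helper names are hypothetical. The sketch assumes Retry-After carries a number of seconds (HTTP also allows a date form, which falls back to the default here), and the limit of five attempts is arbitrary.

```shell
# retry_after_seconds HEADER_FILE [DEFAULT]
# Prints the Retry-After value from a saved header dump, or DEFAULT (5)
# when the header is absent or not a plain number of seconds.
retry_after_seconds() {
  local default="${2:-5}"
  local value
  value=$(tr -d '\r' < "$1" | awk -F': *' 'tolower($1) == "retry-after" { print $2; exit }')
  case "$value" in
    ''|*[!0-9]*) echo "$default" ;;
    *) echo "$value" ;;
  esac
}

# get_with_retry URL: GET with up to 5 attempts, sleeping after each 429.
# Prints the response body; returns 0 only for a 2xx/3xx final status.
get_with_retry() {
  local url="$1"
  local headers body code
  local attempt=1
  headers=$(mktemp)
  body=$(mktemp)
  while [ "$attempt" -le 5 ]; do
    code=$(curl -s -D "$headers" -o "$body" -w '%{http_code}' "$url" \
      -H "Authorization: Bearer $FLOKAN_KEY")
    if [ "$code" != "429" ]; then
      cat "$body"
      rm -f "$headers" "$body"
      [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
      return
    fi
    sleep "$(retry_after_seconds "$headers")"
    attempt=$((attempt + 1))
  done
  rm -f "$headers" "$body"
  return 1
}
```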

Worked example

# 1. Create the project, applying a saved preset
PROJECT=$(curl -s https://app.flokan.com/api/v1/vo-to-images/projects \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"title":"My Narrated Video","preset_id":"PRESET_UUID"}' \
  | jq -r '.data.id')
 
# 2. Attach audio
curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/audio \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"audio_url":"https://example.com/voiceover.mp3"}'
 
# 3. Run the prep phase — returns segment_ids
SEGMENTS=$(curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/auto-run \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  | jq -r '.data.segment_ids[]')
 
# 4. Generate prompts for each segment (at most 10 requests in parallel)
printf '%s\n' $SEGMENTS | xargs -P 10 -I{} curl -s -X POST \
  "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/segments/{}/prompts" \
  -H "Authorization: Bearer $FLOKAN_KEY"
 
# 5. Enqueue image generation
curl -s -X POST https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/finalize \
  -H "Authorization: Bearer $FLOKAN_KEY"
 
# 6. Poll until done
while :; do
  DONE=$(curl -s "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT/auto-status" \
    -H "Authorization: Bearer $FLOKAN_KEY" | jq -r '.data.is_done')
  [ "$DONE" = "true" ] && break
  sleep 8
done
 
# 7. Read the finished images
curl -s "https://app.flokan.com/api/v1/vo-to-images/projects/$PROJECT" \
  -H "Authorization: Bearer $FLOKAN_KEY" \
  | jq '.data.sentences[] | {sequence, image_urls: [.images[].image_url]}'