
AI multimodal input composer

ai-ui specs/ai-ui/multimodal-input.kmd

Unified composer for text + image + file + voice input. Drag-and-drop + paste hook + attach button + mic button. Upload preview chips with MIME validation gated by model capabilities. Voice mode transition to voice-mode.kmd (#121). Required for any AI chat surface.

Specification body

Spec — AI multimodal input composer

Voice integration via voice-mode.kmd (#121) and voice/wake-word.kmd. Capability gating via model-selector.kmd (#113). Voice-side implementation ticket: services/ai/ai#115 (already open).

Principles

  1. One composer, four modes — text, image, file, voice. No product ships a separate UI per mode.
  2. Drag-and-drop + paste + attach — the entire composer accepts drops; pasting from the clipboard auto-attaches.
  3. MIME validation gated by model — accept only what the active model can consume.
  4. Voice as transition — the mic button enters a dedicated voice mode (#121), not inline transcription.
  5. Storage scoped — uploads live under a per-workspace path prefix.

R1 — Anatomy

┌─────────────────────────────────────────────────┐
│ [📷 photo.jpg · 240KB · ✗]                      │  ← upload preview chips
│ [📄 spec.pdf · 1.2MB · ✗]                       │
├─────────────────────────────────────────────────┤
│ [🖼] [📎] [text input area...........] [🎙] [➤]│  ← composer row
└─────────────────────────────────────────────────┘

Slots:

| Slot | Function |
|---|---|
| Preview chips | One per attached file; remove (✗) per chip |
| Image button 🖼 | Open image picker (camera + gallery) |
| Attach button 📎 | Open file picker (per model MIME whitelist) |
| Text input | Multi-line; auto-grow up to N lines; max-chars per model |
| Mic button 🎙 | Enter voice mode (cross-link #121) |
| Send ➤ | Submit; disabled if empty + no attachments |

Drop zone: ENTIRE composer area accepts drops (visual highlight on dragover).

R2 — Upload preview chips

Per file attached:

┌────────────────────────────────────┐
│ [thumb] filename · MIME · size  ✗ │
└────────────────────────────────────┘

Thumb:

  • Image: actual thumbnail (96×96 max).
  • PDF: first-page render (future; v1 uses a generic icon).
  • Audio: waveform mini.
  • Other: MIME icon.

Validation states:

| State | Visual |
|---|---|
| Valid (model accepts) | Default chip |
| Invalid (model rejects MIME) | Red border + tooltip "Model X doesn't support {MIME}" |
| Uploading | Progress bar overlay |
| Failed | Error icon + Retry button |
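
The chip layout and the four validation states can be sketched as plain data plus two formatting helpers. This is only an illustration: the type and function names (`UploadChip`, `formatChipSize`, `chipLabel`, `chipVisual`) are hypothetical, and the use of binary units for sizes is an assumption, since the spec does not say how sizes are rounded.

```typescript
// Chip states from the table above; the string names are hypothetical.
type ChipState = "valid" | "invalid" | "uploading" | "failed";

interface UploadChip {
  fileName: string;
  mime: string;
  sizeBytes: number;
  state: ChipState;
}

// Formats sizes in the style of the anatomy diagram ("240KB", "1.2MB").
// Assumption: binary units (1 KB = 1024 bytes), whole KB, one decimal for MB.
function formatChipSize(sizeBytes: number): string {
  const kb = sizeBytes / 1024;
  if (kb < 1024) {
    return `${Math.round(kb)}KB`;
  }
  return `${(kb / 1024).toFixed(1)}MB`;
}

// Builds the "filename · MIME · size" text shown on the chip.
function chipLabel(chip: UploadChip): string {
  return `${chip.fileName} · ${chip.mime} · ${formatChipSize(chip.sizeBytes)}`;
}

// Maps each validation state to the visual listed in the table.
function chipVisual(chip: UploadChip, modelName: string): string {
  switch (chip.state) {
    case "valid":
      return "default chip";
    case "invalid":
      return `red border + tooltip "${modelName} doesn't support ${chip.mime}"`;
    case "uploading":
      return "progress bar overlay";
    case "failed":
      return "error icon + Retry button";
  }
}
```

Keeping the state as a closed union means a new state cannot be added without also deciding its visual, which is one way to keep the table and the rendering code in step.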

R3 — MIME validation per model

Current model capability (from model-selector.kmd #113):

| Model capability | MIME accepted |
|---|---|
| Vision | image/png, image/jpeg, image/webp, image/gif |
| Documents | application/pdf, text/markdown, text/plain |
| Audio | audio/mp3, audio/wav, audio/ogg, audio/m4a |
| Video | video/mp4, video/webm |
| Files (generic text) | any text/* up to N MB |

Attach button filters file picker by accepted MIMEs.

Drag-drop: invalid MIME shows red overlay + reject sound.
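
A minimal sketch of the capability-to-MIME gate, assuming the model exposes its capabilities as a list of string tags. The tag names (`"vision"`, `"documents"`, and so on) and the function names are hypothetical; the MIME lists are copied from the R3 table, and the size cap for generic text is left out because the spec only gives it as N MB.

```typescript
// Hypothetical capability tags; the MIME lists mirror the R3 table.
type ModelCapability = "vision" | "documents" | "audio" | "video" | "files";

const MIME_BY_CAPABILITY: Record<ModelCapability, readonly string[]> = {
  vision: ["image/png", "image/jpeg", "image/webp", "image/gif"],
  documents: ["application/pdf", "text/markdown", "text/plain"],
  audio: ["audio/mp3", "audio/wav", "audio/ogg", "audio/m4a"],
  video: ["video/mp4", "video/webm"],
  files: ["text/*"], // generic text; the size cap is enforced elsewhere
};

// True when `mime` equals the pattern or falls under a "type/*" wildcard.
function matchesMimePattern(pattern: string, mime: string): boolean {
  if (pattern.slice(-2) === "/*") {
    const prefix = pattern.slice(0, -1); // "text/*" -> "text/"
    return mime.indexOf(prefix) === 0;
  }
  return pattern === mime;
}

// Collects every MIME pattern the active model can consume.
// The same list could feed the file picker's accept filter.
function acceptedMimePatterns(capabilities: readonly ModelCapability[]): string[] {
  const patterns: string[] = [];
  for (const capability of capabilities) {
    for (const pattern of MIME_BY_CAPABILITY[capability]) {
      patterns.push(pattern);
    }
  }
  return patterns;
}

// Gate used before a file becomes a valid chip.
function isMimeAccepted(capabilities: readonly ModelCapability[], mime: string): boolean {
  const normalized = mime.trim().toLowerCase();
  return acceptedMimePatterns(capabilities).some((p) => matchesMimePattern(p, normalized));
}
```

For example, `isMimeAccepted(["vision"], "application/pdf")` is false, while adding `"documents"` to the capability list makes it true.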

R4 — Drag-and-drop

  • dragover entire composer: visual highlight (border accent + tint).
  • drop: validates each file MIME; valid → adds to chips; invalid → rejection toast.
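  • dragleave without drop (the drag exits the composer or is cancelled): removes highlight. Note that `dragend` fires on the drag source, not on the drop target, and is not delivered to the page for files dragged in from the OS.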

Mobile equivalent: long-press file in OS file manager → "Share to" → Koder app target.
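
The drop-zone behaviour can be modelled as a small state reducer, independent of any UI framework. This is a sketch under assumptions: `DropZoneState`, `DropZoneEvent`, and `reduceDropZone` are hypothetical names, the MIME check is passed in as a callback, and a drag that ends without a drop is represented here by a `"dragleave"` event.

```typescript
interface DroppedItem {
  fileName: string;
  mime: string;
}

interface DropZoneState {
  highlighted: boolean; // border accent + tint while a drag hovers the composer
  chips: DroppedItem[]; // files accepted so far
  toasts: string[]; // rejection messages to show
}

type DropZoneEvent =
  | { kind: "dragover" }
  | { kind: "dragleave" }
  | { kind: "drop"; files: DroppedItem[] };

// Pure reducer: returns the next state without mutating the previous one.
function reduceDropZone(
  state: DropZoneState,
  event: DropZoneEvent,
  isAccepted: (mime: string) => boolean,
): DropZoneState {
  switch (event.kind) {
    case "dragover":
      return { ...state, highlighted: true };
    case "dragleave":
      return { ...state, highlighted: false };
    case "drop": {
      // Each file is validated on its own: valid ones become chips,
      // invalid ones produce a rejection toast.
      const valid = event.files.filter((f) => isAccepted(f.mime));
      const invalid = event.files.filter((f) => !isAccepted(f.mime));
      return {
        highlighted: false,
        chips: state.chips.concat(valid),
        toasts: state.toasts.concat(
          invalid.map((f) => `${f.fileName}: ${f.mime} is not supported`),
        ),
      };
    }
  }
}
```

A mixed drop therefore never fails as a whole: accepted files are attached and only the rejected ones are reported.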

R5 — Paste hook

  • Image in clipboard: auto-attach as pasted-{timestamp}.png.
  • URL in clipboard: if model supports web-fetch, offer "Attach URL preview" chip (chip = fetched + summarized).
  • Text: standard paste in text input (default behavior).
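
The three paste cases reduce to a single classification step. The sketch below assumes the clipboard has already been read into a simple tagged value; `ClipboardContent`, `PasteAction`, and `classifyPaste` are hypothetical names, the URL check is a deliberately loose regular expression, and using epoch milliseconds for `{timestamp}` is an assumption because the spec does not fix the format.

```typescript
type ClipboardContent =
  | { kind: "image" }
  | { kind: "text"; text: string };

type PasteAction =
  | { action: "attach-image"; fileName: string }
  | { action: "offer-url-chip"; url: string }
  | { action: "insert-text"; text: string };

// Loose check: a single http(s) URL with no whitespace inside.
function looksLikeUrl(text: string): boolean {
  return /^https?:\/\/\S+$/i.test(text.trim());
}

function classifyPaste(
  content: ClipboardContent,
  modelSupportsWebFetch: boolean,
  nowMs: number,
): PasteAction {
  if (content.kind === "image") {
    // Assumption: {timestamp} is epoch milliseconds.
    return { action: "attach-image", fileName: `pasted-${nowMs}.png` };
  }
  if (modelSupportsWebFetch && looksLikeUrl(content.text)) {
    return { action: "offer-url-chip", url: content.text.trim() };
  }
  // Default behaviour: plain paste into the text input.
  return { action: "insert-text", text: content.text };
}
```

A URL pasted while the active model lacks web-fetch falls through to the plain-text branch, so the user never sees a chip the model cannot use.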

R6 — Voice button → voice mode

Tap mic button:

  1. Check voice.enabled toggle from voice/wake-word.kmd R1 — if off, prompt user to enable in settings.
  2. Request mic permission (if not granted).
  3. Transition to fullscreen voice mode per voice-mode.kmd (#121).
  4. When the user closes voice mode, return to the composer; the transcript (if transcript_to_text is enabled) populates the text input.

The composer does NOT render an inline waveform; voice mode is a separate UI.
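
The mic-tap sequence can be written as a decision function that returns the next step, leaving the actual prompts and navigation to the host surface. The names here (`MicTapContext`, `onMicTap`, `textAfterVoiceMode`) are hypothetical. The permission values follow the usual granted/prompt/denied split, treating a denied permission as "no voice option" follows test N1, and replacing the draft with the transcript is an assumption: the spec says the transcript populates the input but not whether it replaces or appends.

```typescript
type MicPermission = "granted" | "prompt" | "denied";

interface MicTapContext {
  voiceEnabled: boolean; // the voice.enabled toggle
  micPermission: MicPermission;
}

type MicTapOutcome =
  | "prompt-enable-in-settings"
  | "request-mic-permission"
  | "enter-voice-mode"
  | "voice-unavailable";

// Steps 1 to 3: toggle check, permission check, then the transition.
function onMicTap(ctx: MicTapContext): MicTapOutcome {
  if (!ctx.voiceEnabled) {
    return "prompt-enable-in-settings";
  }
  if (ctx.micPermission === "denied") {
    return "voice-unavailable"; // degrade gracefully, as in N1
  }
  if (ctx.micPermission === "prompt") {
    return "request-mic-permission";
  }
  return "enter-voice-mode";
}

// Step 4: text input content after the user closes voice mode.
// Assumption: a non-empty transcript replaces the current draft.
function textAfterVoiceMode(
  draft: string,
  transcript: string,
  transcriptToText: boolean,
): string {
  if (transcriptToText && transcript.length > 0) {
    return transcript;
  }
  return draft;
}
```

Because the function only inspects state, the same ordering can be reused on every surface in R9 and tested without a microphone.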

R7 — Storage path-prefix per workspace

Per policies/multi-tenant-by-default.kmd:

  • Object store path: koder://workspaces/<workspace_id>/users/<koder_user_id>/uploads/<conversation_id>/<file_uuid>.{ext}
  • Cross-tenant access returns 404.
  • Retention per identity-data-retention.kmd: orphaned uploads (those whose conversation was deleted) are cascade-deleted after a 24h grace period.
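
A sketch of how the path prefix, the cross-tenant check, and the grace period might fit together. All function names are hypothetical, the segment validation is an added safety assumption rather than something the spec requires, and the numeric return value only stands in for the HTTP status the storage service would send.

```typescript
interface UploadLocation {
  workspaceId: string;
  koderUserId: string;
  conversationId: string;
  fileUuid: string;
  ext: string;
}

// Rejects empty values and anything that could escape its path segment.
function pathSegment(value: string): string {
  if (value.length === 0 || /[\/\\]|\.\./.test(value)) {
    throw new Error(`invalid path segment: ${value}`);
  }
  return value;
}

// Builds the object-store path in the layout given above.
function uploadObjectPath(loc: UploadLocation): string {
  return (
    `koder://workspaces/${pathSegment(loc.workspaceId)}` +
    `/users/${pathSegment(loc.koderUserId)}` +
    `/uploads/${pathSegment(loc.conversationId)}` +
    `/${pathSegment(loc.fileUuid)}.${pathSegment(loc.ext)}`
  );
}

// Cross-tenant reads answer 404, so the caller cannot tell whether the object exists.
function accessStatus(requestWorkspaceId: string, objectPath: string): 200 | 404 {
  const prefix = `koder://workspaces/${pathSegment(requestWorkspaceId)}/`;
  return objectPath.indexOf(prefix) === 0 ? 200 : 404;
}

const ORPHAN_GRACE_MS = 24 * 60 * 60 * 1000; // 24h grace

// True once an orphaned upload has outlived its grace period.
function isOrphanPurgeDue(conversationDeletedAtMs: number, nowMs: number): boolean {
  return nowMs - conversationDeletedAtMs >= ORPHAN_GRACE_MS;
}
```

The trailing slash in the prefix matters: without it, workspace `ws-a` would match paths belonging to `ws-ab`.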

R8 — Limits

| Limit | Default |
|---|---|
| Max files per message | 10 |
| Max size per file | 25 MB |
| Max total size per message | 100 MB |
| Max text length | 32 K chars (limited by model context further) |

Limits configurable per workspace/product.
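
The defaults and the per-workspace override can be sketched as a limits object plus a check that runs before each attach. The names (`ComposerLimits`, `resolveLimits`, `checkAttach`) are hypothetical. Two readings are assumptions: MB is treated as 1024 × 1024 bytes and "32 K chars" as 32,000 characters, since the spec does not say which convention applies.

```typescript
const MIB = 1024 * 1024; // assumption: binary megabytes

interface ComposerLimits {
  maxFiles: number;
  maxFileBytes: number;
  maxTotalBytes: number;
  maxTextChars: number;
}

// Defaults from the R8 table.
const DEFAULT_LIMITS: ComposerLimits = {
  maxFiles: 10,
  maxFileBytes: 25 * MIB,
  maxTotalBytes: 100 * MIB,
  maxTextChars: 32000, // assumption: "32 K" read as 32,000
};

// Workspace or product overrides replace only the fields they set.
function resolveLimits(overrides: Partial<ComposerLimits>): ComposerLimits {
  return { ...DEFAULT_LIMITS, ...overrides };
}

type LimitViolation = "too-many-files" | "file-too-large" | "total-too-large";

// Returns the first violated limit, or null when the new file may be attached.
function checkAttach(
  attachedSizes: readonly number[],
  newFileBytes: number,
  limits: ComposerLimits = DEFAULT_LIMITS,
): LimitViolation | null {
  if (attachedSizes.length + 1 > limits.maxFiles) {
    return "too-many-files";
  }
  if (newFileBytes > limits.maxFileBytes) {
    return "file-too-large";
  }
  const total = attachedSizes.reduce((sum, size) => sum + size, 0) + newFileBytes;
  if (total > limits.maxTotalBytes) {
    return "total-too-large";
  }
  return null;
}
```

With the defaults, four 24 MB attachments leave room for at most 4 MB more, so a fifth file of 5 MB is rejected as `"total-too-large"` even though it is under the per-file cap.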

R9 — Surface bindings

| Surface | API |
|---|---|
| Flutter | KoderComposer({onSubmit, model, attachments, onAttach, onMic}) in koder_kit/lib/src/ai/composer.dart |
| Web | <koder-composer model-id="..."> |
| Compose Android | KoderComposer (future) |
| SwiftUI iOS | KoderComposer (future) |
| CLI / TUI | Multi-line stdin + koder attach <path> slash command |

R10 — Accessibility

  • Composer: role="form" aria-label="Message composer".
  • Buttons: aria-label per button.
  • Drop zone: announce drop state via aria-live.
  • Chips: <li> semantically; remove button per chip.
  • Keyboard: Tab cycle; Cmd/Ctrl+Enter to submit.
  • Reduced-motion: no slide animations.

R11 — i18n

| Key | en-US | pt-BR |
|---|---|---|
| ai.composer.placeholder | "Type a message..." | "Digite uma mensagem..." |
| ai.composer.attach.image | "Attach image" | "Anexar imagem" |
| ai.composer.attach.file | "Attach file" | "Anexar arquivo" |
| ai.composer.mic | "Voice input" | "Entrada de voz" |
| ai.composer.send | "Send" | "Enviar" |
| ai.composer.mime_rejected | "{model} doesn't support {mime}" | "{model} não suporta {mime}" |
| ai.composer.limit.files | "Max {n} files per message" | "Máx. {n} arquivos por mensagem" |
| ai.composer.limit.size | "File exceeds {size}" | "Arquivo excede {size}" |
| ai.composer.drop.hint | "Drop files here" | "Solte os arquivos aqui" |
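
Several of these strings carry `{placeholder}` parameters. A minimal lookup-and-interpolate helper might look like the sketch below; `COMPOSER_MESSAGES` holds only a subset of the keys for brevity, and `translate` is a hypothetical name rather than the API of any particular i18n library.

```typescript
type ComposerLocale = "en-US" | "pt-BR";

// Subset of the R11 table, keyed by locale then message key.
const COMPOSER_MESSAGES: Record<ComposerLocale, Record<string, string>> = {
  "en-US": {
    "ai.composer.send": "Send",
    "ai.composer.mime_rejected": "{model} doesn't support {mime}",
    "ai.composer.limit.files": "Max {n} files per message",
  },
  "pt-BR": {
    "ai.composer.send": "Enviar",
    "ai.composer.mime_rejected": "{model} não suporta {mime}",
    "ai.composer.limit.files": "Máx. {n} arquivos por mensagem",
  },
};

// Looks up a message and fills {placeholders} from `params`.
// Unknown keys fall back to the key itself; unknown placeholders are left as-is.
function translate(
  locale: ComposerLocale,
  key: string,
  params: Record<string, string | number> = {},
): string {
  const template = COMPOSER_MESSAGES[locale][key];
  if (template === undefined) {
    return key;
  }
  return template.replace(/\{(\w+)\}/g, (match, paramName: string) =>
    Object.prototype.hasOwnProperty.call(params, paramName)
      ? String(params[paramName])
      : match,
  );
}
```

For instance, `translate("pt-BR", "ai.composer.limit.files", { n: 10 })` yields "Máx. 10 arquivos por mensagem".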

R12 — Per-preset variation

Cosmetic. Composer is always functional.

T-suite

  • T1 Mount: composer renders all 4 input modes.
  • T2 Text submit: type + Cmd/Ctrl+Enter (or Send) → onSubmit callback with text only.
  • T3 Attach image: file picker → image added to chips → submit with attachment.
  • T4 Drag-and-drop: simulate dragover + drop file → chip added.
  • T5 Paste image: simulate clipboard image paste → chip auto-attached.
  • T6 MIME rejection: drop non-supported MIME → red chip + rejection toast.
  • T7 Mic transitions: tap mic → voice-mode.kmd UI invoked.
  • T8 Limits enforced: try to attach 11th file → blocked; try to attach > 25MB file → blocked.
  • T9 Multi-tenant: upload from workspace A → path-prefix correct; workspace B can't access.
  • T10 Keyboard: Cmd/Ctrl+Enter submits; arrow keys navigate between chips.
  • T11 A11y: drop announcement via aria-live; buttons aria-labeled.
  • N1 Mic without permission: graceful prompt + degrade (no voice option).

References