AI multimodal input composer
ai-ui specs/ai-ui/multimodal-input.kmd
Unified composer for text + image + file + voice input. Drag-and-drop + paste hook + attach button + mic button. Upload preview chips with MIME validation gated by model capabilities. Voice mode transition to voice-mode.kmd (#121). Required for any AI chat surface.
When this spec applies
Primary triggers
- Render input area for AI chat
All triggers
- Build composer for AI chat surface
- Implement multi-input (image + file + voice + text) UX
- Audit input UX for any AI product
Specification body
Spec — AI multimodal input composer
Voice integration via voice-mode.kmd (#121) and voice/wake-word.kmd. Capability gating via model-selector.kmd (#113). Voice-side implementation ticket: services/ai/ai#115 (already open).
Principles
- One composer, four modes — text, image, file, voice. No separate UI per mode.
- Drag-and-drop + paste + attach — the entire composer accepts drops; pasting from the clipboard auto-attaches.
- MIME validation gated by model — accept only what the active model can consume.
- Voice as transition — the mic button enters a dedicated voice mode (#121), not inline transcription.
- Storage scoped — uploads live under a per-workspace path prefix.
R1 — Anatomy
┌─────────────────────────────────────────────────┐
│ [📷 photo.jpg · 240KB · ✗] │ ← upload preview chips
│ [📄 spec.pdf · 1.2MB · ✗] │
├─────────────────────────────────────────────────┤
│ [🖼] [📎] [text input area...........] [🎙] [➤]│ ← composer row
└─────────────────────────────────────────────────┘
Slots:
| Slot | Function |
|---|---|
| Preview chips | One per attached file; remove (✗) per chip |
| Image button 🖼 | Open image picker (camera + gallery) |
| Attach button 📎 | Open file picker (per model MIME whitelist) |
| Text input | Multi-line; auto-grow up to N lines; max-chars per model |
| Mic button 🎙 | Enter voice mode (cross-link #121) |
| Send ➤ | Submit; disabled if empty + no attachments |
Drop zone: ENTIRE composer area accepts drops (visual highlight on dragover).
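The Send rule in the slot table ("disabled if empty + no attachments") can be sketched as a pure predicate. This is a minimal illustration: the `ComposerState` shape and the `canSend` name are hypothetical, and treating whitespace-only text as empty is an assumption the spec does not state.

```typescript
// Hypothetical composer state; field names are illustrative, not from the spec.
interface ComposerState {
  text: string;
  attachmentCount: number;
}

// Send is enabled when there is text or at least one attachment.
// Assumption: whitespace-only text counts as empty.
function canSend(state: ComposerState): boolean {
  return state.text.trim().length > 0 || state.attachmentCount > 0;
}
```

A surface would bind the Send button's disabled flag to `!canSend(state)`.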
R2 — Upload preview chips
Per file attached:
┌────────────────────────────────────┐
│ [thumb] filename · MIME · size ✗ │
└────────────────────────────────────┘
Thumb:
- Image: actual thumbnail (96×96 max).
- PDF: first-page render (future; v1 = generic icon).
- Audio: waveform mini.
- Other: MIME icon.
Validation states:
| State | Visual |
|---|---|
| Valid (model accepts) | Default chip |
| Invalid (model rejects MIME) | Red border + tooltip "Model X doesn't support {MIME}" |
| Uploading | Progress bar overlay |
| Failed | Error icon + Retry button |
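One way to model the four validation states is a discriminated union mapped to a visual description. The sketch below is illustrative only: `ChipState`, `ChipVisual`, and `chipVisual` are hypothetical names, and the overlay strings are placeholders rather than real design tokens.

```typescript
// Hypothetical model of the chip states in the table above.
type ChipState =
  | { kind: "valid" }
  | { kind: "invalid"; model: string; mime: string }
  | { kind: "uploading"; progress: number } // progress in the range 0..1
  | { kind: "failed" };

// Placeholder visual description; a real surface would map this to widgets or CSS.
interface ChipVisual {
  border: "default" | "red";
  overlay?: string;
  tooltip?: string;
  action?: "retry";
}

function chipVisual(state: ChipState): ChipVisual {
  switch (state.kind) {
    case "valid":
      return { border: "default" };
    case "invalid":
      // Tooltip wording follows the table: "Model X doesn't support {MIME}".
      return { border: "red", tooltip: `${state.model} doesn't support ${state.mime}` };
    case "uploading":
      return { border: "default", overlay: `progress:${Math.round(state.progress * 100)}%` };
    case "failed":
      return { border: "default", overlay: "error-icon", action: "retry" };
  }
}
```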
R3 — MIME validation per model
Current model capability (from model-selector.kmd #113):
| Model capability | MIME accepted |
|---|---|
| Vision | image/png, image/jpeg, image/webp, image/gif |
| Documents | application/pdf, text/markdown, text/plain |
| Audio | audio/mp3, audio/wav, audio/ogg, audio/m4a |
| Video | video/mp4, video/webm |
| Files (generic text) | any text/* up to N MB |
Attach button filters file picker by accepted MIMEs.
Drag-drop: invalid MIME shows red overlay + reject sound.
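The capability table can be expressed as a lookup plus a matcher. This is a sketch under assumptions: the `Capability` identifiers, `MIME_BY_CAPABILITY`, and `isMimeAccepted` are hypothetical names, the MIME strings are copied from the table as written, and the "up to N MB" size cap on generic text files is left out because N is not specified.

```typescript
// Hypothetical capability identifiers mirroring the table rows.
type Capability = "vision" | "documents" | "audio" | "video" | "text-files";

// MIME lists copied from the table; "text/*" is a wildcard pattern.
const MIME_BY_CAPABILITY: Record<Capability, readonly string[]> = {
  vision: ["image/png", "image/jpeg", "image/webp", "image/gif"],
  documents: ["application/pdf", "text/markdown", "text/plain"],
  audio: ["audio/mp3", "audio/wav", "audio/ogg", "audio/m4a"],
  video: ["video/mp4", "video/webm"],
  "text-files": ["text/*"],
};

// True when any capability of the active model accepts the MIME type.
function isMimeAccepted(mime: string, capabilities: readonly Capability[]): boolean {
  const m = mime.toLowerCase();
  return capabilities.some((cap) =>
    MIME_BY_CAPABILITY[cap].some((pattern) =>
      pattern.endsWith("/*") ? m.startsWith(pattern.slice(0, -1)) : pattern === m
    )
  );
}

// Value usable for a file picker filter, e.g. an HTML input's accept attribute.
function acceptList(capabilities: readonly Capability[]): string {
  return capabilities.map((cap) => MIME_BY_CAPABILITY[cap].join(",")).join(",");
}
```

For example, a vision-only model would accept `image/png` and reject `video/mp4`, and `acceptList(["video"])` yields `video/mp4,video/webm`.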
R4 — Drag-and-drop
- dragover entire composer: visual highlight (border accent + tint).
- drop: validates each file MIME; valid → adds to chips; invalid → rejection toast.
- dragleave without drop (the drag exits the composer or is cancelled): removes highlight.
Mobile equivalent: long-press file in OS file manager → "Share to" → Koder app target.
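The desktop drag-and-drop rules above can be sketched as a small state reducer: highlight while a drag is over the composer, validate each file on drop, and clear the highlight when the drag ends without a drop. All names here (`DropZoneState`, `reduceDrag`, the event shapes) are hypothetical, the toast wording is a placeholder, and MIME acceptance is passed in as a predicate so the sketch does not assume any particular capability source.

```typescript
interface DroppedFile {
  name: string;
  mime: string;
}

interface DropZoneState {
  highlighted: boolean;
  chips: DroppedFile[];
  toasts: string[];
}

// Simplified events; a real surface would derive these from platform drag events.
type DragEventKind =
  | { type: "over" }
  | { type: "leave" }
  | { type: "drop"; files: DroppedFile[] };

function reduceDrag(
  state: DropZoneState,
  event: DragEventKind,
  accepts: (mime: string) => boolean
): DropZoneState {
  switch (event.type) {
    case "over":
      return { ...state, highlighted: true };
    case "leave":
      return { ...state, highlighted: false };
    case "drop": {
      // Each file is validated independently: valid files become chips,
      // invalid files produce a rejection toast.
      const valid = event.files.filter((f) => accepts(f.mime));
      const invalid = event.files.filter((f) => !accepts(f.mime));
      return {
        highlighted: false,
        chips: [...state.chips, ...valid],
        toasts: [...state.toasts, ...invalid.map((f) => `Rejected ${f.name} (${f.mime})`)],
      };
    }
  }
}
```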
R5 — Paste hook
- Image in clipboard: auto-attach as pasted-{timestamp}.png.
- URL in clipboard: if model supports web-fetch, offer "Attach URL preview" chip (chip = fetched + summarized).
- Text: standard paste in text input (default behavior).
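The three paste cases can be sketched as a classifier that returns the action the composer should take. The names (`ClipboardPayload`, `PasteAction`, `classifyPaste`) are hypothetical. Two details are assumptions because the spec leaves them open: `{timestamp}` is rendered as epoch milliseconds, and "URL in clipboard" means the whole clipboard text is a single http(s) URL.

```typescript
type ClipboardPayload = { kind: "image" } | { kind: "text"; text: string };

type PasteAction =
  | { action: "attach-image"; filename: string }
  | { action: "offer-url-chip"; url: string }
  | { action: "insert-text"; text: string };

// Assumption: only a clipboard consisting of one http(s) URL counts as a URL paste.
function isHttpUrl(s: string): boolean {
  return /^https?:\/\/\S+$/i.test(s);
}

function classifyPaste(
  payload: ClipboardPayload,
  opts: { supportsWebFetch: boolean; now: Date }
): PasteAction {
  if (payload.kind === "image") {
    // Assumption: {timestamp} is epoch milliseconds.
    return { action: "attach-image", filename: `pasted-${opts.now.getTime()}.png` };
  }
  const trimmed = payload.text.trim();
  if (opts.supportsWebFetch && isHttpUrl(trimmed)) {
    return { action: "offer-url-chip", url: trimmed };
  }
  // Default behavior: plain paste into the text input.
  return { action: "insert-text", text: payload.text };
}
```

When the model lacks web-fetch, a pasted URL falls through to the plain-text case.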
R6 — Voice button → voice mode
Tap mic button:
- Check the voice.enabled toggle from voice/wake-word.kmd R1; if off, prompt the user to enable it in settings.
- Request mic permission (if not granted).
- Transition to fullscreen voice mode per voice-mode.kmd (#121).
- When the user closes voice mode, return to the composer; the transcription (if transcript_to_text is enabled) populates the text input.
Do NOT render a waveform inline in the composer; voice mode is a separate UI.
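The mic-tap sequence can be sketched as two pure decisions: which step comes next, and what the text input holds after voice mode closes. The names (`MicPermission`, `MicStep`, `nextMicStep`, `textAfterVoiceMode`) are hypothetical. The denied-permission branch follows N1 in the T-suite (degrade, no voice option). Appending the transcript to existing draft text, rather than replacing it, is an assumption.

```typescript
type MicPermission = "granted" | "prompt" | "denied";

type MicStep =
  | "prompt-enable-in-settings" // voice.enabled is off
  | "request-permission"        // permission not yet granted
  | "degrade-no-voice"          // permission denied (see N1)
  | "enter-voice-mode";         // transition to fullscreen voice mode

function nextMicStep(voiceEnabled: boolean, permission: MicPermission): MicStep {
  if (!voiceEnabled) return "prompt-enable-in-settings";
  if (permission === "prompt") return "request-permission";
  if (permission === "denied") return "degrade-no-voice";
  return "enter-voice-mode";
}

// Text input contents after the user closes voice mode.
// Assumption: the transcript is appended to any existing draft.
function textAfterVoiceMode(
  currentText: string,
  transcript: string | undefined,
  transcriptToText: boolean
): string {
  if (!transcriptToText || !transcript) return currentText;
  return currentText.length > 0 ? `${currentText} ${transcript}` : transcript;
}
```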
R7 — Storage path-prefix per workspace
Per policies/multi-tenant-by-default.kmd:
- Object store path: koder://workspaces/<workspace_id>/users/<koder_user_id>/uploads/<conversation_id>/<file_uuid>.{ext}
- Cross-tenant access returns 404.
- Retention per identity-data-retention.kmd: orphaned uploads (conversation deleted) cascade after 24h grace.
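The path layout and the cross-tenant rule can be sketched as a path builder plus a prefix check. `UploadRef`, `uploadPath`, and `accessStatus` are hypothetical names. Restricting path segments to `[A-Za-z0-9_-]` is an assumption added here to keep one tenant's identifier from escaping its prefix; the spec does not define the ID alphabet.

```typescript
interface UploadRef {
  workspaceId: string;
  userId: string;
  conversationId: string;
  fileUuid: string;
  ext: string;
}

// Assumption: IDs and extensions use only letters, digits, "_" and "-".
function assertSegment(value: string, label: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`invalid ${label}: ${value}`);
  }
  return value;
}

function uploadPath(ref: UploadRef): string {
  const ws = assertSegment(ref.workspaceId, "workspaceId");
  const user = assertSegment(ref.userId, "userId");
  const conv = assertSegment(ref.conversationId, "conversationId");
  const file = assertSegment(ref.fileUuid, "fileUuid");
  const ext = assertSegment(ref.ext, "ext");
  return `koder://workspaces/${ws}/users/${user}/uploads/${conv}/${file}.${ext}`;
}

// Cross-tenant reads answer 404 (not 403), so existence is not revealed.
function accessStatus(path: string, requesterWorkspaceId: string): 200 | 404 {
  return path.startsWith(`koder://workspaces/${requesterWorkspaceId}/`) ? 200 : 404;
}
```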
R8 — Limits
| Limit | Default |
|---|---|
| Max files per message | 10 |
| Max size per file | 25 MB |
| Max total size per message | 100 MB |
| Max text length | 32 K chars (limited by model context further) |
Limits configurable per workspace/product.
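A sketch of how the default limits and per-workspace overrides might be checked before submit. `Limits`, `DEFAULT_LIMITS`, and `checkLimits` are hypothetical names. Two unit choices are assumptions: 1 MB is taken as 1024 × 1024 bytes, and "32 K chars" as 32,000 characters.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

interface Limits {
  maxFiles: number;
  maxFileBytes: number;
  maxTotalBytes: number;
  maxTextChars: number;
}

// Defaults from the table above.
const DEFAULT_LIMITS: Limits = {
  maxFiles: 10,
  maxFileBytes: 25 * MB,
  maxTotalBytes: 100 * MB,
  maxTextChars: 32000, // assumption: "32 K" read as 32,000
};

type Violation = "too-many-files" | "file-too-large" | "total-too-large" | "text-too-long";

// Returns every violated limit; an empty array means the message may be sent.
function checkLimits(
  fileSizes: readonly number[],
  textLength: number,
  limits: Limits = DEFAULT_LIMITS
): Violation[] {
  const violations: Violation[] = [];
  if (fileSizes.length > limits.maxFiles) violations.push("too-many-files");
  if (fileSizes.some((size) => size > limits.maxFileBytes)) violations.push("file-too-large");
  const total = fileSizes.reduce((sum, size) => sum + size, 0);
  if (total > limits.maxTotalBytes) violations.push("total-too-large");
  if (textLength > limits.maxTextChars) violations.push("text-too-long");
  return violations;
}
```

A workspace override could be applied as `checkLimits(sizes, len, { ...DEFAULT_LIMITS, maxFiles: 5 })`.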
R9 — Surface bindings
| Surface | API |
|---|---|
| Flutter | KoderComposer({onSubmit, model, attachments, onAttach, onMic}) in koder_kit/lib/src/ai/composer.dart |
| Web | <koder-composer model-id="..."> |
| Compose Android | KoderComposer (future) |
| SwiftUI iOS | same (future) |
| CLI / TUI | Multi-line stdin + koder attach <path> slash command |
R10 — Accessibility
- Composer: role="form" aria-label="Message composer".
- Buttons: aria-label per button.
- Drop zone: announce drop state via aria-live.
- Chips: <li> semantically; remove button per chip.
- Keyboard: Tab cycle; Cmd/Ctrl+Enter to submit.
- Reduced-motion: no slide animations.
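The keyboard rules can be sketched as two pure helpers: one that recognizes the submit shortcut, and one that moves focus between chips with arrow keys (per T10 in the T-suite). `KeyLike`, `isSubmitShortcut`, and `nextChipIndex` are hypothetical names. `KeyLike` is a structural subset of a browser keyboard event, so it can be fed a real event on the web surface. Using Left/Right arrows and clamping at the ends instead of wrapping are assumptions.

```typescript
// Structural subset of a keyboard event; no DOM types required.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
}

// Cmd+Enter (macOS) or Ctrl+Enter (elsewhere) submits.
function isSubmitShortcut(e: KeyLike): boolean {
  return e.key === "Enter" && (e.metaKey || e.ctrlKey);
}

// Index of the chip to focus after an arrow key press.
// Assumption: Left/Right arrows, clamped at the first and last chip.
function nextChipIndex(current: number, key: string, count: number): number {
  if (count === 0) return -1;
  if (key === "ArrowRight") return Math.min(current + 1, count - 1);
  if (key === "ArrowLeft") return Math.max(current - 1, 0);
  return current;
}
```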
R11 — i18n
| Key | en-US | pt-BR |
|---|---|---|
| ai.composer.placeholder | "Type a message..." | "Digite uma mensagem..." |
| ai.composer.attach.image | "Attach image" | "Anexar imagem" |
| ai.composer.attach.file | "Attach file" | "Anexar arquivo" |
| ai.composer.mic | "Voice input" | "Entrada de voz" |
| ai.composer.send | "Send" | "Enviar" |
| ai.composer.mime_rejected | "{model} doesn't support {mime}" | "{model} não suporta {mime}" |
| ai.composer.limit.files | "Max {n} files per message" | "Máx. {n} arquivos por mensagem" |
| ai.composer.limit.size | "File exceeds {size}" | "Arquivo excede {size}" |
| ai.composer.drop.hint | "Drop files here" | "Solte os arquivos aqui" |
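The `{placeholder}` syntax in these strings can be resolved with a small lookup-and-interpolate helper. This sketch includes only three of the keys. `MESSAGES` and `t` are hypothetical names, and the fallback order (requested locale, then en-US, then the raw key) is an assumption.

```typescript
type Locale = "en-US" | "pt-BR";

// Subset of the table above, for illustration.
const MESSAGES: Record<Locale, Partial<Record<string, string>>> = {
  "en-US": {
    "ai.composer.send": "Send",
    "ai.composer.mime_rejected": "{model} doesn't support {mime}",
    "ai.composer.limit.files": "Max {n} files per message",
  },
  "pt-BR": {
    "ai.composer.send": "Enviar",
    "ai.composer.mime_rejected": "{model} não suporta {mime}",
    "ai.composer.limit.files": "Máx. {n} arquivos por mensagem",
  },
};

// Looks up a key and substitutes {name} placeholders from params.
// Unknown placeholders are left untouched.
function t(locale: Locale, key: string, params: Record<string, string | number> = {}): string {
  const template = MESSAGES[locale][key] ?? MESSAGES["en-US"][key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    name in params ? String(params[name]) : match
  );
}
```

For example, `t("pt-BR", "ai.composer.limit.files", { n: 10 })` resolves the `{n}` placeholder in the Portuguese string.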
R12 — Per-preset variation
Presets change cosmetics only; the composer is functional under every preset.
T-suite
- T1 Mount: composer renders all 4 input modes.
- T2 Text submit: type + Enter (or Send) → onSubmit callback with text only.
- T3 Attach image: file picker → image added to chips → submit with attachment.
- T4 Drag-and-drop: simulate dragover + drop file → chip added.
- T5 Paste image: simulate clipboard image paste → chip auto-attached.
- T6 MIME rejection: drop non-supported MIME → red chip + rejection toast.
- T7 Mic transitions: tap mic → voice-mode.kmd UI invoked.
- T8 Limits enforced: try to attach 11th file → blocked; try to attach > 25MB file → blocked.
- T9 Multi-tenant: upload from workspace A → path-prefix correct; workspace B can't access.
- T10 Keyboard: Cmd/Ctrl+Enter submits; arrow keys navigate between chips.
- T11 A11y: drop announcement via aria-live; buttons aria-labeled.
- N1 Mic without permission: graceful prompt + degrade (no voice option).
Cross-link
- Companion: voice-mode.kmd (#121), model-selector.kmd (#113 — MIME gating), chat-message-bubble.kmd (submit destination)
- Voice base: voice/wake-word.kmd
- Storage: policies/multi-tenant-by-default.kmd
- Impl voice ticket: services/ai/ai/backlog/pending/115-cli-desktop-voice-input.md
- Refs: shadcn AI Input Group, LangChain agent-chat-ui multimodal
References
- meta/docs/stack/specs/ai-ui/voice-mode.kmd
- meta/docs/stack/specs/ai-ui/model-selector.kmd
- meta/docs/stack/specs/ai-ui/chat-message-bubble.kmd
- meta/docs/stack/specs/voice/wake-word.kmd
- meta/docs/stack/policies/multi-tenant-by-default.kmd