AI multimodal input composer

ai-ui specs/ai-ui/multimodal-input.kmd

Unified composer for text + image + file + voice input. Drag-and-drop + paste hook + attach button + mic button. Upload preview chips with MIME validation gated by model capabilities. Voice mode transition to voice-mode.kmd (#121). Required for any AI chat surface.

When this spec applies

Primary triggers

Render input area for AI chat

All triggers

Build composer for AI chat surface
Implement multi-input (image + file + voice + text) UX
Audit input UX for any AI product

Spec — AI multimodal input composer

Voice integration via voice-mode.kmd (#121) and voice/wake-word.kmd. Capability gating via model-selector.kmd (#113). Voice-side implementation ticket: services/ai/ai#115 (already open).

Principles

One composer, four modes: text, image, file, voice. No product ships a separate UI per mode.
Drag-and-drop + paste + attach: the entire composer accepts drops; pasting from the clipboard auto-attaches.
MIME validation gated by model: accept only what the active model can consume.
Voice as transition: the mic button enters the dedicated voice mode (#121), not inline transcription.
Storage scoped: uploads live under a per-workspace path prefix.

R1 — Anatomy

┌─────────────────────────────────────────────────┐
│ [📷 photo.jpg · 240KB · ✗]                      │  ← upload preview chips
│ [📄 spec.pdf · 1.2MB · ✗]                       │
├─────────────────────────────────────────────────┤
│ [🖼] [📎] [text input area...........] [🎙] [➤]│  ← composer row
└─────────────────────────────────────────────────┘

Slots:

Slot	Function
Preview chips	One per attached file; remove (✗) per chip
Image button 🖼	Open image picker (camera + gallery)
Attach button 📎	Open file picker (per model MIME whitelist)
Text input	Multi-line; auto-grow up to N lines; max-chars per model
Mic button 🎙	Enter voice mode (cross-link #121)
Send ➤	Submit; disabled when the text is empty and there are no attachments

Drop zone: ENTIRE composer area accepts drops (visual highlight on dragover).

R2 — Upload preview chips

Per file attached:

┌────────────────────────────────────┐
│ [thumb] filename · MIME · size  ✗ │
└────────────────────────────────────┘

Thumb:

Image: actual thumbnail (96×96 max).
PDF: first-page render (future; v1 = generic icon).
Audio: mini waveform.
Other: MIME icon.

Validation states:

State	Visual
Valid (model accepts)	Default chip
Invalid (model rejects MIME)	Red border + tooltip "Model X doesn't support {MIME}"
Uploading	Progress bar overlay
Failed	Error icon + Retry button
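
The four validation states map naturally onto a tagged union. The sketch below is one possible shape, not the `koder_kit` implementation: `ChipState`, `ChipVisual`, and `chipVisual` are hypothetical names, and the 0..1 progress range is an assumption.

```typescript
// Hypothetical model of the chip states listed in the table above.
type ChipState =
  | { kind: "valid" }
  | { kind: "invalid"; model: string; mime: string }
  | { kind: "uploading"; progress: number } // assumed range: 0..1
  | { kind: "failed" };

interface ChipVisual {
  border: "default" | "red";
  tooltip?: string;
  progressPercent?: number;
  showRetry: boolean;
}

// Translate a state into the visual treatment the table prescribes.
function chipVisual(state: ChipState): ChipVisual {
  switch (state.kind) {
    case "valid":
      return { border: "default", showRetry: false };
    case "invalid":
      return {
        border: "red",
        tooltip: `${state.model} doesn't support ${state.mime}`,
        showRetry: false,
      };
    case "uploading": {
      // Clamp so a misbehaving uploader cannot render more than 100%.
      const clamped = Math.min(1, Math.max(0, state.progress));
      return {
        border: "default",
        progressPercent: Math.round(clamped * 100),
        showRetry: false,
      };
    }
    case "failed":
      return { border: "default", showRetry: true };
  }
}
```

Keeping the state as a union means a chip cannot be both uploading and failed at once, and the compiler flags any state that lacks a visual.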

R3 — MIME validation per model

Current model capability (from model-selector.kmd #113):

Model capability	MIME accepted
Vision	image/png, image/jpeg, image/webp, image/gif
Documents	application/pdf, text/markdown, text/plain
Audio	audio/mpeg (MP3), audio/wav, audio/ogg, audio/mp4 (M4A)
Video	video/mp4, video/webm
Files (generic text)	any text/* up to N MB

Attach button filters file picker by accepted MIMEs.

Drag-drop: invalid MIME shows red overlay + reject sound.
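
A minimal sketch of the gating check, assuming the active model exposes a list of capability names. `Capability`, `MIME_BY_CAPABILITY`, `acceptedPatterns`, and `isMimeAccepted` are hypothetical names; the real contract lives in model-selector.kmd (#113). The table is abridged here to keep the example short.

```typescript
// Hypothetical capability names; the spec only lists them as table rows.
type Capability = "vision" | "documents" | "video" | "text";

// Abridged copy of the capability table; other rows follow the same shape.
const MIME_BY_CAPABILITY: Record<Capability, readonly string[]> = {
  vision: ["image/png", "image/jpeg", "image/webp", "image/gif"],
  documents: ["application/pdf", "text/markdown", "text/plain"],
  video: ["video/mp4", "video/webm"],
  text: ["text/*"],
};

// Union of MIME patterns for the active model, de-duplicated, in table order.
function acceptedPatterns(capabilities: readonly Capability[]): string[] {
  const seen = new Set<string>();
  for (const cap of capabilities) {
    for (const pattern of MIME_BY_CAPABILITY[cap]) seen.add(pattern);
  }
  return Array.from(seen);
}

// Exact match, or prefix match for wildcard patterns such as "text/*".
function isMimeAccepted(mime: string, patterns: readonly string[]): boolean {
  const normalized = mime.trim().toLowerCase();
  return patterns.some((pattern) =>
    pattern.endsWith("/*")
      ? normalized.startsWith(pattern.slice(0, -1))
      : normalized === pattern
  );
}
```

For a model with vision and generic text support, `isMimeAccepted("text/csv", acceptedPatterns(["vision", "text"]))` is true, while `application/pdf` is rejected because the Documents capability is absent.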

R4 — Drag-and-drop

dragover entire composer: visual highlight (border accent + tint).
drop: validates each file MIME; valid → adds to chips; invalid → rejection toast.
dragleave (drag exits the composer without a drop): removes highlight.
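
The drop step can be sketched as a pure function, independent of any UI toolkit. `DroppedFile`, `DragPhase`, `isHighlighted`, and `partitionDrop` are hypothetical names, and the sketch assumes the drop target clears its highlight on `dragleave`, the event a target receives when the drag exits it.

```typescript
// Structural stand-in for a browser File; only the fields used here.
interface DroppedFile {
  readonly name: string;
  readonly type: string; // MIME type as reported by the platform
  readonly size: number; // bytes
}

type DragPhase = "dragover" | "dragleave" | "drop";

// The highlight is shown only while a drag hovers over the composer.
function isHighlighted(phase: DragPhase): boolean {
  return phase === "dragover";
}

// Split dropped files into new chips and files that trigger a rejection toast.
function partitionDrop(
  files: readonly DroppedFile[],
  accepts: (mime: string) => boolean
): { chips: DroppedFile[]; rejected: DroppedFile[] } {
  const chips: DroppedFile[] = [];
  const rejected: DroppedFile[] = [];
  for (const file of files) {
    (accepts(file.type) ? chips : rejected).push(file);
  }
  return { chips, rejected };
}
```

Validating per file rather than per drop lets a mixed drop succeed partially: valid files become chips while each invalid one is reported on its own.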

Mobile equivalent: long-press file in OS file manager → "Share to" → Koder app target.

R5 — Paste hook

Image in clipboard: auto-attach as pasted-{timestamp}.png.
URL in clipboard: if model supports web-fetch, offer "Attach URL preview" chip (chip = fetched + summarized).
Text: standard paste in text input (default behavior).
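
The three paste cases can be sketched as a single decision function. All names here (`ClipboardPayload`, `PasteAction`, `resolvePaste`) are hypothetical. The spec does not define the `{timestamp}` format, so epoch milliseconds are an assumption, as is the simple `http(s)` pattern used to recognize a URL.

```typescript
type ClipboardPayload =
  | { kind: "image"; mime: string }
  | { kind: "text"; text: string };

type PasteAction =
  | { action: "attach-image"; filename: string }
  | { action: "offer-url-chip"; url: string }
  | { action: "insert-text"; text: string };

// Assumption: a clipboard "URL" is a single http(s) token with no whitespace.
const URL_PATTERN = /^https?:\/\/\S+$/i;

function resolvePaste(
  payload: ClipboardPayload,
  options: { modelSupportsWebFetch: boolean; now: Date }
): PasteAction {
  if (payload.kind === "image") {
    // Assumption: {timestamp} is rendered as epoch milliseconds.
    return { action: "attach-image", filename: `pasted-${options.now.getTime()}.png` };
  }
  const trimmed = payload.text.trim();
  if (options.modelSupportsWebFetch && URL_PATTERN.test(trimmed)) {
    return { action: "offer-url-chip", url: trimmed };
  }
  // Default behavior: plain paste into the text input.
  return { action: "insert-text", text: payload.text };
}
```

A URL pasted while the model lacks web-fetch falls through to plain text, so the user never sees a chip offer that cannot be fulfilled.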

R6 — Voice button → voice mode

Tap mic button:

Check voice.enabled toggle from voice/wake-word.kmd R1 — if off, prompt user to enable in settings.
Request mic permission (if not granted).
Transition to fullscreen voice mode per voice-mode.kmd (#121).
User closes voice mode → back to composer; the transcript (if transcript_to_text is enabled) populates the text input.

Do NOT render a waveform inline in the composer; voice mode is a separate UI.
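
The mic-tap sequence can be sketched as an async function with its platform dependencies injected. The `VoiceDeps` interface and every member on it are hypothetical stand-ins for whatever voice/wake-word.kmd and voice-mode.kmd actually expose; only the ordering of the steps comes from this spec.

```typescript
// Hypothetical seams to the platform; none of these names come from the spec.
interface VoiceDeps {
  voiceEnabled: () => boolean;                 // voice.enabled toggle
  promptEnableInSettings: () => void;
  ensureMicPermission: () => Promise<boolean>; // true once permission is granted
  runVoiceMode: () => Promise<string | null>;  // resolves when the user closes voice mode
  transcriptToText: boolean;                   // transcript_to_text setting
  setComposerText: (text: string) => void;
}

type MicOutcome = "voice-disabled" | "permission-denied" | "completed";

async function onMicTap(deps: VoiceDeps): Promise<MicOutcome> {
  // Step 1: respect the voice.enabled toggle.
  if (!deps.voiceEnabled()) {
    deps.promptEnableInSettings();
    return "voice-disabled";
  }
  // Step 2: request mic permission if it has not been granted.
  if (!(await deps.ensureMicPermission())) {
    return "permission-denied";
  }
  // Step 3: fullscreen voice mode; the composer itself renders no waveform.
  const transcript = await deps.runVoiceMode();
  // Step 4: back in the composer, optionally populate the text input.
  if (deps.transcriptToText && transcript) {
    deps.setComposerText(transcript);
  }
  return "completed";
}
```

Returning an outcome instead of throwing keeps the degraded paths (toggle off, permission denied) as ordinary results the composer can react to.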

R7 — Storage path-prefix per workspace

Per policies/multi-tenant-by-default.kmd:

Object store path: koder://workspaces/<workspace_id>/users/<koder_user_id>/uploads/<conversation_id>/<file_uuid>.{ext}
Cross-tenant access returns 404.
Retention per identity-data-retention.kmd: orphaned uploads (conversation deleted) are cascade-deleted after a 24h grace period.
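
A sketch of building the object-store path and checking the workspace prefix. `UploadRef`, `uploadPath`, and `isWithinWorkspace` are hypothetical names, and the segment character set is an assumption added so that an identifier cannot smuggle in `/` or `..` and escape its prefix.

```typescript
interface UploadRef {
  workspaceId: string;
  userId: string;
  conversationId: string;
  fileUuid: string;
  ext: string;
}

// Assumption: identifiers are limited to letters, digits, "_" and "-".
const SEGMENT = /^[A-Za-z0-9_-]+$/;

function segment(name: string, value: string): string {
  if (!SEGMENT.test(value)) {
    throw new Error(`invalid ${name}: ${JSON.stringify(value)}`);
  }
  return value;
}

// Build the path in the layout given above.
function uploadPath(ref: UploadRef): string {
  return (
    `koder://workspaces/${segment("workspaceId", ref.workspaceId)}` +
    `/users/${segment("userId", ref.userId)}` +
    `/uploads/${segment("conversationId", ref.conversationId)}` +
    `/${segment("fileUuid", ref.fileUuid)}.${segment("ext", ref.ext)}`
  );
}

// Prefix check a store could run before serving; a mismatch maps to 404.
function isWithinWorkspace(path: string, workspaceId: string): boolean {
  return path.startsWith(`koder://workspaces/${workspaceId}/`);
}
```

The trailing slash in the prefix check matters: without it, workspace `ws1` would also match paths that belong to `ws10`.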

R8 — Limits

Limit	Default
Max files per message	10
Max size per file	25 MB
Max total size per message	100 MB
Max text length	32 K chars (further limited by the model's context window)

Limits configurable per workspace/product.
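
A sketch of how the attachment limits might be enforced when a file is added. `ComposerLimits`, `DEFAULT_LIMITS`, and `checkAttach` are hypothetical names. The spec does not say whether MB means 10^6 or 2^20 bytes or whether 32 K means 32,000 or 32,768; the values below are assumptions.

```typescript
interface ComposerLimits {
  maxFiles: number;
  maxFileBytes: number;
  maxTotalBytes: number;
  maxTextChars: number;
}

const MB = 1024 * 1024; // assumption: binary megabytes

// Defaults from the table above; overridable per workspace or product.
const DEFAULT_LIMITS: ComposerLimits = {
  maxFiles: 10,
  maxFileBytes: 25 * MB,
  maxTotalBytes: 100 * MB,
  maxTextChars: 32000, // assumption: "32 K" read as 32,000
};

type LimitViolation = "files" | "size" | "total";

// Returns the first violated limit, or null when the file may be attached.
function checkAttach(
  existingSizes: readonly number[],
  candidateSize: number,
  limits: ComposerLimits = DEFAULT_LIMITS
): LimitViolation | null {
  if (existingSizes.length + 1 > limits.maxFiles) return "files";
  if (candidateSize > limits.maxFileBytes) return "size";
  const total = existingSizes.reduce((sum, size) => sum + size, 0) + candidateSize;
  if (total > limits.maxTotalBytes) return "total";
  return null;
}
```

Passing `limits` as a parameter keeps the per-workspace override a matter of supplying a different object rather than changing the check.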

R9 — Surface bindings

Surface	API
Flutter	`KoderComposer({onSubmit, model, attachments, onAttach, onMic})` in `koder_kit/lib/src/ai/composer.dart`
Web	`<koder-composer model-id="...">`
Compose Android	`KoderComposer` (future)
SwiftUI iOS	`KoderComposer` (future)
CLI / TUI	Multi-line stdin + `koder attach <path>` slash command

R10 — Accessibility

Composer: role="form" aria-label="Message composer".
Buttons: aria-label per button.
Drop zone: announce drop state via aria-live.
Chips: <li> semantically; remove button per chip.
Keyboard: Tab cycle; Cmd/Ctrl+Enter to submit.
Reduced-motion: no slide animations.
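
The keyboard submit rule can be sketched as a small predicate combined with the Send-enabled rule from R1. `KeyLike`, `isSubmitShortcut`, `canSend`, and `shouldSubmit` are hypothetical names; treating whitespace-only text as empty is an assumption.

```typescript
// Structural subset of a keyboard event, so the check is toolkit-agnostic.
interface KeyLike {
  key: string;
  metaKey: boolean; // Cmd on macOS
  ctrlKey: boolean;
}

// Cmd+Enter or Ctrl+Enter submits; a bare Enter does not match.
function isSubmitShortcut(event: KeyLike): boolean {
  return event.key === "Enter" && (event.metaKey || event.ctrlKey);
}

// Mirrors the Send button: enabled with text or at least one attachment.
// Assumption: whitespace-only text counts as empty.
function canSend(text: string, attachmentCount: number): boolean {
  return text.trim().length > 0 || attachmentCount > 0;
}

function shouldSubmit(event: KeyLike, text: string, attachmentCount: number): boolean {
  return isSubmitShortcut(event) && canSend(text, attachmentCount);
}
```

Routing the shortcut through the same `canSend` check as the button keeps keyboard and pointer users under identical rules.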

R11 — i18n

Key	en-US	pt-BR
`ai.composer.placeholder`	"Type a message..."	"Digite uma mensagem..."
`ai.composer.attach.image`	"Attach image"	"Anexar imagem"
`ai.composer.attach.file`	"Attach file"	"Anexar arquivo"
`ai.composer.mic`	"Voice input"	"Entrada de voz"
`ai.composer.send`	"Send"	"Enviar"
`ai.composer.mime_rejected`	"{model} doesn't support {mime}"	"{model} não suporta {mime}"
`ai.composer.limit.files`	"Max {n} files per message"	"Máx. {n} arquivos por mensagem"
`ai.composer.limit.size`	"File exceeds {size}"	"Arquivo excede {size}"
`ai.composer.drop.hint`	"Drop files here"	"Solte os arquivos aqui"
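
Several keys carry `{placeholder}` parameters. A minimal lookup-and-interpolate sketch follows, using a subset of the table. The `t` helper and the `MESSAGES` shape are hypothetical, and falling back to en-US and then to the key itself is an assumption rather than something this spec requires.

```typescript
type Locale = "en-US" | "pt-BR";

// Subset of the key table above, for illustration only.
const MESSAGES: Record<Locale, { [key: string]: string | undefined }> = {
  "en-US": {
    "ai.composer.send": "Send",
    "ai.composer.mime_rejected": "{model} doesn't support {mime}",
    "ai.composer.limit.files": "Max {n} files per message",
  },
  "pt-BR": {
    "ai.composer.send": "Enviar",
    "ai.composer.mime_rejected": "{model} não suporta {mime}",
  },
};

function t(
  locale: Locale,
  key: string,
  params: { [name: string]: string | number } = {}
): string {
  // Assumed fallback chain: requested locale, then en-US, then the raw key.
  const template = MESSAGES[locale][key] ?? MESSAGES["en-US"][key] ?? key;
  // Replace each {name}; unknown placeholders are left visible for debugging.
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(params, name) ? String(params[name]) : match
  );
}
```

For example, `t("pt-BR", "ai.composer.limit.files", { n: 10 })` falls back to the en-US template here because this subset omits the pt-BR entry.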

R12 — Per-preset variation

Presets change cosmetics only. The composer stays fully functional under every preset.

T-suite

T1 Mount: composer renders all 4 input modes.
T2 Text submit: type + Enter (or Send) → onSubmit callback with text only.
T3 Attach image: file picker → image added to chips → submit with attachment.
T4 Drag-and-drop: simulate dragover + drop file → chip added.
T5 Paste image: simulate clipboard image paste → chip auto-attached.
T6 MIME rejection: drop unsupported MIME → red chip + rejection toast.
T7 Mic transitions: tap mic → voice-mode.kmd UI invoked.
T8 Limits enforced: try to attach 11th file → blocked; try to attach > 25MB file → blocked.
T9 Multi-tenant: upload from workspace A → path-prefix correct; workspace B can't access.
T10 Keyboard: Cmd/Ctrl+Enter submits; arrow keys navigate between chips.
T11 A11y: drop announcement via aria-live; buttons aria-labeled.
N1 Mic without permission: graceful prompt + degrade (no voice option).

Cross-link

Companion: voice-mode.kmd (#121), model-selector.kmd (#113 — MIME gating), chat-message-bubble.kmd (submit destination)
Voice base: voice/wake-word.kmd
Storage: policies/multi-tenant-by-default.kmd
Impl voice ticket: services/ai/ai/backlog/pending/115-cli-desktop-voice-input.md
Refs: shadcn AI Input Group, LangChain agent-chat-ui multimodal

References

meta/docs/stack/specs/ai-ui/voice-mode.kmd
meta/docs/stack/specs/ai-ui/model-selector.kmd
meta/docs/stack/specs/ai-ui/chat-message-bubble.kmd
meta/docs/stack/specs/voice/wake-word.kmd
meta/docs/stack/policies/multi-tenant-by-default.kmd