SSE Streaming Protocol
Overview
The Ask API uses Server-Sent Events (SSE) for streaming responses. This enables real-time delivery of generated tokens and citations as they become available.
Connection Lifecycle
1. Establish Connection
Send a POST request to /api/ask with your question as JSON, for example {"question": "What is the architecture?"}. The server responds with a text/event-stream content type and keeps the connection open.
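Because every later step assumes a streaming response, a client may want to confirm the Content-Type before reading events, so that a non-streaming body (such as an error page) is not fed to the event parser. A minimal sketch; is_event_stream is a hypothetical helper, not part of the API:

```python
from typing import Optional


def is_event_stream(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value denotes an SSE stream."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

With requests, the value would come from response.headers.get("Content-Type").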
2. Receive Events
The server sends events as they occur:
data: {"type":"token","content":"Hello","message_id":"msg_123"}
data: {"type":"citation","citation":{...}}
data: {"type":"done","message_id":"msg_123",...}
3. Connection Termination
The connection closes when:
- The server sends a done event
- An error occurs
- The client disconnects
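The lifecycle above can be summarized as a small loop over the raw lines of the response. This is a minimal sketch, assuming (as in the examples above) that every event is a single data: line carrying JSON; read_events and StreamError are illustrative names, not part of the API:

```python
import json
from typing import Iterable, Iterator, Optional


class StreamError(Exception):
    """Raised when the stream reports an error or ends without a done event."""

    def __init__(self, message: str, code: Optional[str] = None) -> None:
        super().__init__(message)
        self.code = code


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed events until a done event; raise StreamError on an error event."""
    for line in lines:
        # Skip blank separator lines and any non-data fields
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("type") == "error":
            raise StreamError(event.get("error", "unknown error"), event.get("code"))
        yield event
        if event.get("type") == "done":
            return  # the server is finished; ignore anything after done
    # The lines ran out without a done event: the connection closed early
    raise StreamError("stream ended before a done event")
```

With requests, it could be fed by (line.decode("utf-8") for line in response.iter_lines()).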
Event Types
token
Delivers a chunk of generated text.
{
"type": "token",
"content": "Hello",
"message_id": "msg_123",
"is_complete": false
}
Fields:
- content: Text fragment (may be a partial word)
- message_id: Unique message identifier
- is_complete: Whether this completes the current token
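Because a content fragment can end mid-word, the full text is rebuilt by plain concatenation with no separators added. A minimal sketch that groups fragments by message_id; assemble_text is an illustrative name:

```python
from collections import defaultdict


def assemble_text(events: list[dict]) -> dict[str, str]:
    """Concatenate token fragments per message_id, in arrival order."""
    parts: dict[str, list[str]] = defaultdict(list)
    for event in events:
        if event.get("type") == "token":
            parts[event["message_id"]].append(event["content"])
    # Fragments are joined exactly as received; spacing comes from the content itself
    return {message_id: "".join(chunks) for message_id, chunks in parts.items()}
```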
citation
Provides a source reference for information in the response.
{
"type": "citation",
"citation": {
"id": "[1]",
"author": "Jane Smith",
"channel": "#engineering",
"timestamp": "2026-04-13T10:30:00Z",
"text_excerpt": "We decided to use JWT tokens for authentication...",
"permalink": "https://slack.com/archives/C12345/p1234567890123456",
"source_message_id": "1234567890.123456",
"media_type": "link" | "pdf" | "image" | null,
"media_name": "design-doc.pdf" | null
}
}
Fields:
- id: Citation reference, e.g. [1], [2]
- author: Message author
- channel: Channel name
- timestamp: Message timestamp
- text_excerpt: First 100 characters of the source message
- permalink: URL to the original message
- source_message_id: Platform message ID
- media_type: Type of attached media (link, pdf, or image), or null
- media_name: Name of the media file, or null
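On the client side it can help to load the citation payload into a typed record. A sketch using a Python dataclass; the class name and footnote format are illustrative, media_type and media_name default to None because they are null when no media is attached, and unknown keys are ignored rather than rejected:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Citation:
    id: str
    author: str
    channel: str
    timestamp: str
    text_excerpt: str
    permalink: str
    source_message_id: str
    media_type: Optional[str] = None  # "link", "pdf", "image", or None
    media_name: Optional[str] = None

    @classmethod
    def from_event(cls, event: dict) -> "Citation":
        """Build a Citation from a parsed {"type": "citation", "citation": {...}} event."""
        known = {f.name for f in fields(cls)}
        # Ignore keys this client does not know about rather than failing
        return cls(**{k: v for k, v in event["citation"].items() if k in known})

    def footnote(self) -> str:
        text = f"{self.id} {self.author} in {self.channel}: {self.permalink}"
        if self.media_name:
            text += f" ({self.media_type}: {self.media_name})"
        return text
```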
done
Signals completion of the response.
{
"type": "done",
"message_id": "msg_123",
"session_id": "sess_456",
"tokens_used": 1234,
"citations_count": 3,
"duration_ms": 2500
}
Fields:
- message_id: Final message ID
- session_id: Session ID for follow-up questions
- tokens_used: Total tokens consumed
- citations_count: Number of citations
- duration_ms: Response generation time in milliseconds
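The done event also lets a client summarize and sanity-check what it received. A minimal sketch; summarize_done is an illustrative name, and it assumes citations_count counts the citation events sent on the same stream:

```python
def summarize_done(done: dict, citations_received: int) -> str:
    """Build a one-line summary of a done event and flag citation mismatches."""
    seconds = done["duration_ms"] / 1000
    summary = (
        f"{done['message_id']}: {done['tokens_used']} tokens, "
        f"{done['citations_count']} citations in {seconds:.1f}s "
        f"(session {done['session_id']})"
    )
    # Assumption: citations_count equals the number of citation events on this stream
    if done["citations_count"] != citations_received:
        summary += f" [warning: received {citations_received} citation events]"
    return summary
```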
error
Indicates an error occurred.
{
"type": "error",
"error": "Error description",
"code": "RATE_LIMIT_EXCEEDED" | "INVALID_REQUEST" | "INTERNAL_ERROR"
}
Client Implementation
JavaScript (Browser)
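async function askQuestion(question, sessionId) {
  // Assumption: a follow-up sends the previous session_id in the request body
  const body = sessionId ? { question, session_id: sessionId } : { question };
  const response = await fetch('http://localhost:8000/api/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
  if (!response.ok) {
    // Connection-level failures (401, 429, ...) arrive as HTTP status codes
    throw new Error(`HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let fullResponse = '';
  const citations = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Keep the last (possibly incomplete) line until more bytes arrive
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      let event;
      try {
        event = JSON.parse(line.slice(6));
      } catch (e) {
        console.error('Parse error:', e);
        continue;
      }
      switch (event.type) {
        case 'token':
          fullResponse += event.content;
          onText(event.content);
          break;
        case 'citation':
          citations.push(event.citation);
          onCitation(event.citation);
          break;
        case 'done': {
          const result = {
            response: fullResponse,
            citations,
            session_id: event.session_id,
            metrics: event
          };
          onComplete(result);
          await reader.cancel(); // the answer is complete; release the stream
          return result;
        }
        case 'error': {
          await reader.cancel();
          // Attach code and retry_after so callers can decide whether to retry
          const error = new Error(event.error);
          error.code = event.code;
          error.retry_after = event.retry_after;
          throw error;
        }
      }
    }
  }
  throw new Error('Stream closed before a done event');
}
function onText(text) {
  // 'answer' is a placeholder element id for wherever the answer is rendered
  document.getElementById('answer').textContent += text;
}
function onCitation(citation) {
  // citation.id already includes brackets, e.g. "[1]"
  console.log(`${citation.id} ${citation.author} - ${citation.timestamp}`);
}
function onComplete(result) {
  console.log('Complete:', result.metrics);
}
JavaScript (Node.js with EventSource)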
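const EventSource = require('eventsource');
function askQuestion(question) {
  // EventSource can only send GET requests, so this assumes /api/ask also
  // accepts the question as a query parameter (the documented form is POST)
  const eventSource = new EventSource(
    `http://localhost:8000/api/ask?question=${encodeURIComponent(question)}`
  );
  // The server sends unnamed events (data: lines only), so every payload
  // arrives via onmessage and is dispatched on its JSON "type" field
  eventSource.onmessage = (e) => {
    const data = JSON.parse(e.data);
    switch (data.type) {
      case 'token':
        process.stdout.write(data.content);
        break;
      case 'citation':
        console.log(`\n[Citation] ${data.citation.id}: ${data.citation.permalink}`);
        break;
      case 'done':
        console.log('\n[Done]', data);
        eventSource.close(); // stop EventSource from reconnecting automatically
        break;
      case 'error':
        console.error('\n[Error]', data.code, data.error);
        eventSource.close();
        break;
    }
  };
  // Fired for transport-level failures, not for error events sent by the server
  eventSource.onerror = (e) => {
    console.error('[Connection error]', e);
    eventSource.close();
  };
  return eventSource;
}
Python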
import json
import requests
import sseclient  # from the sseclient-py package

def ask_question(question: str):
    # stream=True lets events be read as they arrive instead of after the body completes
    with requests.post(
        'http://localhost:8000/api/ask',
        json={'question': question},
        stream=True
    ) as response:
        response.raise_for_status()
        client = sseclient.SSEClient(response)
        full_response = ""
        citations = []
        for event in client.events():
            data = json.loads(event.data)
            if data['type'] == 'token':
                full_response += data['content']
                print(data['content'], end='', flush=True)
            elif data['type'] == 'citation':
                citations.append(data['citation'])
                # The id already includes brackets, e.g. "[1]"
                print(f"\n{data['citation']['id']} {data['citation']['permalink']}")
            elif data['type'] == 'done':
                print(f"\n\nTokens: {data['tokens_used']}, Citations: {data['citations_count']}")
                break
            elif data['type'] == 'error':
                print(f"\nError: {data['error']}")
                break
    return full_response, citations
curl
curl -N http://localhost:8000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the architecture?"}' \
  | while read -r line; do
      if [[ $line == data:* ]]; then
        # -j prints raw strings without a trailing newline, so fragments join up
        echo "${line#data: }" | jq -j 'select(.type == "token") | .content'
      fi
    done
Error Handling
Connection Errors
- Timeout: Increase client timeout or reduce query complexity
- 401 Unauthorized: Check authentication credentials
- 429 Rate Limit: Implement exponential backoff
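For the 429 case, exponential backoff doubles the wait ceiling after each failed attempt, usually up to a cap. A sketch; the base delay, the cap, and the use of "full jitter" (a random wait between zero and the ceiling, so many clients do not retry in lockstep) are illustrative choices, not values the API prescribes. If a 429 response carries a Retry-After header, honoring it is usually preferable.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    ceiling = min(cap, base * 2 ** attempt)
    # Full jitter: choose uniformly in [0, ceiling] to spread retries out
    return (rng or random).uniform(0, ceiling)
```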
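Stream Errors
Errors raised after the stream has started arrive as error events. A rate-limit error may also carry retry_after, the number of seconds to wait before retrying: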
{
"type": "error",
"error": "Rate limit exceeded",
"code": "RATE_LIMIT_EXCEEDED",
"retry_after": 60
}
Retry Logic
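const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function askWithRetry(question, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await askQuestion(question);
    } catch (error) {
      // Assumes askQuestion throws an Error carrying the error event's code and retry_after
      if (error.code === 'RATE_LIMIT_EXCEEDED' && i < maxRetries - 1) {
        // Use the server's retry_after (seconds) when present, else back off exponentially
        const delaySeconds = error.retry_after ?? 2 ** i;
        await sleep(delaySeconds * 1000);
        continue;
      }
      throw error;
    }
  }
}
Best Practices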
- Buffer Tokens: Collect tokens before rendering to avoid partial words
- Handle Citations: Display citations as footnotes or inline references
- Track Sessions: Use session IDs for follow-up questions
- Error Recovery: Implement reconnection logic for network failures
- Rate Limiting: Respect rate limits and implement backoff
- Close Connections: Always close connections when done
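The Buffer Tokens practice can be sketched as a small buffer that releases text only up to the last space or newline it has seen, holding back a trailing partial word until the next fragment (or the done event) arrives. WordBuffer is an illustrative name:

```python
class WordBuffer:
    """Accumulate token fragments and release only whole words."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, fragment: str) -> str:
        """Add a fragment; return the text that is safe to render now."""
        self._pending += fragment
        # Everything up to and including the last space or newline is complete
        cut = max(self._pending.rfind(" "), self._pending.rfind("\n"))
        if cut == -1:
            return ""
        ready, self._pending = self._pending[: cut + 1], self._pending[cut + 1:]
        return ready

    def flush(self) -> str:
        """Return whatever is left, e.g. when the done event arrives."""
        ready, self._pending = self._pending, ""
        return ready
```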
Session Management
The session_id from the done event enables conversational follow-ups:
// First question
const { session_id } = await askQuestion("What is JWT?");
// Follow-up (uses context from first question)
await askQuestion("How do we refresh tokens?", session_id);
A session keeps its conversation context until it has been inactive for 30 minutes.
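A client that wants to reuse a session only while it is likely still valid could track the last activity time locally. A sketch, assuming the 30-minute inactivity window described above; SessionTracker is an illustrative name, the clock is injectable for testing, and the server remains the authority on whether a session has expired:

```python
import time
from typing import Callable, Optional

SESSION_IDLE_SECONDS = 30 * 60  # inactivity window described above


class SessionTracker:
    """Remember the latest session_id and forget it after 30 idle minutes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._session_id: Optional[str] = None
        self._last_used = 0.0

    def record(self, done_event: dict) -> None:
        """Store the session_id from a done event and reset the idle timer."""
        self._session_id = done_event["session_id"]
        self._last_used = self._clock()

    def current(self) -> Optional[str]:
        """Return the session_id for a follow-up, or None if it has likely expired."""
        if self._session_id is None:
            return None
        if self._clock() - self._last_used >= SESSION_IDLE_SECONDS:
            self._session_id = None
            return None
        return self._session_id
```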