Streaming Responses

Stream AI-generated content in real time using SSE (Server-Sent Events)

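Overview

Streaming lets the model push content to the client as it is generated, so the client does not have to wait for the complete response. This noticeably improves the user experience, especially when generating long text.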

Enabling Streaming

Set stream: true in the request body to enable streaming:

curl
curl https://api.lingyuncx.com/v1/chat/completions \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'

SSE Data Format

Streaming responses use the SSE format. Each message begins with data: and is followed by a blank line:

text
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"春"},"index":0}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"眠"},"index":0}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"不"},"index":0}]}

data: [DONE]
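The format above can be parsed without an SDK. The following is a minimal sketch, assuming the client receives the response body as raw byte chunks that may be cut at arbitrary positions (in the middle of a line or even of a multi-byte character). The helper name iter_content is hypothetical and not part of any library.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_content(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield each delta.content string found in raw SSE byte chunks.

    Hypothetical helper: it only understands the `data:` lines shown above.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        # Incremental decoding keeps a multi-byte character intact
        # even when it is split across two chunks.
        buffer += decoder.decode(chunk)
        # Parse only complete lines; the trailing partial line stays buffered.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.rstrip("\r")
            if not line.startswith("data:"):
                continue  # blank separator lines and other SSE fields
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return  # end-of-stream marker
            event = json.loads(payload)
            for choice in event.get("choices", []):
                content = (choice.get("delta") or {}).get("content")
                if content:
                    yield content
```

Joining the yielded strings reconstructs the full text, so the three sample messages above would produce "春眠不" regardless of where the chunk boundaries fall.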

Client Examples

Python Example

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lingyuncx.com/v1",
    api_key="sk-xxxxxxxx"
)

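stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

# Print each content delta as soon as it arrives
for chunk in stream:
    # Some chunks may carry no choices or no content, so check before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)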

Node.js Example

JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.lingyuncx.com/v1',
  apiKey: 'sk-xxxxxxxx'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

// Write each content delta to stdout as soon as it arrives
for await (const chunk of stream) {
  // Some chunks may carry no choices or no content
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Browser Example (fetch)

JavaScript
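// EventSource cannot send POST requests or custom headers, so read the stream with fetch
const response = await fetch('https://api.lingyuncx.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-xxxxxxxx',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Write a poem' }],
    stream: true
  })
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });

  // A chunk may end mid-line, so keep the trailing partial line for the next read
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const payload = line.slice(6).trim();
    // The end marker is not JSON, so check for it before parsing
    if (payload === '[DONE]') {
      finished = true;
      break;
    }

    const data = JSON.parse(payload);
    const content = data.choices?.[0]?.delta?.content;
    if (content) {
      console.log(content);
    }
  }
}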

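Notes

  • Token counting: streaming responses are counted the same way as non-streaming responses and incur no additional cost
  • Error handling: errors that occur during streaming are returned in a separate data: message
  • End marker: the stream ends with data: [DONE]
  • Timeouts: set a reasonable client-side timeout (60 seconds recommended)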
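The error-handling and end-marker notes above can be combined into one dispatch step. The sketch below is illustrative only: this page does not specify the shape of an error message, so the code assumes a JSON object with a top-level "error" field containing a "message" string. The function name classify_payload is hypothetical.

```python
import json
from typing import Tuple


def classify_payload(payload: str) -> Tuple[str, str]:
    """Classify the text that follows `data: ` in one SSE message.

    Returns ("done", ""), ("error", message) or ("content", text).
    """
    if payload.strip() == "[DONE]":
        return ("done", "")  # end marker, not JSON

    event = json.loads(payload)

    # ASSUMPTION: errors look like {"error": {"message": "..."}}.
    # Check the real payload shape before relying on this branch.
    if "error" in event:
        error = event["error"]
        message = error.get("message", "") if isinstance(error, dict) else str(error)
        return ("error", message)

    choices = event.get("choices") or []
    content = (choices[0].get("delta") or {}).get("content") if choices else None
    return ("content", content or "")
```

A client loop would stop reading on "done", surface the message on "error", and append the text on "content".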

💡 Tip

Streaming is especially well suited to chatbots, content generation, and other scenarios that need real-time feedback.

📚 Related Reading

Chat API Reference | Error Handling