
Open any "How to build AI chat in React" tutorial. What do you see? fetch, ReadableStream, setState in a loop, and magic: tokens flying onto the screen. Beautiful. Works. In dev mode. With one user. In one tab. As long as nobody switches anything.

But what happens when a user sends a second message before the first one finishes? When they switch chats mid-stream? When they leave for another app on their phone and come back a minute later?

The answer: everything breaks. Quietly, with no errors, no stack traces. Just wrong data on the screen.

I work at a healthcare company where we stream AI assistant responses into a React app. Every bug I'll cover in this article is a real production problem that I personally debugged and fixed.

We'll walk the path from the most naive implementation to an architecture that holds up in production. And along the way we'll understand one important thing: a stream and React are two different lifecycles, and if you don't synchronize them, bugs are inevitable.

Step 1. Naive fetch streaming

Let's start with what every developer does first. Request a response from the LLM and read tokens as they arrive.

First, let's agree on the message type; it's the same across all the examples below. Important: id exists on every message, including the user's. Without a stable id, both React list keys and token matching (m.id === messageId) break:

    type Message = {
      id: string;
      role: "user" | "assistant";
      content: string;
      status?: "streaming" | "done" | "error";
    };

In the examples below I'm omitting error handling and request headers where they aren't relevant to the bug under discussion, otherwise the code would drown in try/catch. I'll show them explicitly in one or two key spots; in production you need them everywhere.

    import { useState } from "react";

    function useChatStream() {
      const [messages, setMessages] = useState<Message[]>([]);

      async function sendMessage(content: string) {
        // Add the user's message
        setMessages((prev) => [
          ...prev,
          { id: crypto.randomUUID(), role: "user", content },
        ]);

        // Create an empty assistant message
        const assistantId = crypto.randomUUID();
        setMessages((prev) => [
          ...prev,
          { id: assistantId, role: "assistant", content: "" },
        ]);

        // Stream the response
        const response = await fetch("/api/chat", {
          method: "POST",
          body: JSON.stringify({ message: content }),
        });

        const reader = response.body!.getReader();
        const decoder = new TextDecoder();

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          const chunk = decoder.decode(value, { stream: true });

          // Append tokens to the assistant message
          setMessages((prev) =>
            prev.map((m) =>
              m.id === assistantId ? { ...m, content: m.content + chunk } : m,
            ),
          );
        }
      }

      return { messages, sendMessage };
    }

Works. Tokens appear on screen one after another, just like ChatGPT. Beautiful.

Where it breaks

Scenario 1: Two messages in a row. The user sent a message, didn't wait for the response, and sent another one. The first stream is still running, the second has started. Both write to messages via setMessages. Depending on timing, the second stream overwrites the first's tokens, or the first appends tokens after the second has already created its own message. Result: fragments of someone else's response inside the messages.

Scenario 2: Switching chats. The user is streaming a response, switches to another conversation, comes back. The loop around reader.read() is still running, but assistantId in the closure is bound to the old chat. Tokens fly off into nowhere or, worse, into the new chat.

Scenario 3: Broken characters and truncated events. This is a quiet bug that hits non-Latin text especially hard. A chunk from reader.read() is an arbitrary slice of bytes, not a clean character or event boundary.
A Cyrillic character takes two bytes in UTF-8, and a chunk boundary can cut it in half: one byte arrives in this chunk, the second in the next. TextDecoder with { stream: true } handles this on its own: it buffers the incomplete sequence until the next call. But as soon as you move from raw text to an SSE protocol with events (and you will, see Step 3), the second half of the problem surfaces: a single SSE event (data: ...\n\n) can also arrive split between chunks. If you parse a chunk immediately, without waiting for \n\n, you get invalid JSON and a lost token. For a medical assistant, this means the "recommended dose" can silently turn into the "recommended do", and nobody notices.

Why it's hard to catch: the happy path works → tests are green → the bug only shows up under a specific sequence of actions.

The fix: Stream Identity Guard

The key idea: every stream gets a unique ID, and every callback checks: "am I still current?"

    import { useRef, useState } from "react";

    function useChatStream() {
      const [messages, setMessages] = useState<Message[]>([]);
      const streamRef = useRef<string | null>(null);
      const abortRef = useRef<AbortController | null>(null);

      async function sendMessage(content: string) {
        // Generate a unique ID for this stream
        const streamId = crypto.randomUUID();
        streamRef.current = streamId;

        // Cancel the previous stream: the guard protects state, abort protects transport
        abortRef.current?.abort();
        const controller = new AbortController();
        abortRef.current = controller;

        setMessages((prev) => [
          ...prev,
          { id: crypto.randomUUID(), role: "user", content },
        ]);

        const assistantId = crypto.randomUUID();
        setMessages((prev) => [
          ...prev,
          { id: assistantId, role: "assistant", content: "", status: "streaming" },
        ]);

        try {
          const response = await fetch("/api/chat", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ message: content }),
            signal: controller.signal,
          });

          if (!response.ok || !response.body) {
            throw new Error(`Chat request failed: ${response.status}`);
          }

          const reader = response.body.getReader();
          // One decoder for the whole stream, so split UTF-8 sequences survive
          const decoder = new TextDecoder();

          while (true) {
            const { done, value } = await reader.read();
            if (done) break;

            // Guard: am I still the current stream?
            if (streamRef.current !== streamId) return;

            const chunk = decoder.decode(value, { stream: true });
            setMessages((prev) =>
              prev.map((m) =>
                m.id === assistantId ? { ...m, content: m.content + chunk } : m,
              ),
            );
          }

          setMessages((prev) =>
            prev.map((m) => (m.id === assistantId ? { ...m, status: "done" } : m)),
          );
        } catch (err) {
          // AbortError means we cancelled the stream ourselves, not a real error
          if ((err as Error).name === "AbortError") return;
          setMessages((prev) =>
            prev.map((m) => (m.id === assistantId ? { ...m, status: "error" } : m)),
          );
        }
      }

      return { messages, sendMessage };
    }

Now, when the user sends a second message, the new stream gets a new streamId, and all of the old stream's callbacks turn into no-ops. Tokens don't fly off to the wrong place.

Note the separation of responsibilities: AbortController stops the transport (fetch and body reading stop holding the connection and burning backend resources), while the guard via streamRef protects the state (even if a stale callback does run, it won't write to messages). A guard alone isn't enough: without abort, the old reader would keep pulling the body for nothing.

About the broken characters from Scenario 3: decoder.decode(value, { stream: true }) already holds the UTF-8 boundary for you. The key is to reuse one TextDecoder instance for the whole stream rather than creating a new one per chunk (otherwise the buffer of the incomplete sequence is lost). The SSE event boundary, however, you'll have to stitch together by hand: append incoming chunks to a string buffer and pull out only the complete events delimited by \n\n, leaving the "tail" in the buffer until the next chunk. That buffer is essentially what we arrive at in Step 3.

But the problem is deeper than a stale closure.
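
Before going deeper, the event-boundary buffering described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from our app: createSseParser, parseBlock, and the SseEvent type are hypothetical names, the server is assumed to end lines with \n (not \r\n), and only the event: and data: fields are read.

```typescript
// Hypothetical helper: an incremental SSE parser for a fetch-based stream.
type SseEvent = { event: string; data: string };

function parseBlock(raw: string): SseEvent | null {
  let event = "message"; // SSE default event name
  const data: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    // Comment lines (":") and other fields (id, retry) are ignored in this sketch
  }
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

function createSseParser(onEvent: (e: SseEvent) => void) {
  // One decoder per stream: it keeps the first half of a split UTF-8 character
  const decoder = new TextDecoder();
  let buffer = "";

  return {
    push(bytes: Uint8Array): void {
      buffer += decoder.decode(bytes, { stream: true });
      // Emit only complete events; the unfinished tail stays in the buffer
      let boundary = buffer.indexOf("\n\n");
      while (boundary !== -1) {
        const parsed = parseBlock(buffer.slice(0, boundary));
        buffer = buffer.slice(boundary + 2);
        if (parsed) onEvent(parsed);
        boundary = buffer.indexOf("\n\n");
      }
    },
  };
}

// Usage: one event, delivered in two chunks with the cut inside a Cyrillic character
const bytes = new TextEncoder().encode('event: token\ndata: {"token":"доза"}\n\n');
const cut = bytes.indexOf(0xd0) + 1; // 0xD0 is the first byte of "д"

const received: SseEvent[] = [];
const parser = createSseParser((e) => received.push(e));

parser.push(bytes.slice(0, cut));
const emittedAfterFirstChunk = received.length; // nothing complete yet
parser.push(bytes.slice(cut));
```

The first push ends in the middle of both a character and an event, so nothing is emitted; the second push completes the event and the token arrives intact. In a fetch-based client, parser.push(value) would sit inside the reader.read() loop in place of the direct decoder.decode call.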
The cause is that the stream outlives a single render. React is a pull-based system: a component renders a snapshot of state. A stream is push-based: tokens arrive whenever they want. These two processes don't know about each other.

Step 2. WebSocket: a persistent connection

Fetch streaming works for simple cases. But it has limits: every message is a separate HTTP request, there's no two-way communication, and the server can't push anything the client didn't ask for. The obvious next step is a WebSocket:

    import { useEffect, useRef, useState } from "react";

    function useChatWebSocket(chatId: string) {
      const [messages, setMessages] = useState<Message[]>([]);
      const wsRef = useRef<WebSocket | null>(null);

      useEffect(() => {
        const ws = new WebSocket(`wss://api.example.com/chat/${chatId}`);
        wsRef.current = ws;

        ws.onmessage = (event) => {
          const data = JSON.parse(event.data);

          if (data.type === "token") {
            // A token arrived: append it to the message
            setMessages((prev) =>
              prev.map((m) =>
                m.id === data.messageId
                  ? { ...m, content: m.content + data.token }
                  : m,
              ),
            );
          }

          if (data.type === "message_start") {
            setMessages((prev) => [
              ...prev,
              { id: data.messageId, role: "assistant", content: "" },
            ]);
          }
        };

        return () => {
          ws.close();
        };
      }, [chatId]);

      function sendMessage(content: string) {
        const ws = wsRef.current;
        // Optional chaining saves you from null, but NOT from CONNECTING:
        // send() in state 0 throws InvalidStateError
        if (ws?.readyState !== WebSocket.OPEN) return;

        setMessages((prev) => [
          ...prev,
          { id: crypto.randomUUID(), role: "user", content },
        ]);
        ws.send(JSON.stringify({ type: "message", content }));
      }

      return { messages, sendMessage };
    }

The upsides are obvious: one connection, two-way communication, real time. The backend can send tokens, generation status, notifications, all over a single channel.

Where it breaks

Reconnect. The connection drops: Wi-Fi switched, the server restarted, a proxy closed the idle connection. You need to reconnect. But what if a stream was running at the moment of the drop? How do you know where it stopped?

Dropped tokens.
Within a single connection, order is guaranteed, because WebSocket runs over TCP. But on a drop and reconnect you don't know how many tokens you lost during the downtime: between "Hel" and "o, how are you" a chunk like "l" could have been lost, and the user sees a fragment. To catch this, every token needs a sequence number (seq); we'll come back to this idea when we get to resume in Step 4.

But the main thing: the lifecycle still isn't synchronized. The generation behind the WebSocket outlives the component. The user switches chats → the component unmounts → the useEffect cleanup closes the WS. But what if the user comes back? A new connection, but what about the cache?

The fix: Explicit Cleanup + Stale marking

When the user leaves a chat, you need to do more than just close the connection. You need to make a deliberate decision: what do you do with the backend?

    function useChat(chatId: string) {
      const [messages, setMessages] = useState<Message[]>([]);
      const wsRef = useRef<WebSocket | null>(null);
      const staleChatsRef = useRef(new Set<string>());

      useEffect(() => {
        // Was the chat stale? We need a forced fetch, not the cache
        if (staleChatsRef.current.has(chatId)) {
          staleChatsRef.current.delete(chatId);
          fetchMessagesFresh(chatId).then(setMessages);
        } else {
          // Normal load from cache
          loadMessages(chatId).then(setMessages);
        }

        const ws = new WebSocket(`wss://api.example.com/chat/${chatId}`);
        wsRef.current = ws;

        ws.onmessage = (event) => {
          const data = JSON.parse(event.data);
          // ... token handling
        };

        return () => {
          // Cleanup: close the connection, mark the chat as stale
          ws.close();
          staleChatsRef.current.add(chatId);
        };
      }, [chatId]);

      // ...
    }

The key insight: cleanup during streaming is not just "cancel the request." It's a decision: what the backend does (keeps generating), what the client does (closes the connection), and how they synchronize on the next contact (forced fetch instead of cache).

WebSocket is a powerful tool, but for AI chats it's often overkill. We don't need two-way communication.
We only need to stream tokens from server to client. And that's where SSE comes in.

Step 3. SSE: streaming without the overhead

Server-Sent Events is HTTP-based streaming. Simpler than WebSocket: one-directional (server → client), auto-reconnect out of the box, works through ordinary proxies.

    import { useEffect, useRef, useState } from "react";

    function useChatSSE(chatId: string) {
      const [messages, setMessages] = useState<Message[]>([]);
      const esRef = useRef<EventSource | null>(null);

      useEffect(() => {
        loadMessages(chatId).then(setMessages);

        const es = new EventSource(`https://api.example.com/chat/${chatId}/stream`);
        esRef.current = es;

        es.addEventListener("message_start", (event) => {
          const { messageId } = JSON.parse(event.data);
          setMessages((prev) => [
            ...prev,
            { id: messageId, role: "assistant", content: "" },
          ]);
        });

        es.addEventListener("token", (event) => {
          const { messageId, token } = JSON.parse(event.data);
          setMessages((prev) =>
            prev.map((m) =>
              m.id === messageId ? { ...m, content: m.content + token } : m,
            ),
          );
        });

        es.addEventListener("done", (event) => {
          // Do NOT close the stream: it's persistent for the whole chat. Close it here
          // and the second message in this same chat will silently fail to stream
          // (chatId didn't change → useEffect won't re-run).
          // The close belongs only in cleanup when leaving the chat.
          const { messageId } = JSON.parse(event.data);
          setMessages((prev) =>
            prev.map((m) => (m.id === messageId ? { ...m, status: "done" } : m)),
          );
        });

        return () => {
          es.close();
        };
      }, [chatId]);

      async function sendMessage(content: string) {
        setMessages((prev) => [
          ...prev,
          { id: crypto.randomUUID(), role: "user", content },
        ]);

        // Trigger generation; tokens will arrive on the persistent EventSource above
        await fetch(`/api/chat/${chatId}/message`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ content }),
        });
      }

      return { messages, sendMessage };
    }

Note the architecture: the subscription (GET .../stream) and sending a message (POST .../message) are two different endpoints. This is how it has to be, because the native EventSource can only do GET and accepts no request body or custom headers (including Authorization; auth goes through a cookie or query parameter). So the model is this: the client holds an open SSE channel for the chat, the POST merely triggers generation, and the backend publishes tokens into the already-open stream for that chatId (essentially pub/sub). If on the backend the POST and the stream aren't tied together by chatId, the tokens simply won't arrive, and that's the very "quiet" bug.

The native EventSource parses the data: ...\n\n protocol itself and hands you a complete event in event.data, so you don't have to think about the chunk stitching I mentioned in Step 1. But as soon as you replace it with your own client on top of fetch (and on React Native you'll need some replacement, see Step 4), event parsing lands on you again, and the buffer for incomplete \n\n comes back into your code.

- SSE reconnects automatically on a drop. You don't have to write reconnect logic like you do with WebSocket.
- HTTP-based: works through ordinary proxies and CDNs, as long as they don't buffer the response.
- Simple protocol: data: ...\n\n and that's it.

Where it breaks

Cache libraries. If you use React Query, RTK Query, or SWR, it's important to understand how they behave without an active subscriber.
The cache itself doesn't go anywhere and can be updated imperatively (setQueryData, updateQueryData, mutate). But the automatic background refetch stops: the user switches chats → the component unmounts → the query becomes inactive → nothing refreshes it while you're away. Whether it refetches on the next mount depends on the library and its settings (for example staleTime and refetchOnMount in React Query, or refetchOnMountOrArgChange in RTK Query). If you were relying precisely on "it'll remount and fetch fresh data on its own," you fall into a trap: meanwhile the backend has finished writing the response to the DB, and the user, coming back, sees the old cache with no response.

The solution is to combine the stale marking from the previous step with SSE:

    function useChat(chatId: string) {
      const [messages, setMessages] = useState<Message[]>([]);
      const staleChatsRef = useRef(new Set<string>());

      useEffect(() => {
        if (staleChatsRef.current.has(chatId)) {
          staleChatsRef.current.delete(chatId);
          // Forced fetch: bypass the cache
          fetchMessagesFresh(chatId).then(setMessages);
        } else {
          loadMessages(chatId).then(setMessages);
        }

        const es = new EventSource(`/api/chat/${chatId}/stream`);
        // ... event subscription

        return () => {
          es.close();
          staleChatsRef.current.add(chatId);
        };
      }, [chatId]);

      return { messages };
    }

Everything we've covered (stream identity, cleanup, stale marking) is a set of universal principles. They work with fetch, WebSocket, SSE. But so far we've been in the browser. Now imagine: all of this needs to run on a mobile device.

Step 4. And now: mobile

Everything above was still the easy part. In the browser, a tab lives until the user closes it. On mobile, the OS decides when to "kill" the process. And it makes use of that.

SSE on React Native

React Native has no native EventSource. There are polyfills (for example, react-native-sse or event-source-polyfill), but they run on top of fetch or XMLHttpRequest and don't support auto-reconnect as reliably as the browser implementation. And, as I noted above, parsing SSE events, including buffering incomplete data: ...\n\n, now happens in JavaScript (the polyfill's or yours) instead of in the platform.

The alternative is a WebView: you wrap the SSE client in a WebView and communicate with it via postMessage.
This works, but it adds a layer of abstraction and its own bugs.

But the main problem isn't the polyfill.

Background killing

The user is streaming an AI response, swipes over to Telegram, reads a message, comes back 30 seconds later. In the browser, the tab just went to sleep: the SSE connection is alive, tokens accumulate in the buffer.

On mobile:

- The OS kills the SSE connection when going to background (iOS aggressively, Android less so, but still)
- The backend keeps generating (it doesn't care whether there's a client)
- On return: no EventSource, no subscription, no tokens

Result: the response is cut off mid-word. The user sees "To treat this condition it is recommended to…" and nothing more.

Resume, not restart

The naive solution is to just re-request the response on return from background. But a restart means a new generation. It's a new response from the LLM, which may differ from the previous one. Plus it's expensive: every LLM request costs money.

The right approach is resume: pick the stream up from where it stopped.
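
Before the client code, it may help to picture what resume asks of both sides. The sketch below is an in-memory illustration under stated assumptions, not our backend: RunTokenLog, StreamCursor, and SeqToken are hypothetical names, sequence numbers are assumed to start at 1 within a single generation run, and the network is left out entirely.

```typescript
type SeqToken = { seq: number; token: string };

// Hypothetical backend side: a numbered log of tokens for one generation run
class RunTokenLog {
  private tokens: SeqToken[] = [];

  // Called for every generated token; numbering starts at 1
  append(token: string): SeqToken {
    const entry = { seq: this.tokens.length + 1, token };
    this.tokens.push(entry);
    return entry;
  }

  // What a resume request with after_seq=lastSeq would replay
  after(lastSeq: number): SeqToken[] {
    return this.tokens.filter((t) => t.seq > lastSeq);
  }
}

// Hypothetical client side: the cursor plus the text assembled so far
class StreamCursor {
  lastSeq = 0;
  text = "";

  apply({ seq, token }: SeqToken): void {
    if (seq <= this.lastSeq) return; // duplicate after a reconnect, ignore
    if (seq !== this.lastSeq + 1) {
      throw new Error(`Gap in stream: expected ${this.lastSeq + 1}, got ${seq}`);
    }
    this.text += token;
    this.lastSeq = seq;
  }
}

const log = new RunTokenLog();
const client = new StreamCursor();

// Live phase: the client receives the first two tokens
client.apply(log.append("Rest "));
client.apply(log.append("and "));

// The app goes to background; the backend keeps generating
log.append("drink plenty ");
log.append("of water.");

// Back in the foreground: replay only what is missing
for (const t of log.after(client.lastSeq)) client.apply(t);
```

After the replay the client holds the full text and a cursor of 4, without a second generation. The gap check is a design choice: failing loudly on a missing seq is what turns a silently truncated answer into a detectable error.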

    import { useEffect, useRef, useState } from "react";
    import { AppState } from "react-native";

    // EventSource here is a polyfill (React Native has no native one);
    // getChatStatus and fetchMessagesFresh are app-specific helpers, not shown.
    function useMobileChat(chatId: string) {
      const [messages, setMessages] = useState<Message[]>([]);
      const esRef = useRef<EventSource | null>(null);
      const lastSeqIdRef = useRef<number>(0);
      const wasInterruptedRef = useRef(false);

      useEffect(() => {
        const subscription = AppState.addEventListener(
          "change",
          handleAppStateChange,
        );

        return () => {
          subscription.remove();
          // Close the stream on unmount/chat switch: otherwise it leaks,
          // and the token callback keeps calling setMessages on the unmounted
          // component (Rule 2: Explicit Cleanup)
          esRef.current?.close();
        };
        // chatId in the dependencies: otherwise handleAppStateChange closes over
        // the old chat and after a switch will resume the wrong stream
      }, [chatId]);

      async function handleAppStateChange(state: string) {
        if (state !== "active") {
          // Going to background: close the connection
          esRef.current?.close();
          wasInterruptedRef.current = true;
          return;
        }

        if (!wasInterruptedRef.current) return;
        wasInterruptedRef.current = false;

        // Returning from background: check the generation status
        const { status, run_id } = await getChatStatus(chatId);

        if (status === "processing" && run_id) {
          // The backend is still working: re-attach to the live run
          resumeStream(chatId, run_id, lastSeqIdRef.current);
        } else {
          // The backend has finished: grab the ready response
          const fresh = await fetchMessagesFresh(chatId);
          setMessages(fresh);
        }
      }

      function resumeStream(chatId: string, runId: string, lastSeqId: number) {
        // Pass lastSeqId: the backend will send only tokens after this ID
        const es = new EventSource(
          `/api/chat/${chatId}/stream?run_id=${runId}&after_seq=${lastSeqId}`,
        );
        esRef.current = es;

        es.addEventListener("token", (event) => {
          const { messageId, token, seqId } = JSON.parse(event.data);
          lastSeqIdRef.current = seqId;

          // Match by messageId, not by role: otherwise the token gets appended
          // to ALL assistant responses in history, not just the current one
          setMessages((prev) =>
            prev.map((m) =>
              m.id === messageId ? { ...m, content: m.content + token } : m,
            ),
          );
        });

        es.addEventListener("done", () => {
          es.close();
        });
      }

      return { messages };
    }

lastSeqId is a cursor into the token stream. The backend numbers every token, and on resume the client says: "I received tokens up to #42, let's continue from #43." Not a restart, no data loss, no extra generation cost.

It's not just mobile

For web developers this might feel like a distant problem. But an analogous situation arises in the browser:

- A tab goes to sleep when inactive (Chrome throttles background tabs)
- A WebSocket drops when switching Wi-Fi → mobile internet
- The network disappears in a tunnel

The approach is the same: resume, not restart. A cursor into the stream + a generation-status check when the connection is restored.

Conclusion: the mental model

We've gone from a naive fetch + setState to resume streaming on mobile. What ties all these problems together?

The stream lifecycle and the React lifecycle are two independent processes.

React is a pull-based system. A component renders a snapshot of state. When state updates, React pulls a new render.

LLM streaming is push-based. Tokens arrive whenever they want, without a request from React. The stream outlives a single render. During the stream the component can remount, the user can switch chats, the app can go to background.

React doesn't know about the stream. The stream doesn't know about React. If you don't synchronize them explicitly, there will be bugs. Not "might be": there will be.

Three rules

Stream Identity: every stream gets a unique ID, every callback checks "am I still current?". This protects against stale closures and races between multiple streams.

Explicit Cleanup: when the component leaves, close the connection, mark the chat as stale, and on return do a forced sync instead of reading the cache. Cleanup isn't just ws.close(), it's a deliberate decision about what happens on the backend and on the client.

Resume, not restart: when a stream is interrupted, pick it up from where it stopped via a cursor, rather than starting a new generation. This saves money, preserves context, and works on any platform.

The main rule

Push-based data should not live in React state directly. React subscribes to the result, the stream owns the transport.

This applies not only to LLM streaming. WebSocket, Web Workers, collaborative editing, real-time dashboards: anywhere data is "pushed" from outside while React tries to "pull" it. Separate the transport owner from the data consumer, and half your bugs disappear.
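
One possible shape for "the stream owns the transport, React subscribes to the result" is a small store that lives outside React. What follows is a minimal sketch with hypothetical names (ChatStreamStore, startAssistantMessage, appendToken), not a description of our production code: the transport (fetch, WebSocket, or SSE) would call the store's methods, and components would only subscribe.

```typescript
type ChatMessage = { id: string; role: "user" | "assistant"; content: string };
type Listener = () => void;

// Hypothetical store: owns the streamed messages independently of any component
class ChatStreamStore {
  private messages: readonly ChatMessage[] = [];
  private listeners = new Set<Listener>();

  // Arrow properties so the methods can be passed around without losing `this`
  subscribe = (listener: Listener): (() => void) => {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  };

  // Returns the same array reference until something actually changes
  getSnapshot = (): readonly ChatMessage[] => this.messages;

  startAssistantMessage(id: string): void {
    this.messages = [...this.messages, { id, role: "assistant", content: "" }];
    this.emit();
  }

  appendToken(id: string, token: string): void {
    this.messages = this.messages.map((m) =>
      m.id === id ? { ...m, content: m.content + token } : m,
    );
    this.emit();
  }

  private emit(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// Usage: a stand-in "component" subscribes, then goes away mid-stream
const store = new ChatStreamStore();
let notifications = 0;
const unsubscribe = store.subscribe(() => {
  notifications += 1;
});

store.startAssistantMessage("a1");
store.appendToken("a1", "Hel");
store.appendToken("a1", "lo");
unsubscribe(); // the component unmounted
store.appendToken("a1", "!"); // the stream keeps writing; nobody is re-rendered

const finalText = store.getSnapshot()[0].content;
```

The subscribe/getSnapshot pair matches what React's useSyncExternalStore(subscribe, getSnapshot) expects, so a component could read from such a store without owning the stream. When the user comes back, a freshly mounted component sees the complete message, because the data never depended on the component being alive.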
Original source: Hacker Noon

