Managing chat history in LLM applications means balancing two competing concerns: preserving conversational context and staying within token limits. Every token costs money and consumes context window space, yet losing important context can severely degrade the quality of responses.
The simplest approach is maintaining a fixed window of recent messages. While straightforward, this method risks losing important context from earlier conversations.
```typescript
const manageHistory = (messages: Message[], windowSize: number): Message[] => {
  // Keep only the `windowSize` most recent messages
  return messages.slice(-windowSize);
};
```
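For example, with a window of 10 only the ten most recent messages reach the model (the `conversation` array below is a hypothetical full history):

```typescript
// Everything older than the last 10 messages is silently dropped
const trimmed = manageHistory(conversation, 10);
```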
A more sophisticated approach, which we'll focus on in this course, involves summarizing older messages instead of discarding them:

1. Keep the most recent messages verbatim in an active window.
2. Once the history grows past a threshold, summarize everything older than that window.
3. Inject the summary into the system prompt, so long-range context survives in compressed form.

The process typically works like this:
```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
}

const manageHistoryWithSummary = async (
  messages: Message[],
  activeWindowSize: number,
  summaryThreshold: number
): Promise<Message[]> => {
  // Below the threshold, the full history still fits comfortably
  if (messages.length < summaryThreshold) {
    return messages;
  }

  // Keep recent messages verbatim
  const recentMessages = messages.slice(-activeWindowSize);

  // Summarize everything older than the active window
  const messagesToSummarize = messages.slice(0, -activeWindowSize);
  const summary = await summarizeMessages(messagesToSummarize);

  // Prepend the summary as system context ahead of the recent messages
  return [
    {
      role: 'system',
      content: `Previous conversation context: ${summary}`,
      timestamp: Date.now(),
    },
    ...recentMessages,
  ];
};
```
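The `summarizeMessages` helper is left undefined above. Here is a minimal sketch assuming the official OpenAI Node SDK; the model name and prompt wording are illustrative choices, not part of the course's API:

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Condense a slice of the conversation into a short textual summary
// via a separate LLM call.
const summarizeMessages = async (messages: Message[]): Promise<string> => {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model choice
    messages: [
      {
        role: 'system',
        content:
          'Summarize the following conversation in a few sentences, preserving key facts, decisions, and open questions.',
      },
      { role: 'user', content: transcript },
    ],
  });

  return response.choices[0].message.content ?? '';
};
```

Note that summarization itself costs tokens and adds latency, so in practice `summaryThreshold` is tuned so the summary call runs only occasionally rather than on every turn.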
An even more advanced variant of the summarization approach maintains multiple levels of context, keeping recent messages verbatim while older conversation lives in progressively more compressed summaries.
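One way to realize this is a layered history: a verbatim active window, a mid-term summary of recently evicted messages, and a long-term summary that the mid-term layer is periodically folded into. The sketch below reuses the `summarizeMessages` helper from above; the tier structure, the `LayeredHistory` shape, and the length budget are illustrative assumptions, not a prescribed design:

```typescript
interface LayeredHistory {
  longTermSummary: string;   // highly compressed, oldest context
  midTermSummary: string;    // summary of messages that recently left the window
  recentMessages: Message[]; // verbatim active window
}

const manageLayeredHistory = async (
  history: LayeredHistory,
  incoming: Message,
  activeWindowSize: number
): Promise<LayeredHistory> => {
  const recentMessages = [...history.recentMessages, incoming];

  // The active window still has room: no summarization needed yet
  if (recentMessages.length <= activeWindowSize) {
    return { ...history, recentMessages };
  }

  // Evict the oldest message from the window and fold it into
  // the mid-term summary
  const [evicted, ...remaining] = recentMessages;
  const midTermSummary = await summarizeMessages([
    { role: 'system', content: history.midTermSummary, timestamp: Date.now() },
    evicted,
  ]);

  // When the mid-term summary outgrows a rough length budget,
  // compress it into the long-term summary and start a fresh mid-term layer
  if (midTermSummary.length > 1000) {
    const longTermSummary = await summarizeMessages([
      { role: 'system', content: history.longTermSummary, timestamp: Date.now() },
      { role: 'system', content: midTermSummary, timestamp: Date.now() },
    ]);
    return { longTermSummary, midTermSummary: '', recentMessages: remaining };
  }

  return { ...history, midTermSummary, recentMessages: remaining };
};
```

When building the actual prompt, the two summaries are prepended as system context ahead of `recentMessages`, mirroring `manageHistoryWithSummary` above.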