Conversational Memory

Conversational Memory#

As we mentioned in previous recipes, large language models have no internal state, i.e., they do not retain any conversational context from previous messages. A multi-turn conversation works by passing an increasingly longer prompt to the model that includes all previous messages in addition to the most recent one. There are several ways to manage the conversational context, or conversational memory, which have their individual strenghts and weaknesses. In this recipe, we will explore some of the most common ones.

from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

True

from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-2-11b-vision-instruct", temperature=0, seed=42)

Non-persistent conversational memory#

We can think of the conversational memory as the history of all messages that have been passed to and received from the model so far. In Prompt Basics, we saw that we can pass a list of messages to a chat model. We can use this mechanism to create a simple conversational memory system by appending every message (outgoing and incoming) to a list:

from langchain_core.messages import HumanMessage

first_message = HumanMessage("Ask me a riddle!")
conversation = [first_message]

first_response = llm.invoke(conversation)
conversation.append(first_response)

for message in conversation:
    message.pretty_print()

================================ Human Message =================================

Ask me a riddle!
================================== Ai Message ==================================

I have a riddle for you:

I am always coming but never arrive,
I have a head but never hair,
I have a bed but never sleep,
I have a mouth but never speak.

What am I?

second_message = HumanMessage("Is it a unicorn?")
conversation.append(second_message)

second_response = llm.invoke(conversation)
conversation.append(second_response)

for message in conversation:
    message.pretty_print()

================================ Human Message =================================

Ask me a riddle!
================================== Ai Message ==================================

I have a riddle for you:

I am always coming but never arrive,
I have a head but never hair,
I have a bed but never sleep,
I have a mouth but never speak.

What am I?
================================ Human Message =================================

Is it a unicorn?
================================== Ai Message ==================================

That's a creative answer, but unfortunately, it's not a unicorn. Unicorns don't quite fit the description in the riddle.

Here's a hint: think about something that you might find in nature, and that has a distinct "head" and "bed".

Want to take another guess?

While this technique works for relatively simple scenarios, it’s not very elegant and requires quite a bit of code to maintain the history. It also can be potentially problematic when the LLM is part of a chain and we don’t want to pass the conversation history as an input to the chain.

Instead, the LLM component should keep track of the message history internally!

Fortunately, LangChain offers a way to make that happen. We need two things for this:

a component that keeps track of the message history (replacing the simple list above)
a way for the LLM to interact with this list whenever a new (input or output) message arrives (replacing the list management code we wrote above)

To keep track of the message history, we can use a class called ChatMessageHistory:

from langchain_community.chat_message_histories import ChatMessageHistory

history = ChatMessageHistory()

This component works very similarly to the simple list we used above, but is more explicitly designed to be used with messages. For example, here is how we create the history from above:

history.add_message(first_message)
history.add_message(first_response)
history.add_message(second_message)
history.add_message(second_response)

history.messages

[HumanMessage(content='Ask me a riddle!', additional_kwargs={}, response_metadata={}),
 AIMessage(content='I have a riddle for you:\n\nI am always coming but never arrive,\nI have a head but never hair,\nI have a bed but never sleep,\nI have a mouth but never speak.\n\nWhat am I?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 42, 'total_tokens': 87, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'llama-3-1-8b-instruct', 'system_fingerprint': '3.2.0-sha-411a282', 'id': '', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--afdd672b-7c1e-4bf9-8884-66b22342cb63-0', usage_metadata={'input_tokens': 42, 'output_tokens': 45, 'total_tokens': 87, 'input_token_details': {}, 'output_token_details': {}}),
 HumanMessage(content='Is it a unicorn?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That\'s a creative answer, but unfortunately, it\'s not a unicorn. Unicorns don\'t quite fit the description in the riddle.\n\nHere\'s a hint: think about something that you might find in nature, and that has a distinct "head" and "bed".\n\nWant to take another guess?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 63, 'prompt_tokens': 101, 'total_tokens': 164, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'llama-3-1-8b-instruct', 'system_fingerprint': '3.2.0-sha-411a282', 'id': '', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--5466058a-ea7e-4921-a815-502c7a161c57-0', usage_metadata={'input_tokens': 101, 'output_tokens': 63, 'total_tokens': 164, 'input_token_details': {}, 'output_token_details': {}})]

To make an LLM use a ChatMessageHistory object, we need to “attach” it to the ChatDartmouth component by wrapping them with a class called RunnableWithMessageHistory.

This class assumes that we want to be able to manage multiple conversation histories, as we would in a chat application. It therefore expects a function that returns a chat message history object given a session id. In this example, we only keep track of a single conversation, so we can just return the same history every time. So we just need to write a very simple dummy function:

history = ChatMessageHistory()


def get_history(session_id):
    return history

Note

We have to make sure to instantiate the history outside the function. Otherwise, the message history would not persist between calls to get_history!

Now we have everything we need to tie it all together:

from langchain_core.runnables.history import RunnableWithMessageHistory


llm_with_memory = RunnableWithMessageHistory(
    runnable=llm,
    get_session_history=get_history,
)

Hint

LangChain calls all components that implement the standard interface of the invoke and stream methods (and some others) a runnable.

When we invoke this runnable, we have to specify the session id that will be passed to get_history (even though we don’t use it here):

llm_with_memory.invoke(
    {
        "input": "Tell me a riddle!",
    },
    config={"configurable": {"session_id": "whatever"}},
).pretty_print()

================================== Ai Message ==================================

I have a riddle for you:

I am always coming but never arrive,
I have a head but never hair,
I have a bed but never sleep,
I have a mouth but never speak.

What am I?

(Let me know if you need a hint!)

llm_with_memory.invoke(
    {"input": "Give me a hint"},
    config={"configurable": {"session_id": "whatever"}},
).pretty_print()

================================== Ai Message ==================================

Here's a hint:

Think about something that moves or flows, and is often associated with nature. It's a bit of a paradox, because it's always in motion, but it never actually reaches its destination.

Also, consider the different parts of the riddle (head, bed, mouth) - they're not literal, but rather metaphorical or descriptive.

Does that help?

llm_with_memory.invoke(
    {"input": "Is it a river?"},
    config={"configurable": {"session_id": "whatever"}},
).pretty_print()

================================== Ai Message ==================================

You're flowing in the right direction (pun intended)!

Yes, the answer is indeed a river. Well done!

Here's how the riddle relates to a river:

* "I am always coming but never arrive": A river is constantly flowing, but it never actually arrives at a destination, because it's always in motion.
* "I have a head but never hair": A river has a "head" or source, but it doesn't have hair.
* "I have a bed but never sleep": A river has a "bed" or channel, but it's always flowing and never rests or sleeps.
* "I have a mouth but never speak": A river has a "mouth" or outlet, but it doesn't have the ability to speak.

You're really good at this! Do you want to try another riddle?

We can check the message history object to see that indeed keeps track of all the messages:

for message in history.messages:
    message.pretty_print()

================================ Human Message =================================

Tell me a riddle!
================================== Ai Message ==================================

I have a riddle for you:

I am always coming but never arrive,
I have a head but never hair,
I have a bed but never sleep,
I have a mouth but never speak.

What am I?

(Let me know if you need a hint!)
================================ Human Message =================================

Give me a hint
================================== Ai Message ==================================

Here's a hint:

Think about something that moves or flows, and is often associated with nature. It's a bit of a paradox, because it's always in motion, but it never actually reaches its destination.

Also, consider the different parts of the riddle (head, bed, mouth) - they're not literal, but rather metaphorical or descriptive.

Does that help?
================================ Human Message =================================

Is it a river?
================================== Ai Message ==================================

You're flowing in the right direction (pun intended)!

Yes, the answer is indeed a river. Well done!

Here's how the riddle relates to a river:

* "I am always coming but never arrive": A river is constantly flowing, but it never actually arrives at a destination, because it's always in motion.
* "I have a head but never hair": A river has a "head" or source, but it doesn't have hair.
* "I have a bed but never sleep": A river has a "bed" or channel, but it's always flowing and never rests or sleeps.
* "I have a mouth but never speak": A river has a "mouth" or outlet, but it doesn't have the ability to speak.

You're really good at this! Do you want to try another riddle?

Looks great, doesn’t it?

One issue remains, however: Depending on our use case, we might want to persist the message history between runs of the program. Or maybe we want to be able to do something more meaningful with the session id in get_history, e.g., manage multiple conversations. In the next section, we will learn about ways to achieve both of those things!

Persistent conversational memory#

While we could write the message history to disk at the end of every run of our program and read it back in at the start of each run, that would be a bit cumbersome and would require additional boilerplate code. We also might want to consider different options to store the history, like a SQL database.

LangChain offers a variety of implementations for the message history, built on different services. For example, we can store the history to a SQLite database:

from langchain_community.chat_message_histories import SQLChatMessageHistory

DB_NAME = "chat_history.db"

def get_history(session_id):
    return SQLChatMessageHistory(session_id, connection=f"sqlite:///{DB_NAME}")

Now every time we call get_history, the chat history will be retrieved from the specified SQLite database, using the session ID as a filter.

We can now manage multiple conversations by specifying a separate ID for each conversation thread:

llm_with_memory = RunnableWithMessageHistory(
    runnable=llm,
    get_session_history=get_history,
)

llm_with_memory.invoke(
    {"input": "Hi, I am Simon!"},
    config={"configurable": {"session_id": "simons_convo"}},
).pretty_print()

================================== Ai Message ==================================

Nice to meet you, Simon! Is there something I can help you with or would you like to chat?

llm_with_memory.invoke(
    {"input": "Hi, I am Alex!"},
    config={"configurable": {"session_id": "alex_convo"}},
).pretty_print()

================================== Ai Message ==================================

Nice to meet you, Alex! Is there something I can help you with or would you like to chat?

We can inspect the database like any other SQLite database, e.g. with Python’s built-in sqlite3 module:

import sqlite3

con = sqlite3.connect(DB_NAME)
cur = con.cursor()

# By default, the table name is 'message_store'
cur.execute("SELECT * FROM message_store;").fetchall()

[(1,
  'simons_convo',
  '{"type": "human", "data": {"content": "Hi, I am Simon!", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null, "example": false}}'),
 (2,
  'simons_convo',
  '{"type": "ai", "data": {"content": "Nice to meet you, Simon! Is there something I can help you with or would you like to chat?", "additional_kwargs": {"refusal": null}, "response_metadata": {"token_usage": {"completion_tokens": 23, "prompt_tokens": 42, "total_tokens": 65, "completion_tokens_details": null, "prompt_tokens_details": null}, "model_name": "llama-3-1-8b-instruct", "system_fingerprint": "3.2.0-sha-411a282", "id": "", "service_tier": null, "finish_reason": "stop", "logprobs": null}, "type": "ai", "name": null, "id": "run--fad4e6d0-e683-4146-a52b-bcb4beb3750c-0", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 42, "output_tokens": 23, "total_tokens": 65, "input_token_details": {}, "output_token_details": {}}}}'),
 (3,
  'alex_convo',
  '{"type": "human", "data": {"content": "Hi, I am Alex!", "additional_kwargs": {}, "response_metadata": {}, "type": "human", "name": null, "id": null, "example": false}}'),
 (4,
  'alex_convo',
  '{"type": "ai", "data": {"content": "Nice to meet you, Alex! Is there something I can help you with or would you like to chat?", "additional_kwargs": {"refusal": null}, "response_metadata": {"token_usage": {"completion_tokens": 23, "prompt_tokens": 42, "total_tokens": 65, "completion_tokens_details": null, "prompt_tokens_details": null}, "model_name": "llama-3-1-8b-instruct", "system_fingerprint": "3.2.0-sha-411a282", "id": "", "service_tier": null, "finish_reason": "stop", "logprobs": null}, "type": "ai", "name": null, "id": "run--b070e983-4f6a-45ba-8316-f96498a4833a-0", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 42, "output_tokens": 23, "total_tokens": 65, "input_token_details": {}, "output_token_details": {}}}}')]

As we can see, the conversations are organized by the session ID, so we can continue to have separate conversations by passing the respective session ID:

llm_with_memory.invoke(
    {"input": "What's my name again?"},
    config={"configurable": {"session_id": "simons_convo"}},
).pretty_print()

================================== Ai Message ==================================

Your name is Simon.

llm_with_memory.invoke(
    {"input": "What's my name again?"},
    config={"configurable": {"session_id": "alex_convo"}},
).pretty_print()

================================== Ai Message ==================================

Your name is Alex.

Since the database is stored on disk by default, the history is automatically persisted across multiple runs of the program. If you want to use one of the other implementations of the chat message history, you only need to change the get_history function!

Summary#

LLMs are stateless and thus require the entire conversation to generate the next turn. We can keep track of the conversation manually by maintaining a list of all outgoing and incoming messages. If we want a more elegant solution that can optionally persist the message history using a variety of backends (e.g., a SQL database), we can use one of LangChain’s implementations of the chat message history.