Embeddings

class langchain_dartmouth.embeddings.DartmouthEmbeddings

Embedding models deployed on Dartmouth’s cluster.

Parameters:
  • model_name (str, optional) – The name of the embedding model to use, defaults to "bge-large-en-v1-5".

  • model_kwargs (dict, optional) – Keyword arguments to pass to the model.

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication. Only needed for special use cases.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://ai-api.dartmouth.edu/tei/".

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted embedding model only takes a few lines of code:

from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()

response = embeddings.embed_query("Hello? Is there anybody in there?")

print(response)

static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through DartmouthEmbeddings.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server.

Returns:

A list of descriptions of the available models

Return type:

list[dict]
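
A minimal sketch of discovering the available embedding models (assumes DARTMOUTH_API_KEY is set in the environment; inspect the returned dictionaries to see which fields each model description contains):

from langchain_dartmouth.embeddings import DartmouthEmbeddings

models = DartmouthEmbeddings.list()
for model in models:
    print(model)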

async aembed_documents(texts)

Async call to the embedding endpoint to retrieve the embeddings of multiple texts.

Parameters:
  • texts (List[str]) – The list of texts to embed.

Returns:

Embeddings for the texts.

Return type:

List[List[float]]
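
A minimal async usage sketch (assumes DARTMOUTH_API_KEY is set; the input texts are placeholders):

import asyncio

from langchain_dartmouth.embeddings import DartmouthEmbeddings

async def main():
    embeddings = DartmouthEmbeddings()
    vectors = await embeddings.aembed_documents(["First document.", "Second document."])
    print(len(vectors))     # one embedding per input text
    print(len(vectors[0]))  # dimensionality of each embedding

asyncio.run(main())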

async aembed_query(text)

Async call to the embedding endpoint to retrieve the embedding of the query text.

Parameters:

text (str) – The text to embed.

Returns:

Embeddings for the text.

Return type:

List[float]

embed_documents(texts)

Call out to the embedding endpoint to retrieve the embeddings of multiple texts.

Parameters:
  • texts (List[str]) – The list of texts to embed.

Returns:

Embeddings for the texts.

Return type:

List[List[float]]
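
For example, embedding several documents in one call (a sketch; the texts are placeholders):

from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()
vectors = embeddings.embed_documents(["First document.", "Second document."])

# vectors is a List[List[float]]: one embedding per input text
print(len(vectors), len(vectors[0]))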

embed_query(text)

Call out to the embedding endpoint to retrieve the embedding of the query text.

Parameters:

text (str) – The text to embed.

Returns:

Embeddings for the text.

Return type:

List[float]

Large Language Models

class langchain_dartmouth.llms.DartmouthLLM

Dartmouth-deployed Large Language Models. Use this class for non-chat models (e.g., CodeLlama 13B).

This class does not format the prompt to adhere to any required templates. The string you pass to it is exactly the string received by the LLM. If the desired model requires a chat template (e.g., Llama 3.1 Instruct), you may want to use ChatDartmouth instead.

Parameters:
  • model_name (str, optional) – Name of the model to use, defaults to "codellama-13b-python-hf".

  • temperature (float, optional) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.8.

  • max_new_tokens (int) – Maximum number of generated tokens, defaults to 512.

  • streaming (bool) – Whether to generate a stream of tokens asynchronously, defaults to False.

  • top_k (int, optional) – The number of highest probability vocabulary tokens to keep for top-k-filtering.

  • top_p (float, optional) – If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation, defaults to 0.95.

  • typical_p (float, optional) – Typical decoding mass. See Typical Decoding for Natural Language Generation for more information, defaults to 0.95.

  • repetition_penalty (float, optional) – The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

  • return_full_text (bool) – Whether to prepend the prompt to the generated text, defaults to False.

  • truncate (int, optional) – Truncate input tokens to the given size.

  • stop_sequences (List[str], optional) – Stop generating tokens if a member of stop_sequences is generated.

  • seed (int, optional) – Random sampling seed.

  • do_sample (bool) – Activate logits sampling, defaults to False.

  • watermark (bool) – Watermarking with A Watermark for Large Language Models, defaults to False.

  • model_kwargs (dict, optional) – Parameters to pass to the model (see the documentation of LangChain’s HuggingFaceTextGenInference class).

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • inference_server_url (str) – URL pointing to an inference endpoint, defaults to "https://ai-api.dartmouth.edu/tgi/".

  • timeout (int) – Timeout in seconds, defaults to 120.

  • server_kwargs (dict, optional) – Holds any text-generation-inference server parameters not explicitly specified.

  • **_ – Additional keyword arguments are silently discarded. This ensures interface compatibility with other LangChain components.

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted LLM only takes a few lines of code:

from langchain_dartmouth.llms import DartmouthLLM

llm = DartmouthLLM(model_name="codellama-13b-hf")

response = llm.invoke("Write a Python script to swap two variables.")

print(response)

static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through DartmouthLLM.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server.

Returns:

A list of descriptions of the available models

Return type:

list[dict]

async ainvoke(*args, **kwargs)

Asynchronously transforms a single input into an output.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s completion of the input string.

Return type:

str

invoke(*args, **kwargs)

Transforms a single input into an output.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s completion of the input string.

Return type:

str
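
Because DartmouthLLM is a LangChain-compatible component, invoke also works when the model is composed into a chain. A minimal sketch using LangChain’s prompt templates (the prompt wording is illustrative, not part of the API):

from langchain_core.prompts import PromptTemplate

from langchain_dartmouth.llms import DartmouthLLM

llm = DartmouthLLM(model_name="codellama-13b-python-hf")
prompt = PromptTemplate.from_template("# A Python function that {task}\ndef ")
chain = prompt | llm

# The chain formats the prompt, then passes the raw string to the LLM
print(chain.invoke({"task": "swaps two variables"}))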

class langchain_dartmouth.llms.ChatDartmouth

Dartmouth-deployed Chat models (also known as Instruct models).

Use this class for models that require a chat template (e.g., Llama 3.1 8B Instruct).

All prompts are automatically formatted to adhere to the chosen model’s chat template. If you need more control over the exact string sent to the model, you may want to use DartmouthLLM instead.

Parameters:
  • model_name (str) – Name of the model to use, defaults to "llama-3-1-8b-instruct".

  • streaming (bool) – Whether to stream the results or not, defaults to False.

  • temperature (float) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.7.

  • max_tokens (int) – Maximum number of tokens to generate, defaults to 512.

  • logprobs (bool, optional) – Whether to return logprobs.

  • stream_usage (bool) – Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata, defaults to False.

  • presence_penalty (float, optional) – Penalizes repeated tokens.

  • frequency_penalty (float, optional) – Penalizes repeated tokens according to frequency.

  • seed (int, optional) – Seed for generation.

  • top_logprobs (int, optional) – Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

  • logit_bias (dict, optional) – Modify the likelihood of specified tokens appearing in the completion.

  • n (int) – Number of chat completions to generate for each prompt, defaults to 1.

  • top_p (float, optional) – Total probability mass of tokens to consider at each step.

  • model_kwargs (dict, optional) – Holds any model parameters valid for the create call that are not explicitly specified.

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • inference_server_url (str, optional) – URL pointing to an inference endpoint, defaults to "https://ai-api.dartmouth.edu/tgi/".

  • **_ – Additional keyword arguments are silently discarded. This ensures interface compatibility with other LangChain components.

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted LLM only takes a few lines of code:

from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

response = llm.invoke("Hi there!")

print(response.content)

Note

The required prompt format is enforced automatically when you are using ChatDartmouth.

static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through ChatDartmouth.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server.

Returns:

A list of descriptions of the available models

Return type:

list[dict]

async ainvoke(*args, **kwargs)

Asynchronously invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage

invoke(*args, **kwargs)

Invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage
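
invoke also accepts a list of LangChain message objects, which is useful for system-prompted or multi-turn conversations. A minimal sketch (the message contents are illustrative):

from langchain_core.messages import HumanMessage, SystemMessage

from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

messages = [
    SystemMessage(content="You are a terse assistant."),
    HumanMessage(content="What is the capital of France?"),
]
response = llm.invoke(messages)
print(response.content)  # the returned BaseMessage carries the reply text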

class langchain_dartmouth.llms.ChatDartmouthCloud

Cloud chat models made available by Dartmouth.

Use this class if you want to use a model from a third-party provider (e.g., Anthropic or OpenAI) made accessible by Dartmouth.

Parameters:
  • model_name (str) – Name of the model to use, defaults to "openai.gpt-4.1-mini-2025-04-14".

  • streaming (bool) – Whether to stream the results or not, defaults to False.

  • temperature (float) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.7.

  • max_tokens (int) – Maximum number of tokens to generate, defaults to 512.

  • logprobs (bool, optional) – Whether to return logprobs.

  • stream_usage (bool) – Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata, defaults to False.

  • presence_penalty (float, optional) – Penalizes repeated tokens.

  • frequency_penalty (float, optional) – Penalizes repeated tokens according to frequency.

  • seed (int, optional) – Seed for generation.

  • top_logprobs (int, optional) – Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

  • logit_bias (dict, optional) – Modify the likelihood of specified tokens appearing in the completion.

  • n (int) – Number of chat completions to generate for each prompt, defaults to 1.

  • top_p (float, optional) – Total probability mass of tokens to consider at each step.

  • model_kwargs (dict, optional) – Holds any model parameters valid for the create call that are not explicitly specified.

  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • inference_server_url (str, optional) – The URL of the inference server (e.g., https://chat.dartmouth.edu/api/).

  • **_ – Additional keyword arguments are silently discarded. This ensures interface compatibility with other LangChain components.

Example

With an environment variable named DARTMOUTH_CHAT_API_KEY pointing to your key obtained from https://chat.dartmouth.edu, using a third-party LLM provided by Dartmouth only takes a few lines of code:

from langchain_dartmouth.llms import ChatDartmouthCloud

llm = ChatDartmouthCloud(model_name="openai.gpt-4o-mini-2024-07-18")

response = llm.invoke("Hi there!")

print(response.content)

Note

The models available through ChatDartmouthCloud are pay-as-you-go third-party models. Dartmouth pays for the use, but a daily token limit per user applies.

validator validate_temperature  »  all fields

Currently, OpenAI’s o-series models only allow temperature=1; this validator enforces that constraint.

Parameters:

values (dict[str, Any])

Return type:

Any
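
For example, when selecting an o-series model, leave temperature at 1 (a sketch; the model identifier below is hypothetical, so check ChatDartmouthCloud.list() for the exact name):

from langchain_dartmouth.llms import ChatDartmouthCloud

# Hypothetical o-series model name; use ChatDartmouthCloud.list() to find
# the exact identifier. o-series models only allow temperature=1.
llm = ChatDartmouthCloud(model_name="openai.o3-mini", temperature=1)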

static list(dartmouth_chat_api_key=None, url='https://chat.dartmouth.edu/api/')

List the models available through ChatDartmouthCloud.

Parameters:
  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • url (str, optional) – URL of the listing server.

Returns:

A list of descriptions of the available models

Return type:

list[dict]

async ainvoke(*args, **kwargs)

Asynchronously invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage

invoke(*args, **kwargs)

Invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage
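
Because ChatDartmouthCloud implements the standard LangChain Runnable interface, responses can also be streamed token by token. A minimal sketch using the default model:

from langchain_dartmouth.llms import ChatDartmouthCloud

llm = ChatDartmouthCloud()

# stream() yields message chunks as they arrive from the server
for chunk in llm.stream("Write a haiku about winter in Hanover."):
    print(chunk.content, end="", flush=True)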

Reranking

class langchain_dartmouth.retrievers.document_compressors.DartmouthReranker

Reranks documents using a reranking model deployed in the Dartmouth cloud.

Parameters:
  • model_name (str, optional) – The name of the reranking model to use, defaults to "bge-reranker-v2-m3".

  • top_n (int) – Number of documents to return, defaults to 3.

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://ai-api.dartmouth.edu/tei/".

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted Reranker only takes a few lines of code:

from langchain.docstore.document import Document

from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

docs = [
    Document(page_content="Deep Learning is not..."),
    Document(page_content="Deep learning is..."),
]
query = "What is Deep Learning?"
reranker = DartmouthReranker()
ranked_docs = reranker.compress_documents(query=query, documents=docs)
print(ranked_docs)

static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through DartmouthReranker.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server.

Returns:

A list of descriptions of the available models

Return type:

list[dict]

compress_documents(documents, query, callbacks=None)

Returns the most relevant documents with respect to a query.

Parameters:
  • documents (Sequence[Document]) – Documents to compress.

  • query (str) – Query to consider.

  • callbacks (Callbacks, optional) – Callbacks to run during the compression process, defaults to None.

Returns:

The top_n highest-ranked documents

Return type:

Sequence[Document]
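
A common pattern is to pair the reranker with an existing retriever via LangChain’s ContextualCompressionRetriever, so that retrieved documents are reranked before being returned. A minimal sketch, assuming langchain_core’s InMemoryVectorStore is available and reusing the documents from the example above:

from langchain.retrievers import ContextualCompressionRetriever
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore

from langchain_dartmouth.embeddings import DartmouthEmbeddings
from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

# Build a small in-memory vector store using a Dartmouth-hosted embedding model
vector_store = InMemoryVectorStore.from_documents(
    [
        Document(page_content="Deep Learning is not..."),
        Document(page_content="Deep learning is..."),
    ],
    embedding=DartmouthEmbeddings(),
)

# Rerank the retriever's results, keeping only the top document
retriever = ContextualCompressionRetriever(
    base_compressor=DartmouthReranker(top_n=1),
    base_retriever=vector_store.as_retriever(),
)

docs = retriever.invoke("What is Deep Learning?")
print(docs)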