Embeddings

class langchain_dartmouth.embeddings.DartmouthEmbeddings

Embedding models deployed on Dartmouth’s cluster.

Parameters:
  • model_name (str, optional) – The name of the embedding model to use, defaults to "baai.bge-large-en-v1-5".

  • model_kwargs (dict, optional) – Keyword arguments to pass to the model.

  • dimensions (int, optional) – The number of dimensions the resulting output embeddings should have. Not supported by all models.

  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://chat.dartmouth.edu/api/".

Example

With an environment variable named DARTMOUTH_CHAT_API_KEY pointing to your key obtained from Dartmouth Chat, using an embedding model only takes a few lines of code:

from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()

response = embeddings.embed_query("Hello? Is there anybody in there?")

print(response)
static list(dartmouth_chat_api_key=None, url='https://chat.dartmouth.edu/api/')

List the models available through DartmouthEmbeddings.

Parameters:
  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • url (str, optional) – URL of the listing server

Returns:

A list of descriptions of the available models

Return type:

list[ModelInfo]
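For example, to print the identifiers of all available embedding models (a minimal sketch, assuming the DARTMOUTH_CHAT_API_KEY environment variable is set):

from langchain_dartmouth.embeddings import DartmouthEmbeddings

# Each entry is a ModelInfo object (see the Model Listing section below)
for model in DartmouthEmbeddings.list():
    print(model.id)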

async aembed_documents(texts, chunk_size=None, **kwargs)

Async call to the embedding endpoint to retrieve the embeddings of multiple texts.

Parameters:
  • texts (List[str]) – The list of texts to embed.

  • chunk_size (int | None)

  • kwargs (Any)

Returns:

Embeddings for the texts.

Return type:

List[List[float]]

async aembed_query(text, **kwargs)

Async call to the embedding endpoint to retrieve the embedding of the query text.

Parameters:
  • text (str) – The text to embed.

  • kwargs (Any)

Returns:

Embeddings for the text.

Return type:

List[float]
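Both async variants follow the usual asyncio pattern. A minimal sketch, assuming the API key is configured via the environment as shown above:

import asyncio

from langchain_dartmouth.embeddings import DartmouthEmbeddings

async def main():
    embeddings = DartmouthEmbeddings()
    # Embed a query and a batch of documents without blocking the event loop
    query_vector = await embeddings.aembed_query("What is Deep Learning?")
    doc_vectors = await embeddings.aembed_documents(["First text", "Second text"])
    print(len(query_vector), len(doc_vectors))

asyncio.run(main())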

embed_documents(texts, chunk_size=None, **kwargs)

Call out to the embedding endpoint to retrieve the embeddings of multiple texts.

Parameters:
  • texts (List[str]) – The list of texts to embed.

  • chunk_size (int | None)

  • kwargs (Any)

Returns:

Embeddings for the texts.

Return type:

List[List[float]]
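A minimal usage sketch, assuming the API key is configured via the environment as shown above:

from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()

# One embedding vector is returned per input text
vectors = embeddings.embed_documents(["First document", "Second document"])
print(len(vectors))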

embed_query(text, **kwargs)

Call out to the embedding endpoint to retrieve the embedding of the query text.

Parameters:
  • text (str) – The text to embed.

  • kwargs (Any)

Returns:

Embeddings for the text.

Return type:

List[float]

Large Language Models

class langchain_dartmouth.llms.DartmouthLLM

Dartmouth-deployed Large Language Models. Use this class for non-chat models (e.g., CodeLlama 13B).

This class does not format the prompt to adhere to any required templates. The string you pass to it is exactly the string received by the LLM. If the desired model requires a chat template (e.g., Llama 3.1 Instruct), you may want to use ChatDartmouth instead.

Parameters:
  • model_name (str, optional) – Name of the model to use, defaults to "codellama-13b-python-hf".

  • temperature (float, optional) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.8.

  • max_new_tokens (int) – Maximum number of generated tokens, defaults to 512.

  • streaming (bool) – Whether to generate a stream of tokens asynchronously, defaults to False

  • top_k (int, optional) – The number of highest probability vocabulary tokens to keep for top-k-filtering.

  • top_p (float, optional) – If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation, defaults to 0.95.

  • typical_p (float, optional) – Typical decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to 0.95.

  • repetition_penalty (float, optional) – The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

  • return_full_text (bool) – Whether to prepend the prompt to the generated text, defaults to False

  • truncate (int, optional) – Truncate inputs tokens to the given size

  • stop_sequences (List[str], optional) – Stop generating tokens if a member of stop_sequences is generated.

  • seed (int, optional) – Random sampling seed

  • do_sample (bool) – Activate logits sampling, defaults to False.

  • watermark (bool) – Whether to apply watermarking, as described in A Watermark for Large Language Models; defaults to False.

  • model_kwargs (dict, optional) – Parameters to pass to the model (see the documentation of LangChain’s HuggingFaceTextGenInference class.)

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • inference_server_url (str) – URL pointing to an inference endpoint, defaults to "https://ai-api.dartmouth.edu/tgi/".

  • timeout (int) – Timeout in seconds, defaults to 120

  • server_kwargs (dict, optional) – Holds any text-generation-inference server parameters not explicitly specified

  • **_ – Additional keyword arguments are silently discarded to ensure interface compatibility with other LangChain components.

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted LLM only takes a few lines of code:

from langchain_dartmouth.llms import DartmouthLLM

llm = DartmouthLLM(model_name="codellama-13b-hf")

response = llm.invoke("Write a Python script to swap two variables.")

print(response)
static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through DartmouthLLM.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server

Returns:

A list of descriptions of the available models

Return type:

list[dict]
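For example, to print the names of all available models (a sketch, assuming DARTMOUTH_API_KEY is set and that each dictionary carries a name key, as described in the Model Listing section below):

from langchain_dartmouth.llms import DartmouthLLM

# Each entry is a dict of model metadata (id, name, capabilities, ...)
for model in DartmouthLLM.list():
    print(model["name"])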

async ainvoke(*args, **kwargs)

Asynchronously transforms a single input into an output.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s completion of the input string.

Return type:

str
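Like any LangChain runnable, the model can be awaited inside an event loop. A minimal sketch:

import asyncio

from langchain_dartmouth.llms import DartmouthLLM

async def main():
    llm = DartmouthLLM(model_name="codellama-13b-hf")
    completion = await llm.ainvoke("def fibonacci(n):")
    print(completion)

asyncio.run(main())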

invoke(*args, **kwargs)

Transforms a single input into an output.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s completion of the input string.

Return type:

str

class langchain_dartmouth.llms.ChatDartmouth

Chat models made available by Dartmouth.

Use this class if you want to use any chat model, e.g., Anthropic’s Claude or OpenAI’s GPT, made accessible by Dartmouth.

Both free on-premises models and paid third-party models are available.

To see which models are available, which features they support, and how much they cost, run ChatDartmouth.list().

Parameters:
  • model_name (str) – Name of the model to use, defaults to "openai.gpt-oss-120b".

  • streaming (bool) – Whether to stream the results or not, defaults to False.

  • temperature (float) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.7.

  • max_tokens (int) – Maximum number of tokens to generate, defaults to 512

  • logprobs (bool, optional) – Whether to return logprobs

  • stream_usage (bool) – Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata, defaults to False.

  • presence_penalty (float, optional) – Penalizes repeated tokens.

  • frequency_penalty (float, optional) – Penalizes repeated tokens according to frequency.

  • seed (int, optional) – Seed for generation

  • top_logprobs (int, optional) – Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

  • logit_bias (dict, optional) – Modify the likelihood of specified tokens appearing in the completion.

  • n (int, optional) – Number of chat completions to generate for each prompt, defaults to None (i.e., use upstream default)

  • top_p (float, optional) – Total probability mass of tokens to consider at each step.

  • model_kwargs (dict, optional) – Holds any model parameters valid for create call not explicitly specified.

  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • inference_server_url (str, optional) – The URL of the inference server (e.g., https://chat.dartmouth.edu/api/)

  • **_ – Additional keyword arguments are silently discarded to ensure interface compatibility with other LangChain components.

Example

With an environment variable named DARTMOUTH_CHAT_API_KEY pointing to your key obtained from https://chat.dartmouth.edu, using a third-party LLM provided by Dartmouth only takes a few lines of code:

from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="openai.gpt-oss-120b")

response = llm.invoke("Hi there!")

print(response.content)

Note

Paid cloud models are billed by token consumption using different pricing depending on their complexity. Dartmouth pays for the use, but a daily token limit per user applies. Your token budget is the same as in Dartmouth Chat. Learn more about credits in Dartmouth Chat’s documentation.
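Streaming output is also available through LangChain's standard Runnable interface, which ChatDartmouth inherits; the stream method is not documented here, but the following sketch shows the typical pattern:

from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="openai.gpt-oss-120b")

# Print tokens as they arrive instead of waiting for the full response
for chunk in llm.stream("Tell me a fun fact about Dartmouth College."):
    print(chunk.content, end="", flush=True)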

validator validate_temperature  »  all fields

Currently, OpenAI's o-series and GPT-5 models only allow temperature=1.

Parameters:

values (dict[str, Any])

Return type:

Any

static list(dartmouth_chat_api_key=None, base_only=True, url='https://chat.dartmouth.edu/api/')

List the models available through ChatDartmouth.

Parameters:
  • dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_CHAT_API_KEY.

  • base_only (bool, optional) – If True, only regular Large Language Models are returned. If False, Workspace models are also returned.

  • url (str, optional) – URL of the listing server

Returns:

A list of descriptions of the available models

Return type:

list[ModelInfo]
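For example, to find all free models that advertise tool calling (a sketch that relies on the ModelInfo fields documented in the Model Listing section below):

from langchain_dartmouth.llms import ChatDartmouth

for model in ChatDartmouth.list():
    # cost and capabilities are ModelInfo attributes (see Model Listing)
    if model.cost == "free" and "tool calling" in (model.capabilities or []):
        print(model.id)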

async ainvoke(*args, **kwargs)

Asynchronously invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage

invoke(*args, **kwargs)

Invokes the model to get a response to a query.

See LangChain’s API documentation for details on how to use this method.

Returns:

The LLM’s response to the prompt.

Return type:

BaseMessage

Reranking

class langchain_dartmouth.retrievers.document_compressors.DartmouthReranker

Reranks documents using a reranking model deployed in the Dartmouth cloud.

Parameters:
  • model_name (str, optional) – The name of the reranking model to use, defaults to "bge-reranker-v2-m3".

  • top_n (int) – Number of documents to return, defaults to 3

  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.

  • jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).

  • embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://ai-api.dartmouth.edu/tei/".

Example

With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted Reranker only takes a few lines of code:

from langchain.docstore.document import Document

from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

docs = [
    Document(page_content="Deep Learning is not..."),
    Document(page_content="Deep learning is..."),
]
query = "What is Deep Learning?"
reranker = DartmouthReranker()
ranked_docs = reranker.compress_documents(query=query, documents=docs)
print(ranked_docs)
static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')

List the models available through DartmouthReranker.

Parameters:
  • dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, it is inferred from the environment variable DARTMOUTH_API_KEY.

  • url (str, optional) – URL of the listing server

Returns:

A list of descriptions of the available models

Return type:

list[dict]

compress_documents(documents, query, callbacks=None)

Returns the most relevant documents with respect to a query.

Parameters:
  • documents (Sequence[Document]) – Documents to compress.

  • query (str) – Query to consider.

  • callbacks (Callbacks, optional) – Callbacks to run during the compression process, defaults to None

Returns:

The top_n highest-ranked documents

Return type:

Sequence[Document]
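Because DartmouthReranker implements LangChain's document compressor interface, it can also be plugged into a retrieval pipeline via LangChain's ContextualCompressionRetriever. A sketch, where base_retriever stands in for any existing LangChain retriever:

from langchain.retrievers import ContextualCompressionRetriever

from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

# base_retriever is a placeholder for any existing LangChain retriever
retriever = ContextualCompressionRetriever(
    base_compressor=DartmouthReranker(top_n=3),
    base_retriever=base_retriever,
)
relevant_docs = retriever.invoke("What is Deep Learning?")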

Model Listing

Model Listing and Information Classes.

This module provides classes and utilities for discovering and querying available AI models through Dartmouth’s infrastructure. It includes functionality for listing models from both on-premises and cloud-based services, along with detailed metadata about each model’s capabilities, costs, and properties.

The primary classes in this module are:

  • ModelInfo: A Pydantic model representing detailed information about a specific AI model, including its capabilities, hosting location, and cost.

  • DartmouthModelListing: Interface for listing on-premises models hosted by Dartmouth.

  • CloudModelListing: Interface for listing cloud-based models available through Dartmouth Chat.

Model listing functions are integrated into the respective model class interfaces. For example, you can call ChatDartmouth.list() or DartmouthEmbeddings.list() to discover available models for those specific use cases.

Examples

Listing available cloud models:

>>> from langchain_dartmouth.model_listing import CloudModelListing
>>> import os
>>> listing = CloudModelListing(
...     api_key=os.environ["DARTMOUTH_CHAT_API_KEY"],
...     url=os.environ["LCD_CLOUD_BASE_URL"]
... )
>>> models = listing.list(base_only=True)
>>> for model in models:
...     print(f"{model.name}: {model.description}")

Accessing model information:

>>> model = models[0]
>>> print(f"Model ID: {model.id}")
>>> print(f"Capabilities: {model.capabilities}")
>>> print(f"Cost: {model.cost}")
>>> print(f"Local hosting: {model.is_local}")

Notes

The model listing functionality requires appropriate API credentials and access to Dartmouth’s AI infrastructure. Model availability and metadata may vary based on your access level and the current deployment configuration.

class langchain_dartmouth.model_listing.ModelInfo

A class representing information about a model.

This class encapsulates metadata about language models, embedding models, and other AI models available through Dartmouth’s infrastructure. It provides a structured way to access model properties, capabilities, and configuration details.

The class automatically processes and validates model metadata from API responses, extracting relevant information from nested structures and tags.

id

Unique identifier used to access the model in API calls.

Type:

str

name

Human-readable name of the model for display purposes.

Type:

str | None

description

Detailed description of the model’s purpose and characteristics, as displayed in the Dartmouth Chat interface.

Type:

str | None

is_embedding

Flag indicating whether this model can be used for generating embeddings.

Type:

bool | None

capabilities

List of model capabilities such as ‘vision’, ‘tool calling’, ‘reasoning’, etc.

Type:

list[str] | None

is_local

Indicates the model’s hosting location:

  • True: The model is hosted on-premises by Dartmouth.

  • False: The model is hosted off-premises by a third-party provider.

Type:

bool | None

cost

Relative cost indicator for model usage:

  • "free": No cost for usage

  • "$" to "$$$$": Increasing cost levels

  • "undefined": Cost information not available

Type:

Literal["undefined", "free", "$", "$$", "$$$", "$$$$"] | None

Examples

>>> model = ModelInfo(
...     id="gpt-4",
...     name="GPT-4",
...     description="Advanced language model",
...     is_embedding=False,
...     capabilities=["vision", "tool calling"],
...     is_local=False,
...     cost="$$$"
... )
>>> print(model.id)
gpt-4
>>> print(model.capabilities)
['vision', 'tool calling']
class langchain_dartmouth.model_listing.DartmouthModelListing(api_key, url)

Interface for listing on-premises models hosted by Dartmouth.

This class provides access to models hosted on Dartmouth’s on-premises infrastructure. It handles authentication using Dartmouth API keys and supports filtering models by various criteria.

Parameters:
  • api_key (str) – Dartmouth API key for authentication.

  • url (str) – Base URL of the Dartmouth model listing API.

Examples

>>> from langchain_dartmouth.model_listing import DartmouthModelListing
>>> import os
>>> listing = DartmouthModelListing(
...     api_key=os.environ["DARTMOUTH_API_KEY"],
...     url="https://api.dartmouth.edu/api/ai/models/"
... )
>>> models = listing.list(type="llm")
>>> for model in models:
...     print(model["id"])

Notes

Authentication is performed using JWT tokens obtained via the dartmouth_auth package. The token is automatically refreshed if a request fails due to authentication issues.

list(**kwargs)

Get a list of available on-premises models.

Retrieves models from Dartmouth’s on-premises infrastructure, with optional filtering by server, type, or capabilities.

Parameters:

**kwargs (dict) –

Optional filtering parameters:

  • server (str) – Filter by specific server name

  • type (str) – Filter by model type (e.g., "llm", "embedding")

  • capabilities (str or list[str]) – Filter by model capabilities

Returns:

List of model descriptions as dictionaries. Each dictionary contains model metadata including id, name, capabilities, etc.

Return type:

List[dict]

Raises:

requests.HTTPError – If the API request fails after retry with re-authentication.

Examples

List all models:

>>> models = listing.list()

Filter by model type:

>>> llm_models = listing.list(type="llm")

Filter by capabilities:

>>> vision_models = listing.list(capabilities="vision")

Notes

If the initial request fails, the method automatically attempts to re-authenticate and retry the request once.

class langchain_dartmouth.model_listing.CloudModelListing(api_key, url)

Interface for listing cloud-based models available through Dartmouth Chat.

This class provides access to models available through Dartmouth’s cloud infrastructure, including both base models and customized/fine-tuned variants. It returns structured ModelInfo objects with detailed metadata.

Parameters:
  • api_key (str) – API key for Dartmouth Chat authentication.

  • url (str) – Base URL of the cloud model listing API.

Examples

>>> from langchain_dartmouth.model_listing import CloudModelListing
>>> import os
>>> listing = CloudModelListing(
...     api_key=os.environ["DARTMOUTH_CHAT_API_KEY"],
...     url=os.environ["LCD_CLOUD_BASE_URL"]
... )
>>> models = listing.list(base_only=True)
>>> for model in models:
...     print(f"{model.name} - Cost: {model.cost}")

Notes

Cloud models are accessed through bearer token authentication. The returned ModelInfo objects provide structured access to model metadata including capabilities, costs, and hosting details.

list(base_only=False)

Get a list of available cloud models.

Retrieves models from Dartmouth’s cloud infrastructure, with the option to filter for only base models or include customized variants.

Parameters:

base_only (bool, optional) – If True, return only base models. If False (default), return both base models and customized/fine-tuned variants.

Returns:

List of ModelInfo objects containing detailed metadata about each available model. Only active models are included.

Return type:

List[ModelInfo]

Raises:

requests.HTTPError – If the API request fails.

Examples

List all available models:

>>> all_models = listing.list()

List only base models:

>>> base_models = listing.list(base_only=True)

Access model details:

>>> for model in base_models:
...     if "vision" in model.capabilities:
...         print(f"{model.name} supports vision")

Notes

The method automatically filters out inactive models from the results. Model metadata is validated and structured using the ModelInfo Pydantic model, which extracts capabilities, costs, and other properties from the API response.

Base URLs

Configuration definitions for langchain_dartmouth.

This module contains base URLs and configuration constants used throughout the langchain_dartmouth library. All URLs can be overridden via environment variables.
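For example, to point the library at a different Chat endpoint (a sketch; it assumes the constants are resolved when langchain_dartmouth is first imported, so the variable must be set beforehand, and the URL shown is a placeholder):

import os

# Assumption: set before the first import of langchain_dartmouth,
# since the base URLs are read at import time
os.environ["LCD_CLOUD_BASE_URL"] = "https://chat.example.edu/api/"

from langchain_dartmouth.llms import ChatDartmouth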

langchain_dartmouth.definitions.EMBEDDINGS_BASE_URL = 'https://api.dartmouth.edu/api/ai/tei/'

Base URL for the embeddings API endpoint.

Can be overridden by setting the LCD_EMBEDDINGS_BASE_URL environment variable. Defaults to https://api.dartmouth.edu/api/ai/tei/.

Type:

str

langchain_dartmouth.definitions.RERANK_BASE_URL = 'https://api.dartmouth.edu/api/ai/tei/'

Base URL for the reranking API endpoint.

Can be overridden by setting the LCD_RERANK_BASE_URL environment variable. Defaults to https://api.dartmouth.edu/api/ai/tei/.

Type:

str

langchain_dartmouth.definitions.LLM_BASE_URL = 'https://api.dartmouth.edu/api/ai/tgi/'

Base URL for the Large Language Model API endpoint.

Can be overridden by setting the LCD_LLM_BASE_URL environment variable. Defaults to https://api.dartmouth.edu/api/ai/tgi/.

Type:

str

langchain_dartmouth.definitions.CLOUD_BASE_URL = 'https://chat.dartmouth.edu/api/'

Base URL for the Dartmouth Chat API endpoint.

Can be overridden by setting the LCD_CLOUD_BASE_URL environment variable. Defaults to https://chat.dartmouth.edu/api/.

Type:

str

langchain_dartmouth.definitions.MODEL_LISTING_BASE_URL = 'https://api.dartmouth.edu/api/ai/models/'

Base URL for the model listings API endpoint.

Can be overridden by setting the LCD_MODEL_LISTINGS_BASE_URL environment variable. Defaults to https://api.dartmouth.edu/api/ai/models/.

Type:

str