Embeddings
- class langchain_dartmouth.embeddings.DartmouthEmbeddings
Embedding models deployed on Dartmouth’s cluster.
- Parameters:
model_name (str, optional) – The name of the embedding model to use, defaults to "bge-large-en-v1-5".
model_kwargs (dict, optional) – Keyword arguments to pass to the model.
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication. Only needed for special use cases.
jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).
embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://ai-api.dartmouth.edu/tei/".
Example
With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted embedding model only takes a few lines of code:

```python
from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()

response = embeddings.embed_query("Hello? Is there anybody in there?")

print(response)
```
- static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')
List the models available through DartmouthEmbeddings.
- Parameters:
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
url (str, optional) – URL of the listing server.
- Returns:
A list of descriptions of the available models
- Return type:
list[dict]
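For instance, a minimal sketch of listing the available embedding models (assuming the environment variable DARTMOUTH_API_KEY is set; the exact fields of each description dict depend on the API response):

```python
from langchain_dartmouth.embeddings import DartmouthEmbeddings

# Each entry is a dict describing one available embedding model
for model in DartmouthEmbeddings.list():
    print(model)
```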
- async aembed_documents(texts)
Async call to the embedding endpoint to retrieve the embeddings of multiple texts.
- Parameters:
texts (List[str]) – The list of texts to embed.
- Returns:
Embeddings for the texts.
- Return type:
List[List[float]]
- async aembed_query(text)
Async call to the embedding endpoint to retrieve the embedding of the query text.
- Parameters:
text (str) – The text to embed.
- Returns:
Embeddings for the text.
- Return type:
List[float]
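Both async methods can be awaited from a coroutine; a minimal sketch (assuming DARTMOUTH_API_KEY is set):

```python
import asyncio

from langchain_dartmouth.embeddings import DartmouthEmbeddings


async def main():
    embeddings = DartmouthEmbeddings()
    # Embed a batch of documents and a query concurrently
    doc_vectors, query_vector = await asyncio.gather(
        embeddings.aembed_documents(["First document.", "Second document."]),
        embeddings.aembed_query("What do the documents say?"),
    )
    print(len(doc_vectors), len(query_vector))


asyncio.run(main())
```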
- embed_documents(texts)
Call out to the embedding endpoint to retrieve the embeddings of multiple texts.
- Parameters:
texts (List[str]) – The list of texts to embed.
- Returns:
Embeddings for the texts.
- Return type:
List[List[float]]
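A minimal sketch of embedding several texts in one call (assuming DARTMOUTH_API_KEY is set):

```python
from langchain_dartmouth.embeddings import DartmouthEmbeddings

embeddings = DartmouthEmbeddings()

# One embedding vector is returned per input text
vectors = embeddings.embed_documents(["First document.", "Second document."])

print(len(vectors))     # 2
print(len(vectors[0]))  # dimensionality of the chosen model
```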
- embed_query(text)
Call out to the embedding endpoint to retrieve the embedding of the query text.
- Parameters:
text (str) – The text to embed.
- Returns:
Embeddings for the text.
- Return type:
List[float]
Large Language Models
- class langchain_dartmouth.llms.DartmouthLLM
Dartmouth-deployed Large Language Models. Use this class for non-chat models (e.g., CodeLlama 13B).
This class does not format the prompt to adhere to any required templates. The string you pass to it is exactly the string received by the LLM. If the desired model requires a chat template (e.g., Llama 3.1 Instruct), you may want to use ChatDartmouth instead.
- Parameters:
model_name (str, optional) – Name of the model to use, defaults to "codellama-13b-python-hf".
temperature (float, optional) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.8.
max_new_tokens (int) – Maximum number of generated tokens, defaults to 512.
streaming (bool) – Whether to generate a stream of tokens asynchronously, defaults to False.
top_k (int, optional) – The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) – If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation, defaults to 0.95.
typical_p (float, optional) – Typical decoding mass. See Typical Decoding for Natural Language Generation for more information, defaults to 0.95.
repetition_penalty (float, optional) – The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text (bool) – Whether to prepend the prompt to the generated text, defaults to False.
truncate (int, optional) – Truncate input tokens to the given size.
stop_sequences (List[str], optional) – Stop generating tokens if a member of stop_sequences is generated.
seed (int, optional) – Random sampling seed.
do_sample (bool) – Activate logits sampling, defaults to False.
watermark (bool) – Watermarking with A Watermark for Large Language Models, defaults to False.
model_kwargs (dict, optional) – Parameters to pass to the model (see the documentation of LangChain's HuggingFaceTextGenInference class).
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.
jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).
inference_server_url (str) – URL pointing to an inference endpoint, defaults to "https://ai-api.dartmouth.edu/tgi/".
timeout (int) – Timeout in seconds, defaults to 120.
server_kwargs (dict, optional) – Holds any text-generation-inference server parameters not explicitly specified.
**_ – Additional keyword arguments are silently discarded. This is to ensure interface compatibility with other LangChain components.
Example
With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted LLM only takes a few lines of code:

```python
from langchain_dartmouth.llms import DartmouthLLM

llm = DartmouthLLM(model_name="codellama-13b-hf")

response = llm.invoke("Write a Python script to swap two variables.")

print(response)
```
- static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')
List the models available through DartmouthLLM.
- Parameters:
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
url (str, optional) – URL of the listing server.
- Returns:
A list of descriptions of the available models
- Return type:
list[dict]
- async ainvoke(*args, **kwargs)
Asynchronously transforms a single input into an output.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s completion of the input string.
- Return type:
str
- invoke(*args, **kwargs)
Transforms a single input into an output.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s completion of the input string.
- Return type:
str
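Since DartmouthLLM implements LangChain's standard Runnable interface, it composes with other LangChain components. A minimal sketch of a prompt-template chain (assuming DARTMOUTH_API_KEY is set; the template text is illustrative):

```python
from langchain_core.prompts import PromptTemplate
from langchain_dartmouth.llms import DartmouthLLM

llm = DartmouthLLM(model_name="codellama-13b-python-hf", max_new_tokens=256)

# Piping a prompt template into the LLM yields a simple chain
prompt = PromptTemplate.from_template("# Python code that {task}\n")
chain = prompt | llm

print(chain.invoke({"task": "swaps two variables"}))
```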
- class langchain_dartmouth.llms.ChatDartmouth
Dartmouth-deployed Chat models (also known as Instruct models).
Use this class if you want to use a model that uses a chat template (e.g., Llama 3.1 8B Instruct).
All prompts are automatically formatted to adhere to the chosen model's chat template. If you need more control over the exact string sent to the model, you may want to use DartmouthLLM instead.
- Parameters:
model_name (str) – Name of the model to use, defaults to "llama-3-1-8b-instruct".
streaming (bool) – Whether to stream the results or not, defaults to False.
temperature (float) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.7.
max_tokens (int) – Maximum number of tokens to generate, defaults to 512.
logprobs (bool, optional) – Whether to return logprobs.
stream_usage (bool) – Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata, defaults to False.
presence_penalty (float, optional) – Penalizes repeated tokens.
frequency_penalty (float, optional) – Penalizes repeated tokens according to frequency.
seed (int, optional) – Seed for generation.
top_logprobs (int, optional) – Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
logit_bias (dict, optional) – Modify the likelihood of specified tokens appearing in the completion.
n (int) – Number of chat completions to generate for each prompt, defaults to 1.
top_p (float, optional) – Total probability mass of tokens to consider at each step.
model_kwargs (dict, optional) – Holds any model parameters valid for the create call not explicitly specified.
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.
jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).
inference_server_url (str, optional) – URL pointing to an inference endpoint, defaults to "https://ai-api.dartmouth.edu/tgi/".
**_ – Additional keyword arguments are silently discarded. This is to ensure interface compatibility with other LangChain components.
Example
With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted LLM only takes a few lines of code:

```python
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-8b-instruct")

response = llm.invoke("Hi there!")

print(response.content)
```
Note
The required prompt format is enforced automatically when you are using ChatDartmouth.
- static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')
List the models available through ChatDartmouth.
- Parameters:
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
url (str, optional) – URL of the listing server.
- Returns:
A list of descriptions of the available models
- Return type:
list[dict]
- async ainvoke(*args, **kwargs)
Asynchronously invokes the model to get a response to a query.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s response to the prompt.
- Return type:
BaseMessage
- invoke(*args, **kwargs)
Invokes the model to get a response to a query.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s response to the prompt.
- Return type:
BaseMessage
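ChatDartmouth accepts LangChain's standard message types, so a system prompt and conversation history can be passed directly; a minimal sketch (assuming DARTMOUTH_API_KEY is set):

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

# The model's chat template is applied to these messages automatically
messages = [
    SystemMessage(content="You are a terse assistant."),
    HumanMessage(content="What is LangChain?"),
]

response = llm.invoke(messages)
print(response.content)
```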
- class langchain_dartmouth.llms.ChatDartmouthCloud
Cloud chat models made available by Dartmouth.
Use this class if you want to use a model by a third-party provider, e.g., Anthropic or OpenAI, made accessible by Dartmouth.
- Parameters:
model_name (str) – Name of the model to use, defaults to "openai.gpt-4.1-mini-2025-04-14".
streaming (bool) – Whether to stream the results or not, defaults to False.
temperature (float) – Temperature to use for sampling (higher temperature means more varied outputs), defaults to 0.7.
max_tokens (int) – Maximum number of tokens to generate, defaults to 512.
logprobs (bool, optional) – Whether to return logprobs.
stream_usage (bool) – Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata, defaults to False.
presence_penalty (float, optional) – Penalizes repeated tokens.
frequency_penalty (float, optional) – Penalizes repeated tokens according to frequency.
seed (int, optional) – Seed for generation.
top_logprobs (int, optional) – Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
logit_bias (dict, optional) – Modify the likelihood of specified tokens appearing in the completion.
n (int) – Number of chat completions to generate for each prompt, defaults to 1.
top_p (float, optional) – Total probability mass of tokens to consider at each step.
model_kwargs (dict, optional) – Holds any model parameters valid for the create call not explicitly specified.
dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_CHAT_API_KEY.
inference_server_url (str, optional) – The URL of the inference server (e.g., https://chat.dartmouth.edu/api/).
**_ – Additional keyword arguments are silently discarded. This is to ensure interface compatibility with other LangChain components.
Example
With an environment variable named DARTMOUTH_CHAT_API_KEY pointing to your key obtained from https://chat.dartmouth.edu, using a third-party LLM provided by Dartmouth only takes a few lines of code:

```python
from langchain_dartmouth.llms import ChatDartmouthCloud

llm = ChatDartmouthCloud(model_name="openai.gpt-4o-mini-2024-07-18")

response = llm.invoke("Hi there!")

print(response.content)
```
Note
The models available through ChatDartmouthCloud are pay-as-you-go third-party models. Dartmouth pays for the use, but a daily token limit per user applies.
- validator validate_temperature » all fields
Currently, OpenAI's o-series models only allow temperature=1.
- Parameters:
values (dict[str, Any])
- Return type:
Any
- static list(dartmouth_chat_api_key=None, url='https://chat.dartmouth.edu/api/')
List the models available through ChatDartmouthCloud.
- Parameters:
dartmouth_chat_api_key (str, optional) – A Dartmouth Chat API key (obtainable from https://chat.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_CHAT_API_KEY.
url (str, optional) – URL of the listing server.
- Returns:
A list of descriptions of the available models
- Return type:
list[dict]
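A minimal sketch of listing the available cloud models (assuming DARTMOUTH_CHAT_API_KEY is set; the exact fields of each description dict depend on the API response):

```python
from langchain_dartmouth.llms import ChatDartmouthCloud

# Each entry is a dict describing one available third-party model
for model in ChatDartmouthCloud.list():
    print(model)
```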
- async ainvoke(*args, **kwargs)
Asynchronously invokes the model to get a response to a query.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s response to the prompt.
- Return type:
BaseMessage
- invoke(*args, **kwargs)
Invokes the model to get a response to a query.
See LangChain’s API documentation for details on how to use this method.
- Returns:
The LLM’s response to the prompt.
- Return type:
BaseMessage
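Like the other chat classes, ChatDartmouthCloud also supports LangChain's standard streaming interface; a minimal sketch (assuming DARTMOUTH_CHAT_API_KEY is set):

```python
from langchain_dartmouth.llms import ChatDartmouthCloud

llm = ChatDartmouthCloud()

# stream() yields message chunks as they arrive
for chunk in llm.stream("Tell me a one-line joke."):
    print(chunk.content, end="", flush=True)
print()
```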
Reranking
- class langchain_dartmouth.retrievers.document_compressors.DartmouthReranker
Reranks documents using a reranking model deployed in the Dartmouth cloud.
- Parameters:
model_name (str, optional) – The name of the reranking model to use, defaults to "bge-reranker-v2-m3".
top_n (int) – Number of documents to return, defaults to 3.
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
authenticator (Callable, optional) – A Callable returning a JSON Web Token (JWT) for authentication.
jwt_url (str, optional) – URL of the Dartmouth API endpoint returning a JSON Web Token (JWT).
embeddings_server_url (str, optional) – URL pointing to an embeddings endpoint, defaults to "https://ai-api.dartmouth.edu/tei/".
Example
With an environment variable named DARTMOUTH_API_KEY pointing to your key obtained from https://developer.dartmouth.edu, using a Dartmouth-hosted reranker only takes a few lines of code:

```python
from langchain.docstore.document import Document
from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

docs = [
    Document(page_content="Deep Learning is not..."),
    Document(page_content="Deep learning is..."),
]

query = "What is Deep Learning?"

reranker = DartmouthReranker()
ranked_docs = reranker.compress_documents(query=query, documents=docs)

print(ranked_docs)
```
- static list(dartmouth_api_key=None, url='https://api.dartmouth.edu/api/ai/models/')
List the models available through DartmouthReranker.
- Parameters:
dartmouth_api_key (str, optional) – A Dartmouth API key (obtainable from https://developer.dartmouth.edu). If not specified, an attempt is made to infer it from the environment variable DARTMOUTH_API_KEY.
url (str, optional) – URL of the listing server.
- Returns:
A list of descriptions of the available models
- Return type:
list[dict]
- compress_documents(documents, query, callbacks=None)
Returns the most relevant documents with respect to a query.
- Parameters:
documents (Sequence[Document]) – Documents to compress.
query (str) – Query to consider.
callbacks (Callbacks, optional) – Callbacks to run during the compression process, defaults to None.
- Returns:
The top_n highest-ranked documents
- Return type:
Sequence[Document]
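A minimal sketch showing how top_n limits how many documents are returned (assuming DARTMOUTH_API_KEY is set; the sample texts are illustrative):

```python
from langchain.docstore.document import Document
from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker

docs = [
    Document(page_content="Deep learning is a family of machine learning methods."),
    Document(page_content="Rerankers score documents by relevance to a query."),
    Document(page_content="Bananas are rich in potassium."),
]

# Keep only the single highest-ranked document
reranker = DartmouthReranker(top_n=1)
ranked = reranker.compress_documents(query="What is deep learning?", documents=docs)

for doc in ranked:
    print(doc.page_content)
```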