curategpt.utils package
Submodules
curategpt.utils.eval_utils module
Evaluation utilities.
- class curategpt.utils.eval_utils.Outcome(**data)
Bases: BaseModel
- append_outcomes(outcomes)
- Return type:
None
- by_field: Dict[str, int]
- calculate_metrics()
- expected: Union[Dict[str, Any], List[Dict[str, Any]]]
- f1: Optional[float]
- flatten()
- Return type:
Dict[str, Any]
- fn: int
- fp: int
- ixn_by_field: Dict[str, List[str]]
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- parameters: Dict[str, Any]
- precision: Optional[float]
- prediction: Union[Dict[str, Any], List[Dict[str, Any]]]
- recall: Optional[float]
- tn: int
- tp: int
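The tp, fp and fn counts relate to precision, recall and f1 in the usual way. The sketch below illustrates that relationship; it is not the body of calculate_metrics(). The function name metrics_from_counts is hypothetical, and returning None when a denominator is zero is an assumption suggested only by the Optional[float] field types.

```python
from typing import Optional, Tuple


def metrics_from_counts(
    tp: int, fp: int, fn: int
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (precision, recall, f1) computed from confusion counts.

    Assumption: a metric whose denominator is zero is reported as None.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is None or recall is None:
        f1 = None
    elif precision + recall == 0:
        # Both metrics are defined but zero: report F1 as 0.0 by convention.
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, one true positive and one false positive with no false negatives give a precision of 0.5 and a recall of 1.0.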
- curategpt.utils.eval_utils.best_matches(pred_rels, exp_rels)
Find the best matching pairs of relationships.
Example:
>>> outcomes = best_matches([], [])
>>> len(outcomes)
1
>>> outcome = outcomes[0]
>>> (outcome.tp, outcome.fp, outcome.fn)
(0, 0, 0)
>>> best_matches([{"x:": 1}], [])[0].precision
0.0
>>> outcome = best_matches([{"x": 1}], [{"x": 1}])[0]
>>> outcome.precision
1.0
>>> outcome = best_matches([{"x": 1}], [{"y": 1}])[0]
>>> outcome.precision
0.0
>>> pred_rels = [{"x":1}, {"y": 2}, {"z": 3}]
>>> exp_rels = [{"y":2}, {"x": 1}, {"z": 3}]
>>> outcomes = best_matches(pred_rels, exp_rels)
>>> [o.precision for o in outcomes]
[1.0, 1.0, 1.0]
>>> exp_rels.append({"z": 4})
>>> outcomes = best_matches(pred_rels, exp_rels)
>>> sorted([o.precision for o in outcomes])
[0.0, 1.0, 1.0, 1.0]
- Return type:
List[Outcome]
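One possible way to pair predicted and expected items is a greedy match on overlap, sketched below. This is an illustration, not the library's algorithm: the names overlap and greedy_pairs are hypothetical, the Jaccard score over key-value pairs is an assumption, and the sketch returns index pairs rather than Outcome objects. Unlike the documented function, it returns an empty list for two empty inputs.

```python
from typing import Any, Dict, List, Optional, Tuple

Rel = Dict[str, Any]


def overlap(a: Rel, b: Rel) -> float:
    """Jaccard overlap of the (key, value) pairs of two dicts."""
    pa = {(k, repr(v)) for k, v in a.items()}
    pb = {(k, repr(v)) for k, v in b.items()}
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


def greedy_pairs(
    pred_rels: List[Rel], exp_rels: List[Rel]
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Pair predicted and expected items greedily by descending overlap.

    Items left without a partner are paired with None.
    """
    candidates = sorted(
        (
            (overlap(p, e), i, j)
            for i, p in enumerate(pred_rels)
            for j, e in enumerate(exp_rels)
        ),
        key=lambda t: -t[0],  # highest overlap first; sort is stable
    )
    used_pred, used_exp = set(), set()
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    for score, i, j in candidates:
        if score > 0 and i not in used_pred and j not in used_exp:
            pairs.append((i, j))
            used_pred.add(i)
            used_exp.add(j)
    # Unmatched predictions and unmatched expectations each get their own entry.
    pairs += [(i, None) for i in range(len(pred_rels)) if i not in used_pred]
    pairs += [(None, j) for j in range(len(exp_rels)) if j not in used_exp]
    return pairs
```

With the three predicted and four expected items from the example above, this yields four pairs, the last of which leaves the extra expected item unmatched.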
- curategpt.utils.eval_utils.score_prediction(predicted, expected, exclude=None)
Score the predicted activity.
>>> outcome = score_prediction({"x": 1}, {"x": 1})
>>> outcome.tp
1
>>> outcome = score_prediction([{"x": 1}], {"x": 1})
>>> outcome.tp
1
>>> outcome = score_prediction({"x": 1}, {"x": 2})
>>> outcome.tp
0
>>> outcome.recall
0.0
>>> outcome = score_prediction({"x": 1, "y": 2}, {"x": 1})
>>> outcome.tp
1
>>> outcome.fp
1
>>> outcome = score_prediction([{"x": 1}, {"y": 1}], {"x": 1})
>>> outcome.tp
1
>>> outcome.fp
1
- Parameters:
predicted (Union[Dict, List]) – The predicted activity
expected (Union[Dict, List]) – The expected activity
- Return type:
- Returns:
The score
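The tp and fp counts in the examples above are consistent with comparing key-value pairs. The sketch below counts matches that way; it is not the library's implementation. The names to_pairs and count_matches are hypothetical, the exclude parameter is not modelled, and comparing values by repr() is an assumption made so that unhashable values can be placed in a set.

```python
from typing import Any, Dict, List, Set, Tuple, Union

Record = Union[Dict[str, Any], List[Dict[str, Any]]]


def to_pairs(obj: Record) -> Set[Tuple[str, str]]:
    """Flatten a dict, or a list of dicts, into a set of (key, repr(value)) pairs."""
    dicts = obj if isinstance(obj, list) else [obj]
    return {(k, repr(v)) for d in dicts for k, v in d.items()}


def count_matches(predicted: Record, expected: Record) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) counted over key-value pairs."""
    pred, exp = to_pairs(predicted), to_pairs(expected)
    tp = len(pred & exp)  # pairs present in both
    fp = len(pred - exp)  # predicted but not expected
    fn = len(exp - pred)  # expected but not predicted
    return tp, fp, fn
```

Under these assumptions, count_matches({"x": 1, "y": 2}, {"x": 1}) gives one true positive and one false positive, in line with the corresponding example.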
curategpt.utils.llm_utils module
Utilities for interacting with LLM APIs.
- curategpt.utils.llm_utils.is_rate_limit_error(exception)
- curategpt.utils.llm_utils.query_model(model, *args, **kwargs)
- Return type:
Response
curategpt.utils.patch_utils module
- curategpt.utils.patch_utils.patches_to_oak_commands(patch_dict, ont_path)
- Return type:
str
curategpt.utils.search module
curategpt.utils.tokens module
- curategpt.utils.tokens.estimate_num_tokens(messages, model='gpt-4')
Return the number of tokens used by a list of messages.
Note: this is an estimate.
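To show what an estimate of this kind can look like, the sketch below uses a character-count heuristic. It is not how estimate_num_tokens works: the name rough_token_estimate is hypothetical, the four-characters-per-token ratio is a rough rule of thumb, and the messages are assumed to be dicts of strings such as {"role": ..., "content": ...}.

```python
import math
from typing import Dict, List


def rough_token_estimate(
    messages: List[Dict[str, str]], chars_per_token: float = 4.0
) -> int:
    """Estimate tokens as total characters divided by an assumed ratio.

    Assumption: every value in each message dict contributes to the count.
    """
    total_chars = sum(len(str(value)) for m in messages for value in m.values())
    return math.ceil(total_chars / chars_per_token)
```

A tokenizer matched to the target model gives a closer figure; the heuristic is only useful for quick budget checks.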
- curategpt.utils.tokens.max_tokens_by_model(model_id=None)
Return the maximum number of tokens allowed by a model.
TODO: return precise values, currently an estimate.
curategpt.utils.vector_algorithms module
- curategpt.utils.vector_algorithms.compute_cosine_similarity(list1, list2)
Compute cosine similarity between two lists of vectors.
The result is a two-dimensional matrix sim[ROW][COL], where ROW indexes list1 and COL indexes list2.
- Parameters:
list1 (List[List[float]])
list2 (List[List[float]])
- Return type:
ndarray
- Returns:
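The sim[ROW][COL] layout can be illustrated with a dependency-free sketch. The documented function returns an ndarray; this version returns nested lists so that it runs without NumPy. The name cosine_similarity_matrix is hypothetical, and reporting 0.0 for a zero-length vector is an assumption.

```python
import math
from typing import List


def cosine_similarity_matrix(
    list1: List[List[float]], list2: List[List[float]]
) -> List[List[float]]:
    """Return sim where sim[row][col] = cos(list1[row], list2[col])."""

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    sim = []
    for a in list1:
        row = []
        for b in list2:
            dot = sum(x * y for x, y in zip(a, b))
            denom = norm(a) * norm(b)
            # Assumption: similarity with a zero vector is reported as 0.0.
            row.append(dot / denom if denom else 0.0)
        sim.append(row)
    return sim
```

The result has len(list1) rows and len(list2) columns, so sim[0][1] compares the first vector of list1 with the second vector of list2.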
- curategpt.utils.vector_algorithms.mmr_diversified_search(query_vector, document_vectors, relevance_factor=0.5, top_n=None)
Perform diversified search using Maximal Marginal Relevance (MMR).
- Return type:
List[int]
Parameters
query_vector: The vector representing the query.
document_vectors: The vectors representing the documents.
relevance_factor: Balance parameter between relevance and diversity.
top_n: Number of results to return. If None, return all.
Returns
List of indices representing the diversified order of documents.
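A minimal sketch of Maximal Marginal Relevance, for illustration only. The name mmr_order is hypothetical, cosine similarity is assumed as the similarity measure, and the balance parameter is assumed to weight relevance (1.0 ranks by relevance alone, lower values favour diversity). The library's own implementation may differ in these details.

```python
import math
from typing import List, Optional


def _cos(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def mmr_order(
    query_vector: List[float],
    document_vectors: List[List[float]],
    relevance_factor: float = 0.5,
    top_n: Optional[int] = None,
) -> List[int]:
    """Return document indices in MMR order."""
    if top_n is None:
        top_n = len(document_vectors)
    relevance = [_cos(query_vector, d) for d in document_vectors]
    selected: List[int] = []
    remaining = list(range(len(document_vectors)))
    while remaining and len(selected) < top_n:

        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (_cos(document_vectors[i], document_vectors[j]) for j in selected),
                default=0.0,
            )
            return relevance_factor * relevance[i] - (1 - relevance_factor) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate documents and one distinct document, a balance of 0.5 moves the distinct document ahead of the second near-duplicate, whereas a balance of 1.0 keeps pure relevance order.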
- curategpt.utils.vector_algorithms.top_matches(cosine_similarity_matrix)
Find the top match for each row in the cosine similarity matrix.
- Parameters:
cosine_similarity_matrix (ndarray)
- Return type:
Tuple[ndarray, ndarray]
- Returns:
- curategpt.utils.vector_algorithms.top_n_matches(cosine_similarity_matrix, n=10)
- Return type:
Tuple[ndarray, ndarray]
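Row-wise selection of the best match, and of the best n matches, can be sketched as follows. This is an illustration on nested lists, whereas the documented functions take and return ndarray values. The names top_match_per_row and top_n_per_row are hypothetical, and the (indices, values) ordering of the returned tuple is an assumption, since the reference does not state what the two arrays hold.

```python
from typing import List, Tuple


def top_match_per_row(sim: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Return (column indices, similarity values) of the best match in each row.

    Assumes every row is non-empty.
    """
    indices = [max(range(len(row)), key=row.__getitem__) for row in sim]
    values = [row[i] for row, i in zip(sim, indices)]
    return indices, values


def top_n_per_row(
    sim: List[List[float]], n: int = 10
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return, per row, the n best column indices and their similarity values."""
    indices = [
        sorted(range(len(row)), key=lambda j: -row[j])[:n]  # highest first
        for row in sim
    ]
    values = [[row[j] for j in idx] for row, idx in zip(sim, indices)]
    return indices, values
```

Rows with fewer than n columns simply return all of their columns in descending order of similarity.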
curategpt.utils.vectordb_operations module
- curategpt.utils.vectordb_operations.match_collections(db, left_collection, right_collection, other_db=None)
Match every element in the left collection with every element in the right collection.
Currently, this returns the best matches for the left collection only.