curategpt.utils package
Submodules
curategpt.utils.eval_utils module
Evaluation utilities.
- class curategpt.utils.eval_utils.Outcome(**data)
Bases:
BaseModel- append_outcomes(outcomes)
- Return type:
None
-
by_field:
Dict[str,int]
- calculate_metrics()
-
expected:
Union[Dict[str,Any],List[Dict[str,Any]]]
-
f1:
Optional[float]
- flatten()
- Return type:
Dict[str,Any]
-
fn:
int
-
fp:
int
-
ixn_by_field:
Dict[str,List[str]]
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-
parameters:
Dict[str,Any]
-
precision:
Optional[float]
-
prediction:
Union[Dict[str,Any],List[Dict[str,Any]]]
-
recall:
Optional[float]
-
tn:
int
-
tp:
int
- curategpt.utils.eval_utils.best_matches(pred_rels, exp_rels)
Find the best matching pairs of relationships.
Example:
- Return type:
List[Outcome]
>>> outcomes = best_matches([], []) >>> len(outcomes) 1 >>> outcome = outcomes[0] >>> (outcome.tp, outcome.fp, outcome.fn) (0, 0, 0) >>> best_matches([{"x:": 1}], [])[0].precision 0.0 >>> outcome = best_matches([{"x": 1}], [{"x": 1}])[0] >>> outcome.precision 1.0 >>> outcome = best_matches([{"x": 1}], [{"y": 1}])[0] >>> outcome.precision 0.0 >>> pred_rels = [{"x":1}, {"y": 2}, {"z": 3}] >>> exp_rels = [{"y":2}, {"x": 1}, {"z": 3}] >>> outcomes = best_matches(pred_rels, exp_rels) >>> [o.precision for o in outcomes] [1.0, 1.0, 1.0] >>> exp_rels.append({"z": 4}) >>> outcomes = best_matches(pred_rels, exp_rels) >>> sorted([o.precision for o in outcomes]) [0.0, 1.0, 1.0, 1.0]
- curategpt.utils.eval_utils.score_prediction(predicted, expected, exclude=None)
Score the predicted activity.
>>> outcome = score_prediction({"x": 1}, {"x": 1}) >>> outcome.tp 1
>>> outcome = score_prediction([{"x": 1}], {"x": 1}) >>> outcome.tp 1
>>> outcome = score_prediction({"x": 1}, {"x": 2}) >>> outcome.tp 0 >>> outcome.recall 0.0
>>> outcome = score_prediction({"x": 1, "y": 2}, {"x": 1}) >>> outcome.tp 1 >>> outcome.fp 1
>>> outcome = score_prediction([{"x": 1}, {"y": 1}], {"x": 1}) >>> outcome.tp 1 >>> outcome.fp 1
- Parameters:
predicted (
Union[Dict,List]) – The predicted activityexpected (
Union[Dict,List]) – The expected activity
- Return type:
- Returns:
The score
curategpt.utils.llm_utils module
Utilities for interacting with LLM APIs.
- curategpt.utils.llm_utils.is_rate_limit_error(exception)
- curategpt.utils.llm_utils.query_model(model, *args, **kwargs)
- Return type:
Response
curategpt.utils.patch_utils module
Utilities for converting patch dictionaries to OAK commands.
- curategpt.utils.patch_utils.patches_to_oak_commands(patch_dict, ont_path)
curategpt.utils.search module
curategpt.utils.tokens module
- curategpt.utils.tokens.estimate_num_tokens(messages, model='gpt-4')
Return the number of tokens used by a list of messages.
Note: this is an estimate
- curategpt.utils.tokens.max_tokens_by_model(model_id=None)
Return the maximum number of tokens allowed by a model.
TODO: return precise values, currently an estimate.
curategpt.utils.vector_algorithms module
- curategpt.utils.vector_algorithms.compute_cosine_similarity(list1, list2)
Compute cosine similarity between two lists of vectors.
Result is a two column vector sim[ROW][COL] where ROW is from list1 and COL is from list2.
- Parameters:
list1 (
List[List[float]])list2 (
List[List[float]])
- Return type:
ndarray- Returns:
- curategpt.utils.vector_algorithms.mmr_diversified_search(query_vector, document_vectors, relevance_factor=0.5, top_n=None)
Perform diversified search using Maximal Marginal Relevance (MMR).
- Return type:
List[int]
Parameters
query_vector: The vector representing the query.
document_vectors: The vectors representing the documents.
lambda_: Balance parameter between relevance and diversity.
top_n: Number of results to return. If None, return all.
Returns
List of indices representing the diversified order of documents.
- curategpt.utils.vector_algorithms.top_matches(cosine_similarity_matrix)
Find the top match for each row in the cosine similarity matrix.
- Parameters:
cosine_similarity_matrix (
ndarray)- Return type:
Tuple[ndarray,ndarray]- Returns:
- curategpt.utils.vector_algorithms.top_n_matches(cosine_similarity_matrix, n=10)
- Return type:
Tuple[ndarray,ndarray]
curategpt.utils.vectordb_operations module
- curategpt.utils.vectordb_operations.match_collections(db, left_collection, right_collection, other_db=None)
Match every element in left collection with every element in right collection.
Currently this returns best matches for left collection only