curate_gpt.utils package

Submodules

curate_gpt.utils.eval_utils module

Evaluation utilities.

class curate_gpt.utils.eval_utils.Outcome(**data)

Bases: BaseModel

append_outcomes(outcomes)
Return type:

None

by_field: Dict[str, int]
calculate_metrics()
expected: Union[Dict[str, Any], List[Dict[str, Any]]]
f1: Optional[float]
flatten()
Return type:

Dict[str, Any]

fn: int
fp: int
ixn_by_field: Dict[str, List[str]]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]]

Metadata about the fields defined on the model, mapping field names to pydantic.fields.FieldInfo objects. This replaces Model.__fields__ from Pydantic V1.

None of the fields is required; their defaults are:

  • {} (empty dict): by_field, expected, ixn_by_field, parameters, prediction

  • None: f1, precision, recall

  • 0: fn, fp, tn, tp

parameters: Dict[str, Any]
precision: Optional[float]
prediction: Union[Dict[str, Any], List[Dict[str, Any]]]
recall: Optional[float]
tn: int
tp: int
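Outcome stores raw counts (tp, fp, fn) next to precision, recall and f1. The sketch below shows how those metrics are conventionally derived from the counts, assuming calculate_metrics uses the standard definitions and treats a zero denominator as 0.0 (an assumption; this page does not state it). Counts is a hypothetical stand-in for Outcome, and its append method assumes that append_outcomes pools counts (micro-averaging).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical stand-in for the tp/fp/fn fields of Outcome."""

    tp: int = 0
    fp: int = 0
    fn: int = 0

    def metrics(self) -> dict:
        # Assumption: a zero denominator yields 0.0 rather than raising.
        precision = self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0
        recall = self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    def append(self, other: "Counts") -> None:
        # Assumption: outcomes are combined by summing their raw counts.
        self.tp += other.tp
        self.fp += other.fp
        self.fn += other.fn
```

For example, Counts(tp=3, fp=1, fn=1).metrics() gives 0.75 for precision, recall and F1.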
curate_gpt.utils.eval_utils.best_matches(pred_rels, exp_rels)

Find the best matching pairs of relationships.

Example:

>>> outcomes = best_matches([], [])
>>> len(outcomes)
1
>>> outcome = outcomes[0]
>>> (outcome.tp, outcome.fp, outcome.fn)
(0, 0, 0)
>>> best_matches([{"x": 1}], [])[0].precision
0.0
>>> outcome = best_matches([{"x": 1}], [{"x": 1}])[0]
>>> outcome.precision
1.0
>>> outcome = best_matches([{"x": 1}], [{"y": 1}])[0]
>>> outcome.precision
0.0
>>> pred_rels = [{"x":1}, {"y": 2}, {"z": 3}]
>>> exp_rels = [{"y":2}, {"x": 1}, {"z": 3}]
>>> outcomes = best_matches(pred_rels, exp_rels)
>>> [o.precision for o in outcomes]
[1.0, 1.0, 1.0]
>>> exp_rels.append({"z": 4})
>>> outcomes = best_matches(pred_rels, exp_rels)
>>> sorted([o.precision for o in outcomes])
[0.0, 1.0, 1.0, 1.0]

Return type:

List[Outcome]
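The docstring does not say how the "best" pairing is chosen. The following minimal sketch assumes a greedy strategy that scores each predicted/expected pair by the number of identical key-value pairs they share. The helpers overlap and greedy_pairs are hypothetical, return index pairs rather than Outcome objects, and may differ from what best_matches actually does (an optimal assignment would be an equally plausible design).

```python
from typing import Any, Dict, List, Optional, Tuple

Rel = Dict[str, Any]
Pair = Tuple[Optional[int], Optional[int]]


def overlap(a: Rel, b: Rel) -> int:
    """Count the key-value pairs that appear identically in both dicts."""
    return sum(1 for k, v in a.items() if k in b and b[k] == v)


def greedy_pairs(pred: List[Rel], exp: List[Rel]) -> List[Pair]:
    """Pair predicted and expected items greedily by descending overlap.

    Items left over on either side are paired with None.
    """
    candidates = sorted(
        ((overlap(p, e), i, j) for i, p in enumerate(pred) for j, e in enumerate(exp)),
        key=lambda t: (-t[0], t[1], t[2]),  # highest overlap first, then by index
    )
    used_pred, used_exp = set(), set()
    pairs: List[Pair] = []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_exp:
            pairs.append((i, j))
            used_pred.add(i)
            used_exp.add(j)
    pairs += [(i, None) for i in range(len(pred)) if i not in used_pred]
    pairs += [(None, j) for j in range(len(exp)) if j not in used_exp]
    return pairs
```

On the doctest data, each of the three predictions is paired with its identical expected item, and the extra {"z": 4} is left unmatched as (None, 3).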
curate_gpt.utils.eval_utils.score_prediction(predicted, expected, exclude=None)

Score the predicted activity.

>>> outcome = score_prediction({"x": 1}, {"x": 1})
>>> outcome.tp
1
>>> outcome = score_prediction([{"x": 1}], {"x": 1})
>>> outcome.tp
1
>>> outcome = score_prediction({"x": 1}, {"x": 2})
>>> outcome.tp
0
>>> outcome.recall
0.0
>>> outcome = score_prediction({"x": 1, "y": 2}, {"x": 1})
>>> outcome.tp
1
>>> outcome.fp
1
>>> outcome = score_prediction([{"x": 1}, {"y": 1}], {"x": 1})
>>> outcome.tp
1
>>> outcome.fp
1

Parameters:
  • predicted (Union[Dict, List]) – The predicted activity

  • expected (Union[Dict, List]) – The expected activity

Return type:

Outcome

Returns:

The score
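The doctests above are all consistent with a simple model in which each dict (or list of dicts) is flattened into a set of key-value pairs and the two sets are compared. The minimal sketch below works under that assumption. The helpers to_pairs and score_pairs are hypothetical: they ignore the exclude argument and do not reproduce Outcome's per-field bookkeeping (by_field, ixn_by_field).

```python
from typing import Any, Dict, List, Set, Tuple, Union

Obj = Union[Dict[str, Any], List[Dict[str, Any]]]


def to_pairs(obj: Obj) -> Set[Tuple[str, str]]:
    """Flatten a dict, or a list of dicts, into a set of (key, repr(value)) pairs."""
    items = obj if isinstance(obj, list) else [obj]
    # repr() lets unhashable values such as lists be stored in a set.
    return {(k, repr(v)) for d in items for k, v in d.items()}


def score_pairs(predicted: Obj, expected: Obj) -> Dict[str, int]:
    """Count true positives, false positives and false negatives over key-value pairs."""
    pred, exp = to_pairs(predicted), to_pairs(expected)
    return {"tp": len(pred & exp), "fp": len(pred - exp), "fn": len(exp - pred)}
```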

curate_gpt.utils.llm_utils module

curate_gpt.utils.llm_utils.is_rate_limit_error(exception)
curate_gpt.utils.llm_utils.query_model(model, *args, **kwargs)
Return type:

Response

curate_gpt.utils.patch_utils module

curate_gpt.utils.patch_utils.patches_to_oak_commands(patch_dict, ont_path)
Return type:

str

curate_gpt.utils.search module

curate_gpt.utils.tokens module

curate_gpt.utils.tokens.estimate_num_tokens(messages, model='gpt-4')

Estimate the number of tokens used by a list of messages.

Note: the result is an estimate, not an exact count.
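Exact counts require the model's own tokenizer. As a dependency-free illustration of what an estimate can look like, the sketch below uses the common rule of thumb of about four characters per token, plus an assumed fixed overhead per message. Both constants and the name rough_num_tokens are hypothetical, and this is not necessarily how estimate_num_tokens computes its value.

```python
from typing import Dict, List

# Hypothetical constants: a rough rule of thumb, not curate_gpt's values.
CHARS_PER_TOKEN = 4
TOKENS_PER_MESSAGE = 3  # assumed formatting overhead for each message


def rough_num_tokens(messages: List[Dict[str, str]]) -> int:
    """Crudely estimate the token count of chat messages from their character lengths."""
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for value in message.values():
            # Ceiling division: any non-empty string counts as at least one token.
            total += -(-len(value) // CHARS_PER_TOKEN)
    return total
```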

curate_gpt.utils.tokens.max_tokens_by_model(model_id=None)

Return the maximum number of tokens allowed by a model.

TODO: return precise values; the current values are estimates.
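One common way to implement such a lookup is a table keyed by model-name prefix, so that dated snapshots such as gpt-4-0613 resolve to their base model. The sketch below works under that assumption. The table contents, the 4096 fallback and the name lookup_max_tokens are illustrative and are not taken from curate_gpt.

```python
from typing import Dict, Optional

# Illustrative context-window sizes (hypothetical table, not curate_gpt's).
CONTEXT_WINDOWS: Dict[str, int] = {
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}
DEFAULT_MAX_TOKENS = 4096  # assumed fallback for unknown or missing model ids


def lookup_max_tokens(model_id: Optional[str] = None) -> int:
    """Return the window size for the longest known prefix of model_id."""
    if not model_id:
        return DEFAULT_MAX_TOKENS
    matches = [name for name in CONTEXT_WINDOWS if model_id.startswith(name)]
    if not matches:
        return DEFAULT_MAX_TOKENS
    # Longest prefix wins, so "gpt-4-32k-0613" maps to "gpt-4-32k", not "gpt-4".
    return CONTEXT_WINDOWS[max(matches, key=len)]
```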

curate_gpt.utils.vector_algorithms module

curate_gpt.utils.vector_algorithms.compute_cosine_similarity(list1, list2)

Compute the pairwise cosine similarity between two lists of vectors.

The result is a two-dimensional matrix sim[ROW][COL], where ROW indexes list1 and COL indexes list2.

Parameters:
  • list1 (List[List[float]])

  • list2 (List[List[float]])

Return type:

ndarray

Returns:

A matrix of shape (len(list1), len(list2)) holding the pairwise cosine similarities.

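The matrix shape described above falls out naturally if each row is normalised to unit length and the two matrices are multiplied. The minimal NumPy sketch below assumes the input contains no zero vectors (those would produce NaNs). The name cosine_matrix is hypothetical, and this is not necessarily how compute_cosine_similarity is implemented.

```python
from typing import List

import numpy as np


def cosine_matrix(list1: List[List[float]], list2: List[List[float]]) -> np.ndarray:
    """Return sim with sim[i, j] = cosine(list1[i], list2[j])."""
    a = np.asarray(list1, dtype=float)
    b = np.asarray(list2, dtype=float)
    # Normalise each row to unit length; dot products are then cosines.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # shape (len(list1), len(list2))
```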
Perform diversified search using Maximal Marginal Relevance (MMR).

Return type:

List[int]

Parameters:

  • query_vector: The vector representing the query.

  • document_vectors: The vectors representing the documents.

  • lambda_: Balance parameter between relevance and diversity.

  • top_n: Number of results to return. If None, return all.

Returns:

List of indices representing the diversified order of documents.
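The parameters above match the standard MMR formulation. At each step, this formulation selects the remaining document d that maximises lambda_ * sim(q, d) - (1 - lambda_) * max over already-selected s of sim(d, s). The minimal NumPy sketch below assumes cosine similarity and that lambda_ weights relevance, so lambda_=1 reduces to plain relevance ranking. The name mmr is hypothetical, and the documented function may use a different convention.

```python
from typing import List, Optional, Sequence

import numpy as np


def mmr(
    query_vector: Sequence[float],
    document_vectors: Sequence[Sequence[float]],
    lambda_: float = 0.5,
    top_n: Optional[int] = None,
) -> List[int]:
    """Greedy Maximal Marginal Relevance ordering of document indices."""
    q = np.asarray(query_vector, dtype=float)
    docs = np.asarray(document_vectors, dtype=float)
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    relevance = docs @ q      # cosine similarity of each document to the query
    pairwise = docs @ docs.T  # cosine similarity between documents
    n = len(docs) if top_n is None else min(top_n, len(docs))
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < n:
        if selected:
            # Redundancy: similarity to the closest already-selected document.
            redundancy = pairwise[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lambda_ * relevance[remaining] - (1 - lambda_) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With query [1, 1] and documents [[1, 0], [1, 0.1], [0, 1]], lambda_=0.5 yields [1, 2, 0]. The near-duplicate [1, 0] is pushed to the end.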

curate_gpt.utils.vector_algorithms.top_matches(cosine_similarity_matrix)

Find the top match for each row in the cosine similarity matrix.

Parameters:
  • cosine_similarity_matrix (ndarray)

Return type:

Tuple[ndarray, ndarray]

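Finding the top match for each row reduces to an argmax along axis 1. The sketch below assumes the returned tuple is (column indices, similarity values); the page does not document the order of the tuple, and top_match_per_row is a hypothetical name.

```python
from typing import Tuple

import numpy as np


def top_match_per_row(sim: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return the column index and similarity of the best match in each row."""
    idx = np.argmax(sim, axis=1)
    best = sim[np.arange(sim.shape[0]), idx]  # pick sim[row, idx[row]] for every row
    return idx, best
```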
curate_gpt.utils.vector_algorithms.top_n_matches(cosine_similarity_matrix, n=10)
Return type:

Tuple[ndarray, ndarray]
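top_n_matches has no description here. The signature suggests a generalisation of top_matches that keeps the n best columns per row. The sketch below works under that reading, assuming the result is (column indices, similarity values) sorted from best to worst. The name top_n_per_row is hypothetical.

```python
from typing import Tuple

import numpy as np


def top_n_per_row(sim: np.ndarray, n: int = 10) -> Tuple[np.ndarray, np.ndarray]:
    """Return the indices and values of the n largest entries in each row, best first."""
    n = min(n, sim.shape[1])
    # argsort is ascending; reverse the columns for descending order, keep n.
    idx = np.argsort(sim, axis=1)[:, ::-1][:, :n]
    vals = np.take_along_axis(sim, idx, axis=1)
    return idx, vals
```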

curate_gpt.utils.vectordb_operations module

curate_gpt.utils.vectordb_operations.match_collections(db, left_collection, right_collection, other_db=None)

Match every element in the left collection with every element in the right collection.

Currently, this yields only the best match in the right collection for each element of the left collection.

Parameters:
  • db (DBAdapter)

  • left_collection (str)

  • right_collection (str)

  • other_db (Optional[DBAdapter]) – optional; defaults to the main database (db)

Return type:

Iterator[Tuple[dict, dict, float]]

Returns:

Tuples of (left object, right object, cosine similarity score).
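Without the database layer, the documented behaviour can be sketched over plain in-memory lists: for each left element, it finds the best match in the right collection and yields (left, right, score). The (object, embedding) item format and the helper names cosine and match_in_memory are hypothetical. The real function works against DBAdapter collections and may compute similarities in bulk rather than pair by pair.

```python
import math
from typing import Iterator, List, Tuple

Item = Tuple[dict, List[float]]  # hypothetical (object, embedding) pair


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def match_in_memory(left: List[Item], right: List[Item]) -> Iterator[Tuple[dict, dict, float]]:
    """For each left item, yield (left_obj, best_right_obj, similarity)."""
    if not right:
        return  # nothing to match against
    for left_obj, left_vec in left:
        best_obj, best_vec = max(right, key=lambda item: cosine(left_vec, item[1]))
        yield left_obj, best_obj, cosine(left_vec, best_vec)
```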

Module contents