curate_gpt.evaluation package

Submodules

curate_gpt.evaluation.base_evaluator module

class curate_gpt.evaluation.base_evaluator.BaseEvaluator(agent=None)

Bases: ABC

Base class for evaluators.

agent: BaseAgent = None
evaluate(test_collection, num_tests=10000, report_file=None, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • test_collection (str)

  • num_tests

  • report_file (Optional[TextIO])

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics aggregated over the test collection

abstract evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object
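
For example, a concrete evaluator only needs to implement evaluate_object(); the sketch below is illustrative and assumes only the classes documented on this page (the "expected" keyword argument is hypothetical):

    from curate_gpt.evaluation.base_evaluator import BaseEvaluator
    from curate_gpt.evaluation.evaluation_datamodel import ClassificationMetrics

    class ExactMatchEvaluator(BaseEvaluator):
        """Scores an object as entirely right or entirely wrong (illustrative only)."""

        def evaluate_object(self, obj, **kwargs) -> ClassificationMetrics:
            expected = kwargs.get("expected")  # hypothetical keyword argument
            score = 1.0 if obj == expected else 0.0
            return ClassificationMetrics(
                accuracy=score,
                precision=score,
                recall=score,
                specificity=score,
                f1_score=score,
            )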

curate_gpt.evaluation.calc_statistics module

curate_gpt.evaluation.calc_statistics.aggregate_metrics(metrics_list, method=AggregationMethod.MACRO)

Aggregate a list of metrics.

Note that if the evaluation task is over single labels rather than lists, then this is trivially just the proportion of correct predictions.

Parameters:
  • metrics_list – the per-object ClassificationMetrics to aggregate

  • method (AggregationMethod) – macro, micro, or weighted aggregation

Returns:

the aggregated metrics
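
A minimal sketch of macro aggregation; the metric values below are illustrative:

    from curate_gpt.evaluation.calc_statistics import aggregate_metrics
    from curate_gpt.evaluation.evaluation_datamodel import (
        AggregationMethod,
        ClassificationMetrics,
    )

    # Illustrative per-object metrics; in practice these come from calculate_metrics().
    per_object = [
        ClassificationMetrics(accuracy=1.0, precision=1.0, recall=1.0, specificity=1.0, f1_score=1.0),
        ClassificationMetrics(accuracy=0.5, precision=0.5, recall=0.5, specificity=0.5, f1_score=0.5),
    ]

    overall = aggregate_metrics(per_object, method=AggregationMethod.MACRO)
    print(overall.f1_score)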

curate_gpt.evaluation.calc_statistics.calculate_metrics(outcomes)

Calculate classification metrics from a collection of outcomes.

Parameters:
  • outcomes

Return type:

ClassificationMetrics

curate_gpt.evaluation.calc_statistics.evaluate_predictions(obj1, obj2)

Evaluate a prediction compared to an expected value.

Where the prediction and the expected value are lists, each element contributes its own outcome: true positive, false positive, or false negative.

Where the prediction and the expected value are scalars, they are treated as singleton lists: a correct prediction counts as a true positive with no false positives or negatives, while an incorrect prediction counts as both a false positive and a false negative.

Parameters:
  • obj1 (Any)

  • obj2 (Any)

Return type:

Iterator[Tuple[ClassificationOutcome, str]]

Returns:

an iterator of (outcome, detail) pairs
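
The two functions above compose into a small scoring pipeline. A sketch, assuming calculate_metrics accepts the ClassificationOutcome values yielded by evaluate_predictions:

    from curate_gpt.evaluation.calc_statistics import (
        calculate_metrics,
        evaluate_predictions,
    )

    predicted = ["a", "b", "c"]
    expected = ["a", "b", "d"]

    # evaluate_predictions yields (ClassificationOutcome, detail) pairs.
    outcomes = [outcome for outcome, _detail in evaluate_predictions(predicted, expected)]

    metrics = calculate_metrics(outcomes)
    print(metrics.precision, metrics.recall, metrics.f1_score)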

curate_gpt.evaluation.dae_evaluator module

class curate_gpt.evaluation.dae_evaluator.DatabaseAugmentedCompletionEvaluator(agent=None, fields_to_predict=<factory>, fields_to_mask=<factory>)

Bases: BaseEvaluator

Evaluates an agent that retrieves objects in response to a query using a structured knowledge source.

agent: DragonAgent = None
evaluate(test_collection, num_tests=None, report_file=None, report_tsv_file=None, working_directory=None, **kwargs)

Evaluate the agent on a test collection.

Note: the main collection used for few-shot learning is passed in kwargs (this may change).

Parameters:
  • test_collection (str)

  • num_tests (Optional[int])

  • report_file (Optional[TextIO])

  • report_tsv_file

  • working_directory

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics aggregated over the test collection

evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object

fields_to_mask: List[str]
fields_to_predict: List[str]
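
A usage sketch; the DragonAgent and ChromaDBAdapter import paths, their constructor arguments, and the collection name are assumptions drawn from the wider curate_gpt package rather than from this page:

    from curate_gpt.agents.dragon_agent import DragonAgent          # assumed path
    from curate_gpt.evaluation.dae_evaluator import DatabaseAugmentedCompletionEvaluator
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                 # assumed constructor
    agent = DragonAgent(knowledge_source=store)     # assumed agent wiring

    evaluator = DatabaseAugmentedCompletionEvaluator(
        agent=agent,
        fields_to_predict=["description"],          # illustrative field names
        fields_to_mask=["id"],
    )
    with open("report.yaml", "w") as report_file:
        metrics = evaluator.evaluate(
            "cities_test",                          # hypothetical test collection
            num_tests=20,
            report_file=report_file,
        )
    print(metrics.f1_score)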

curate_gpt.evaluation.evaluation_datamodel module

class curate_gpt.evaluation.evaluation_datamodel.AggregationMethod(value)

Bases: str, Enum

An enumeration.

MACRO = 'macro'
MICRO = 'micro'
WEIGHTED = 'weighted'
class curate_gpt.evaluation.evaluation_datamodel.ClassificationMetrics(**data)

Bases: BaseModel

accuracy: float
f1_score: float
false_negatives: int
false_positives: int
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'accuracy': FieldInfo(annotation=float, required=True),
    'f1_score': FieldInfo(annotation=float, required=True),
    'false_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'false_positives': FieldInfo(annotation=int, required=False, default=(None,)),
    'precision': FieldInfo(annotation=float, required=True),
    'recall': FieldInfo(annotation=float, required=True),
    'specificity': FieldInfo(annotation=float, required=True),
    'true_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'true_positives': FieldInfo(annotation=int, required=False, default=(None,))}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

precision: float
recall: float
specificity: float
true_negatives: int
true_positives: int
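
ClassificationMetrics is an ordinary Pydantic model; the count fields are optional while the rates are required. A construction sketch with made-up numbers:

    from curate_gpt.evaluation.evaluation_datamodel import ClassificationMetrics

    metrics = ClassificationMetrics(
        accuracy=0.90,
        precision=0.88,
        recall=0.92,
        specificity=0.85,
        f1_score=0.90,
        true_positives=46,
        false_positives=6,
        true_negatives=34,
        false_negatives=4,
    )
    print(metrics.model_dump())  # standard Pydantic v2 serialization
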
class curate_gpt.evaluation.evaluation_datamodel.ClassificationOutcome(value)

Bases: str, Enum

An enumeration.

FALSE_NEGATIVE = 'False Negative'
FALSE_POSITIVE = 'False Positive'
TRUE_NEGATIVE = 'True Negative'
TRUE_POSITIVE = 'True Positive'
class curate_gpt.evaluation.evaluation_datamodel.StratifiedCollection(**data)

Bases: BaseModel

A collection of objects that have been split into training, test, and validation sets.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'source': FieldInfo(annotation=str, required=False, default=None),
    'testing_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'testing_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'training_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'training_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'validation_set': FieldInfo(annotation=Union[List[Dict], NoneType], required=False, default=None),
    'validation_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

source: str
testing_set: List[Dict]
testing_set_collection: Optional[str]
training_set: List[Dict]
training_set_collection: Optional[str]
validation_set: Optional[List[Dict]]
validation_set_collection: Optional[str]
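
A sketch of building a StratifiedCollection by hand; in practice the splitter module below produces these (the source name and objects here are made up):

    from curate_gpt.evaluation.evaluation_datamodel import StratifiedCollection

    split = StratifiedCollection(
        source="cities",                                   # hypothetical source
        training_set=[{"name": "Oslo"}, {"name": "Lima"}],
        testing_set=[{"name": "Kyoto"}],
    )
    print(len(split.training_set), len(split.testing_set))
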
class curate_gpt.evaluation.evaluation_datamodel.Task(**data)

Bases: BaseModel

A task to be run by the evaluation runner.

additional_collections: Optional[List[str]]
agent: Optional[str]
embedding_model_name: str
executed_on: Optional[str]
extractor: Optional[str]
fields_to_mask: Optional[List[str]]
fields_to_predict: Optional[List[str]]
generate_background: bool
property id: str
method: Optional[str]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'additional_collections': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'agent': FieldInfo(annotation=Union[str, NoneType], required=False, default='dae'),
    'embedding_model_name': FieldInfo(annotation=str, required=False, default='openai:'),
    'executed_on': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'extractor': FieldInfo(annotation=Union[str, NoneType], required=False, default='BasicExtractor'),
    'fields_to_mask': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'fields_to_predict': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'generate_background': FieldInfo(annotation=bool, required=False, default=False),
    'method': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'model_name': FieldInfo(annotation=str, required=False, default='gpt-3.5-turbo'),
    'num_testing': FieldInfo(annotation=int, required=False, default=None),
    'num_training': FieldInfo(annotation=int, required=False, default=None),
    'num_validation': FieldInfo(annotation=int, required=False, default=0),
    'report_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'results': FieldInfo(annotation=Union[ClassificationMetrics, NoneType], required=False, default=None),
    'source_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'source_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'stratified_collection': FieldInfo(annotation=Union[StratifiedCollection, NoneType], required=False, default=None),
    'target_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_finished': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_started': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'working_directory': FieldInfo(annotation=Union[Path, str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

model_name: str
num_testing: int
num_training: int
num_validation: int
report_path: Optional[str]
results: Optional[ClassificationMetrics]
source_collection: Optional[str]
source_db_path: Optional[str]
stratified_collection: Optional[StratifiedCollection]
target_db_path: Optional[str]
task_finished: Optional[str]
task_started: Optional[str]
working_directory: Union[Path, str, None]
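
Most Task fields carry defaults (see model_fields above), so a minimal task can be declared with just paths and split sizes; the values below are hypothetical:

    from curate_gpt.evaluation.evaluation_datamodel import Task

    task = Task(
        source_db_path="./db",                 # hypothetical paths and names
        source_collection="cities",
        target_db_path="./eval_db",
        working_directory="./eval_work",
        fields_to_predict=["description"],
        num_training=100,
        num_testing=20,
    )
    print(task.model_name, task.agent)         # defaults: 'gpt-3.5-turbo', 'dae'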

curate_gpt.evaluation.runner module

curate_gpt.evaluation.runner.run_task(task, report_path=None, report_file=None, report_tsv_file=None, fresh=False, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • task (Task)

  • report_path

  • report_file (Optional[TextIO])

  • report_tsv_file

  • fresh – if True, overwrite existing results file

  • kwargs – passed to the evaluator

Return type:

Task

Returns:

the task with results
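
A sketch of driving the runner with a Task like the one above; paths and names are hypothetical, and remaining kwargs are handed to the evaluator:

    from curate_gpt.evaluation.evaluation_datamodel import Task
    from curate_gpt.evaluation.runner import run_task

    task = Task(
        source_db_path="./db",                 # hypothetical paths and names
        source_collection="cities",
        working_directory="./eval_work",
        num_training=100,
        num_testing=20,
    )
    completed = run_task(task, report_path="./eval_work/report", fresh=True)
    if completed.results:
        print(completed.results.f1_score)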

curate_gpt.evaluation.splitter module

curate_gpt.evaluation.splitter.stratify_collection(store, collection, num_training=None, num_testing=None, num_validation=0, testing_identifiers=None, fields_to_predict=None, ratio=0.7, where=None)

Stratifies a collection into training, testing, and validation sets.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • num_training (Optional[int])

  • num_testing (Optional[int])

  • num_validation

  • testing_identifiers

  • fields_to_predict (Union[str, List[str], None])

  • ratio

  • where (Optional[Dict[str, Any]])

Return type:

StratifiedCollection

Returns:

a StratifiedCollection holding the training, testing, and (optional) validation sets
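
A splitting sketch; the ChromaDBAdapter import and constructor, and the collection name, are assumptions for illustration:

    from curate_gpt.evaluation.splitter import stratify_collection
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                                 # assumed constructor
    split = stratify_collection(
        store,
        "cities",                        # hypothetical collection name
        num_training=100,
        num_testing=20,
        fields_to_predict=["description"],
    )
    print(len(split.training_set), len(split.testing_set))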

curate_gpt.evaluation.splitter.stratify_collection_to_store(store, collection, output_path, embedding_model=None, force=False, **kwargs)

Stratifies a collection into training, testing, and validation sets.

Each split is persisted as a separate collection under the output_path.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • output_path (str)

  • embedding_model

  • force

  • kwargs

Return type:

Dict[str, str]

Returns:

the names of the collections created for each split
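
A persistence sketch; the adapter import, the paths, and the assumption that extra kwargs are forwarded to stratify_collection are all illustrative:

    from curate_gpt.evaluation.splitter import stratify_collection_to_store
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                                 # assumed constructor
    collections = stratify_collection_to_store(
        store,
        "cities",                        # hypothetical collection name
        output_path="./eval_db",
        force=True,
        num_training=100,                # assumed to be forwarded via **kwargs
        num_testing=20,
    )
    print(collections)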

Module contents