curate_gpt.evaluation package

Submodules

curate_gpt.evaluation.base_evaluator module

class curate_gpt.evaluation.base_evaluator.BaseEvaluator(agent=None)

Bases: ABC

Base class for evaluators.

agent: BaseAgent = None
evaluate(test_collection, num_tests=10000, report_file=None, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • test_collection (str)

  • num_tests

  • report_file (Optional[TextIO])

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics aggregated over the test collection

abstract evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object
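
For example, a concrete evaluator only needs to implement evaluate_object(); the sketch below is illustrative and assumes only the classes documented on this page (the "expected" keyword argument is hypothetical):

    from curate_gpt.evaluation.base_evaluator import BaseEvaluator
    from curate_gpt.evaluation.evaluation_datamodel import ClassificationMetrics

    class ExactMatchEvaluator(BaseEvaluator):
        """Scores an object as entirely right or entirely wrong (illustrative only)."""

        def evaluate_object(self, obj, **kwargs) -> ClassificationMetrics:
            expected = kwargs.get("expected")  # hypothetical keyword argument
            score = 1.0 if obj == expected else 0.0
            return ClassificationMetrics(
                accuracy=score,
                precision=score,
                recall=score,
                specificity=score,
                f1_score=score,
            )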

curate_gpt.evaluation.calc_statistics module

curate_gpt.evaluation.calc_statistics.aggregate_metrics(metrics_list, method=AggregationMethod.MACRO)

Aggregate a list of metrics.

Note that if the evaluation task is over single labels rather than lists, then this is trivially just the proportion of correct predictions.

Parameters:
  • metrics_list – the per-object ClassificationMetrics to aggregate

  • method (AggregationMethod) – macro, micro, or weighted aggregation

Returns:

the aggregated metrics
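
A minimal sketch of macro aggregation; the metric values below are illustrative:

    from curate_gpt.evaluation.calc_statistics import aggregate_metrics
    from curate_gpt.evaluation.evaluation_datamodel import (
        AggregationMethod,
        ClassificationMetrics,
    )

    # Illustrative per-object metrics; in practice these come from calculate_metrics().
    per_object = [
        ClassificationMetrics(accuracy=1.0, precision=1.0, recall=1.0, specificity=1.0, f1_score=1.0),
        ClassificationMetrics(accuracy=0.5, precision=0.5, recall=0.5, specificity=0.5, f1_score=0.5),
    ]

    overall = aggregate_metrics(per_object, method=AggregationMethod.MACRO)
    print(overall.f1_score)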

curate_gpt.evaluation.calc_statistics.calculate_metrics(outcomes)

Calculate classification metrics from a collection of outcomes.

Parameters:
  • outcomes

Return type:

ClassificationMetrics

curate_gpt.evaluation.calc_statistics.evaluate_predictions(obj1, obj2)

Evaluate a prediction compared to an expected value.

Where the prediction and the expected value are lists, each element contributes its own outcome: true positive, false positive, or false negative.

Where the prediction and the expected value are scalars, they are treated as singleton lists: a correct prediction counts as a true positive with no false positives or negatives, while an incorrect prediction counts as both a false positive and a false negative.

Parameters:
  • obj1 (Any)

  • obj2 (Any)

Return type:

Iterator[Tuple[ClassificationOutcome, str]]

Returns:

an iterator of (outcome, detail) pairs
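
The two functions above compose into a small scoring pipeline. A sketch, assuming calculate_metrics accepts the ClassificationOutcome values yielded by evaluate_predictions:

    from curate_gpt.evaluation.calc_statistics import (
        calculate_metrics,
        evaluate_predictions,
    )

    predicted = ["a", "b", "c"]
    expected = ["a", "b", "d"]

    # evaluate_predictions yields (ClassificationOutcome, detail) pairs.
    outcomes = [outcome for outcome, _detail in evaluate_predictions(predicted, expected)]

    metrics = calculate_metrics(outcomes)
    print(metrics.precision, metrics.recall, metrics.f1_score)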

curate_gpt.evaluation.dae_evaluator module

class curate_gpt.evaluation.dae_evaluator.DatabaseAugmentedCompletionEvaluator(agent=None, fields_to_predict=<factory>, fields_to_mask=<factory>)

Bases: BaseEvaluator

Evaluates an agent that retrieves objects in response to a query using a structured knowledge source.

agent: DragonAgent = None
evaluate(test_collection, num_tests=None, report_file=None, report_tsv_file=None, working_directory=None, **kwargs)

Evaluate the agent on a test collection.

Note: the main collection used for few-shot learning is passed in kwargs (this may change).

Parameters:
  • test_collection (str)

  • num_tests (Optional[int])

  • report_file (Optional[TextIO])

  • report_tsv_file

  • working_directory

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics aggregated over the test collection

evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object

fields_to_mask: List[str]
fields_to_predict: List[str]
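
A usage sketch; the DragonAgent and ChromaDBAdapter import paths, their constructor arguments, and the collection name are assumptions drawn from the wider curate_gpt package rather than from this page:

    from curate_gpt.agents.dragon_agent import DragonAgent          # assumed path
    from curate_gpt.evaluation.dae_evaluator import DatabaseAugmentedCompletionEvaluator
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                 # assumed constructor
    agent = DragonAgent(knowledge_source=store)     # assumed agent wiring

    evaluator = DatabaseAugmentedCompletionEvaluator(
        agent=agent,
        fields_to_predict=["description"],          # illustrative field names
        fields_to_mask=["id"],
    )
    with open("report.yaml", "w") as report_file:
        metrics = evaluator.evaluate(
            "cities_test",                          # hypothetical test collection
            num_tests=20,
            report_file=report_file,
        )
    print(metrics.f1_score)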

curate_gpt.evaluation.evaluation_datamodel module

class curate_gpt.evaluation.evaluation_datamodel.AggregationMethod(value)

Bases: str, Enum

An enumeration.

MACRO = 'macro'
MICRO = 'micro'
WEIGHTED = 'weighted'
class curate_gpt.evaluation.evaluation_datamodel.ClassificationMetrics(**data)

Bases: BaseModel

accuracy: float
f1_score: float
false_negatives: int
false_positives: int
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'accuracy': FieldInfo(annotation=float, required=True),
    'f1_score': FieldInfo(annotation=float, required=True),
    'false_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'false_positives': FieldInfo(annotation=int, required=False, default=(None,)),
    'precision': FieldInfo(annotation=float, required=True),
    'recall': FieldInfo(annotation=float, required=True),
    'specificity': FieldInfo(annotation=float, required=True),
    'true_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'true_positives': FieldInfo(annotation=int, required=False, default=(None,))}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

precision: float
recall: float
specificity: float
true_negatives: int
true_positives: int
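
ClassificationMetrics is an ordinary Pydantic model; the count fields are optional while the rates are required. A construction sketch with made-up numbers:

    from curate_gpt.evaluation.evaluation_datamodel import ClassificationMetrics

    metrics = ClassificationMetrics(
        accuracy=0.90,
        precision=0.88,
        recall=0.92,
        specificity=0.85,
        f1_score=0.90,
        true_positives=46,
        false_positives=6,
        true_negatives=34,
        false_negatives=4,
    )
    print(metrics.model_dump())  # standard Pydantic v2 serialization
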
class curate_gpt.evaluation.evaluation_datamodel.ClassificationOutcome(value)

Bases: str, Enum

An enumeration.

FALSE_NEGATIVE = 'False Negative'
FALSE_POSITIVE = 'False Positive'
TRUE_NEGATIVE = 'True Negative'
TRUE_POSITIVE = 'True Positive'
class curate_gpt.evaluation.evaluation_datamodel.StratifiedCollection(**data)

Bases: BaseModel

A collection of objects that have been split into training, test, and validation sets.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'source': FieldInfo(annotation=str, required=False, default=None),
    'testing_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'testing_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'training_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'training_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'validation_set': FieldInfo(annotation=Union[List[Dict], NoneType], required=False, default=None),
    'validation_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

source: str
testing_set: List[Dict]
testing_set_collection: Optional[str]
training_set: List[Dict]
training_set_collection: Optional[str]
validation_set: Optional[List[Dict]]
validation_set_collection: Optional[str]
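
A sketch of building a StratifiedCollection by hand; in practice the splitter module below produces these (the source name and objects here are made up):

    from curate_gpt.evaluation.evaluation_datamodel import StratifiedCollection

    split = StratifiedCollection(
        source="cities",                                   # hypothetical source
        training_set=[{"name": "Oslo"}, {"name": "Lima"}],
        testing_set=[{"name": "Kyoto"}],
    )
    print(len(split.training_set), len(split.testing_set))
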
class curate_gpt.evaluation.evaluation_datamodel.Task(**data)

Bases: BaseModel

A task to be run by the evaluation runner.

additional_collections: Optional[List[str]]
agent: Optional[str]
embedding_model_name: str
executed_on: Optional[str]
extractor: Optional[str]
fields_to_mask: Optional[List[str]]
fields_to_predict: Optional[List[str]]
generate_background: bool
property id: str
method: Optional[str]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'additional_collections': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'agent': FieldInfo(annotation=Union[str, NoneType], required=False, default='dae'),
    'embedding_model_name': FieldInfo(annotation=str, required=False, default='openai:'),
    'executed_on': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'extractor': FieldInfo(annotation=Union[str, NoneType], required=False, default='BasicExtractor'),
    'fields_to_mask': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'fields_to_predict': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'generate_background': FieldInfo(annotation=bool, required=False, default=False),
    'method': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'model_name': FieldInfo(annotation=str, required=False, default='gpt-3.5-turbo'),
    'num_testing': FieldInfo(annotation=int, required=False, default=None),
    'num_training': FieldInfo(annotation=int, required=False, default=None),
    'num_validation': FieldInfo(annotation=int, required=False, default=0),
    'report_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'results': FieldInfo(annotation=Union[ClassificationMetrics, NoneType], required=False, default=None),
    'source_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'source_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'stratified_collection': FieldInfo(annotation=Union[StratifiedCollection, NoneType], required=False, default=None),
    'target_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_finished': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_started': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'working_directory': FieldInfo(annotation=Union[Path, str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

model_name: str
num_testing: int
num_training: int
num_validation: int
report_path: Optional[str]
results: Optional[ClassificationMetrics]
source_collection: Optional[str]
source_db_path: Optional[str]
stratified_collection: Optional[StratifiedCollection]
target_db_path: Optional[str]
task_finished: Optional[str]
task_started: Optional[str]
working_directory: Union[Path, str, None]
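
Most Task fields carry defaults (see model_fields above), so a minimal task can be declared with just paths and split sizes; the values below are hypothetical:

    from curate_gpt.evaluation.evaluation_datamodel import Task

    task = Task(
        source_db_path="./db",                 # hypothetical paths and names
        source_collection="cities",
        target_db_path="./eval_db",
        working_directory="./eval_work",
        fields_to_predict=["description"],
        num_training=100,
        num_testing=20,
    )
    print(task.model_name, task.agent)         # defaults: 'gpt-3.5-turbo', 'dae'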

curate_gpt.evaluation.runner module

curate_gpt.evaluation.runner.run_task(task, report_path=None, report_file=None, report_tsv_file=None, fresh=False, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • task (Task)

  • report_path

  • report_file (Optional[TextIO])

  • report_tsv_file

  • fresh – if True, overwrite existing results file

  • kwargs – passed to the evaluator

Return type:

Task

Returns:

the task with results
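
A sketch of driving the runner with a Task like the one above; paths and names are hypothetical, and remaining kwargs are handed to the evaluator:

    from curate_gpt.evaluation.evaluation_datamodel import Task
    from curate_gpt.evaluation.runner import run_task

    task = Task(
        source_db_path="./db",                 # hypothetical paths and names
        source_collection="cities",
        working_directory="./eval_work",
        num_training=100,
        num_testing=20,
    )
    completed = run_task(task, report_path="./eval_work/report", fresh=True)
    if completed.results:
        print(completed.results.f1_score)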

curate_gpt.evaluation.splitter module

curate_gpt.evaluation.splitter.stratify_collection(store, collection, num_training=None, num_testing=None, num_validation=0, testing_identifiers=None, fields_to_predict=None, ratio=0.7, where=None)

Stratifies a collection into training, testing, and validation sets.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • num_training (Optional[int])

  • num_testing (Optional[int])

  • num_validation

  • testing_identifiers

  • fields_to_predict (Union[str, List[str], None])

  • ratio

  • where (Optional[Dict[str, Any]])

Return type:

StratifiedCollection

Returns:

a StratifiedCollection holding the training, testing, and (optional) validation sets
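
A splitting sketch; the ChromaDBAdapter import and constructor, and the collection name, are assumptions for illustration:

    from curate_gpt.evaluation.splitter import stratify_collection
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                                 # assumed constructor
    split = stratify_collection(
        store,
        "cities",                        # hypothetical collection name
        num_training=100,
        num_testing=20,
        fields_to_predict=["description"],
    )
    print(len(split.training_set), len(split.testing_set))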

curate_gpt.evaluation.splitter.stratify_collection_to_store(store, collection, output_path, embedding_model=None, force=False, **kwargs)

Stratifies a collection into training, testing, and validation sets.

Each split is persisted as a separate collection under the output_path.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • output_path (str)

  • embedding_model

  • force

  • kwargs

Return type:

Dict[str, str]

Returns:

the names of the collections created for each split
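
A persistence sketch; the adapter import, the paths, and the assumption that extra kwargs are forwarded to stratify_collection are all illustrative:

    from curate_gpt.evaluation.splitter import stratify_collection_to_store
    from curate_gpt.store.chromadb_adapter import ChromaDBAdapter   # assumed path

    store = ChromaDBAdapter("./db")                                 # assumed constructor
    collections = stratify_collection_to_store(
        store,
        "cities",                        # hypothetical collection name
        output_path="./eval_db",
        force=True,
        num_training=100,                # assumed to be forwarded via **kwargs
        num_testing=20,
    )
    print(collections)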

Module contents