curategpt.evaluation package

Submodules

curategpt.evaluation.base_evaluator module

class curategpt.evaluation.base_evaluator.BaseEvaluator(agent=None)

Bases: ABC

Base class for evaluators.

agent: BaseAgent = None
evaluate(test_collection, num_tests=10000, report_file=None, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • test_collection (str)

  • num_tests

  • report_file (Optional[TextIO])

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the test collection

abstract evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object
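
Since evaluate_object is the only abstract method, a concrete evaluator only needs to score one object at a time; the inherited evaluate(...) can then be run over a test collection. A minimal sketch (the evaluator name, the dict fields, and the scoring rule are invented for illustration; a real evaluator would consult self.agent):

    from curategpt.evaluation.base_evaluator import BaseEvaluator
    from curategpt.evaluation.evaluation_datamodel import ClassificationMetrics


    class ExactMatchEvaluator(BaseEvaluator):
        """Hypothetical evaluator that checks a single field for an exact match."""

        def evaluate_object(self, obj, **kwargs) -> ClassificationMetrics:
            # Illustrative only: assumes obj is a dict carrying both the expected
            # and the predicted label; a real evaluator would query self.agent.
            expected = obj.get("label")
            predicted = obj.get("predicted_label")
            correct = predicted is not None and predicted == expected
            score = 1.0 if correct else 0.0
            return ClassificationMetrics(
                accuracy=score,
                precision=score,
                recall=score,
                f1_score=score,
                specificity=score,
                true_positives=1 if correct else 0,
                false_positives=0 if correct else 1,
                false_negatives=0 if correct else 1,
                true_negatives=0,
            )


    evaluator = ExactMatchEvaluator()
    print(evaluator.evaluate_object({"label": "nucleus", "predicted_label": "nucleus"}))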

curategpt.evaluation.calc_statistics module

curategpt.evaluation.calc_statistics.aggregate_metrics(metrics_list, method=AggregationMethod.MACRO)

Aggregate a list of metrics.

Note that if the evaluation task is for single labels rather than lists, then this is trivially just the proportion of correct predictions.

Parameters:
  • metrics_list

  • method

Returns:

the aggregated metrics

curategpt.evaluation.calc_statistics.calculate_metrics(outcomes)
Return type:

ClassificationMetrics

curategpt.evaluation.calc_statistics.evaluate_predictions(obj1, obj2)

Evaluate a prediction compared to an expected value.

Where the prediction and the expected value are lists, an outcome is yielded for each element: a true positive for an element present in both, a false positive for an element only in the prediction, and a false negative for an element only in the expected value.

Where the prediction and the expected value are scalars, these are treated as if they are lists, thus a correct prediction is a true positive, and no false positives or negatives; an incorrect prediction is a false positive and a false negative.

Parameters:
  • obj1 (Any)

  • obj2 (Any)

Return type:

Iterator[Tuple[ClassificationOutcome, str]]

Returns:

an iterator of (ClassificationOutcome, str) pairs
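
Together, the three functions in this module support a simple scoring pipeline: compare each prediction against its expected value, turn the resulting outcomes into per-object metrics, then aggregate across objects. A sketch under the documented signatures (the example data is invented; the prediction-then-expected argument order and the use of bare outcomes with calculate_metrics are assumptions):

    from curategpt.evaluation.calc_statistics import (
        aggregate_metrics,
        calculate_metrics,
        evaluate_predictions,
    )
    from curategpt.evaluation.evaluation_datamodel import AggregationMethod

    # Invented (prediction, expected) pairs for two objects.
    cases = [
        (["nucleus", "cytoplasm"], ["nucleus"]),  # list vs. list
        ("membrane", "membrane"),                 # scalar vs. scalar
    ]

    per_object = []
    for predicted, expected in cases:
        # evaluate_predictions yields (ClassificationOutcome, str) pairs
        outcomes = [outcome for outcome, _ in evaluate_predictions(predicted, expected)]
        per_object.append(calculate_metrics(outcomes))

    # MACRO is the documented default aggregation method.
    summary = aggregate_metrics(per_object, method=AggregationMethod.MACRO)
    print(summary.precision, summary.recall, summary.f1_score)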

curategpt.evaluation.dae_evaluator module

class curategpt.evaluation.dae_evaluator.DatabaseAugmentedCompletionEvaluator(agent=None, fields_to_predict=<factory>, fields_to_mask=<factory>)

Bases: BaseEvaluator

Evaluates an agent that retrieves objects in response to a query using a structured knowledge source.

agent: DragonAgent = None
evaluate(test_collection, num_tests=None, report_file=None, report_tsv_file=None, working_directory=None, **kwargs)

Evaluate the agent on a test collection.

Note: the main collection used for few-shot learning is passed in kwargs (this may change)

Parameters:
  • test_collection (str)

  • num_tests (Optional[int])

  • report_file (Optional[TextIO])

  • report_tsv_file

  • working_directory

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the test collection

evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object

fields_to_mask: List[str]
fields_to_predict: List[str]
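
Typical usage, sketched as a fragment: the variable agent is assumed to be an already-configured DragonAgent (its construction belongs to other modules), and the collection name and field names are invented:

    from curategpt.evaluation.dae_evaluator import DatabaseAugmentedCompletionEvaluator

    # `agent` is assumed to be a configured DragonAgent (not shown here).
    evaluator = DatabaseAugmentedCompletionEvaluator(
        agent=agent,
        fields_to_predict=["label"],  # invented field names
        fields_to_mask=["id"],
    )

    with open("dae_report.txt", "w") as report_file:
        metrics = evaluator.evaluate(
            test_collection="cl_test",  # invented collection name
            num_tests=50,
            report_file=report_file,
            # the collection used for few-shot learning is passed via **kwargs
            # (see the note above); its keyword name is not shown here
        )
    print(metrics.accuracy, metrics.f1_score)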

curategpt.evaluation.evaluation_datamodel module

class curategpt.evaluation.evaluation_datamodel.AggregationMethod(value)

Bases: str, Enum

An enumeration of methods for aggregating metrics across objects.

MACRO = 'macro'
MICRO = 'micro'
WEIGHTED = 'weighted'
class curategpt.evaluation.evaluation_datamodel.ClassificationMetrics(**data)

Bases: BaseModel

accuracy: float
f1_score: float
false_negatives: int
false_positives: int
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'accuracy': FieldInfo(annotation=float, required=True),
    'f1_score': FieldInfo(annotation=float, required=True),
    'false_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'false_positives': FieldInfo(annotation=int, required=False, default=(None,)),
    'precision': FieldInfo(annotation=float, required=True),
    'recall': FieldInfo(annotation=float, required=True),
    'specificity': FieldInfo(annotation=float, required=True),
    'true_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'true_positives': FieldInfo(annotation=int, required=False, default=(None,))}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

precision: float
recall: float
specificity: float
true_negatives: int
true_positives: int
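
ClassificationMetrics is a plain Pydantic model, so it can also be constructed directly, for example when importing results computed elsewhere (the numbers below are invented; the five float fields are required, the count fields are optional):

    from curategpt.evaluation.evaluation_datamodel import ClassificationMetrics

    metrics = ClassificationMetrics(
        accuracy=0.90,
        precision=0.88,
        recall=0.92,
        f1_score=0.90,
        specificity=0.85,
        true_positives=46,
        false_positives=6,
        false_negatives=4,
        true_negatives=44,
    )
    print(metrics.model_dump())
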
class curategpt.evaluation.evaluation_datamodel.ClassificationOutcome(value)

Bases: str, Enum

An enumeration of classification outcome types.

FALSE_NEGATIVE = 'False Negative'
FALSE_POSITIVE = 'False Positive'
TRUE_NEGATIVE = 'True Negative'
TRUE_POSITIVE = 'True Positive'
class curategpt.evaluation.evaluation_datamodel.StratifiedCollection(**data)

Bases: BaseModel

A collection of objects that have been split into training, test, and validation sets.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'source': FieldInfo(annotation=str, required=False, default=None),
    'testing_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'testing_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'training_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'training_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'validation_set': FieldInfo(annotation=Union[List[Dict], NoneType], required=False, default=None),
    'validation_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

source: str
testing_set: List[Dict]
testing_set_collection: Optional[str]
training_set: List[Dict]
training_set_collection: Optional[str]
validation_set: Optional[List[Dict]]
validation_set_collection: Optional[str]
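
The splits can be held inline (as lists of dicts) or referenced by collection name via the *_collection fields. A minimal inline example with invented records:

    from curategpt.evaluation.evaluation_datamodel import StratifiedCollection

    sc = StratifiedCollection(
        source="cl",  # invented source name
        training_set=[{"id": "CL:0000540", "label": "neuron"}],
        testing_set=[{"id": "CL:0000066", "label": "epithelial cell"}],
        # validation_set is optional and omitted here
    )
    print(len(sc.training_set), len(sc.testing_set))
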
class curategpt.evaluation.evaluation_datamodel.Task(**data)

Bases: BaseModel

A task to be run by the evaluation runner.

additional_collections: Optional[List[str]]
agent: Optional[str]
embedding_model_name: str
executed_on: Optional[str]
extractor: Optional[str]
fields_to_mask: Optional[List[str]]
fields_to_predict: Optional[List[str]]
generate_background: bool
property id: str
method: Optional[str]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'additional_collections': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'agent': FieldInfo(annotation=Union[str, NoneType], required=False, default='dae'),
    'embedding_model_name': FieldInfo(annotation=str, required=False, default='openai:'),
    'executed_on': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'extractor': FieldInfo(annotation=Union[str, NoneType], required=False, default='BasicExtractor'),
    'fields_to_mask': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'fields_to_predict': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'generate_background': FieldInfo(annotation=bool, required=False, default=False),
    'method': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'model_name': FieldInfo(annotation=str, required=False, default='gpt-3.5-turbo'),
    'num_testing': FieldInfo(annotation=int, required=False, default=None),
    'num_training': FieldInfo(annotation=int, required=False, default=None),
    'num_validation': FieldInfo(annotation=int, required=False, default=0),
    'report_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'results': FieldInfo(annotation=Union[ClassificationMetrics, NoneType], required=False, default=None),
    'source_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'source_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'stratified_collection': FieldInfo(annotation=Union[StratifiedCollection, NoneType], required=False, default=None),
    'target_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_finished': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_started': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'working_directory': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

model_name: str
num_testing: int
num_training: int
num_validation: int
report_path: Optional[str]
results: Optional[ClassificationMetrics]
source_collection: Optional[str]
source_db_path: Optional[str]
stratified_collection: Optional[StratifiedCollection]
target_db_path: Optional[str]
task_finished: Optional[str]
task_started: Optional[str]
working_directory: Union[str, Path, None]
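
A Task bundles everything the runner needs; unspecified fields fall back to the documented defaults (agent 'dae', model_name 'gpt-3.5-turbo', embedding_model_name 'openai:', extractor 'BasicExtractor'). A sketch with invented paths and collection names:

    from curategpt.evaluation.evaluation_datamodel import Task

    task = Task(
        source_db_path="db/cl",             # invented path
        source_collection="cl",             # invented collection name
        target_db_path="db/cl_eval",        # invented path
        working_directory="results/cl_run", # invented path
        num_training=100,
        num_testing=20,
        fields_to_predict=["label"],
    )
    print(task.agent, task.model_name)  # -> dae gpt-3.5-turbo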

curategpt.evaluation.runner module

curategpt.evaluation.runner.run_task(task, report_path=None, report_file=None, report_tsv_file=None, fresh=False, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • task (Task)

  • report_path

  • report_file (Optional[TextIO])

  • report_tsv_file

  • fresh – if True, overwrite existing results file

  • kwargs – passed to the evaluator

Return type:

Task

Returns:

the task with results
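
A sketch of a full run, assuming the Task built in the example above (paths invented):

    from curategpt.evaluation.runner import run_task

    # `task` is assumed to be a configured Task (see the Task example above).
    completed = run_task(
        task,
        report_path="results/cl_run",  # invented path
        fresh=True,                    # overwrite existing results, per the docs
    )
    # The returned Task carries the aggregated metrics in its results field.
    if completed.results is not None:
        print(completed.results.precision, completed.results.recall)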

curategpt.evaluation.splitter module

curategpt.evaluation.splitter.stratify_collection(store, collection, num_training=None, num_testing=None, num_validation=0, testing_identifiers=None, fields_to_predict=None, ratio=0.7, where=None)

Stratifies a collection into training, testing, and validation sets.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • num_training (Optional[int])

  • num_testing (Optional[int])

  • num_validation

  • testing_identifiers

  • fields_to_predict (Union[str, List[str], None])

  • ratio

  • where (Optional[Dict[str, Any]])

Return type:

StratifiedCollection

Returns:

the stratified collection
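
For example, assuming store is an already-initialised DBAdapter holding the collection to split (its construction is outside this module; the collection name is invented):

    from curategpt.evaluation.splitter import stratify_collection

    # `store` is assumed to be a configured DBAdapter (not shown here).
    sc = stratify_collection(
        store,
        "cl",
        num_training=100,
        num_testing=20,
        num_validation=0,
    )
    print(len(sc.training_set), len(sc.testing_set))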

curategpt.evaluation.splitter.stratify_collection_to_store(store, collection, output_path, embedding_model=None, force=False, **kwargs)

Stratifies a collection into training, testing, and validation sets.

Each collection is persisted to a separate collection in the output_path.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • output_path (str)

  • embedding_model

  • force

  • kwargs

Return type:

Dict[str, str]

Returns:

a dictionary mapping each split to the collection it was persisted to
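
The persisted variant writes each split to its own collection under output_path and returns their names. Sketched under the same assumption about store (path invented; the meaning of force is assumed to be rebuilding existing output):

    from curategpt.evaluation.splitter import stratify_collection_to_store

    # `store` is assumed to be a configured DBAdapter (not shown here).
    split_names = stratify_collection_to_store(
        store,
        "cl",
        output_path="db/cl_splits",  # invented path
        force=True,                  # assumed: rebuild even if output exists
    )
    print(split_names)  # Dict[str, str] of split -> collection name (assumed)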

Module contents