curategpt.evaluation package

Submodules

curategpt.evaluation.base_evaluator module

class curategpt.evaluation.base_evaluator.BaseEvaluator(agent=None)

Bases: ABC

Base class for evaluators.

agent: BaseAgent = None
evaluate(test_collection, num_tests=10000, report_file=None, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • test_collection (str)

  • num_tests

  • report_file (Optional[TextIO])

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the test collection

abstract evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object
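
Since evaluate_object is the only abstract method, a concrete evaluator only needs to score one object at a time; the inherited evaluate(...) can then be run over a test collection. A minimal sketch (the evaluator name, the dict fields, and the scoring rule are invented for illustration; a real evaluator would consult self.agent):

    from curategpt.evaluation.base_evaluator import BaseEvaluator
    from curategpt.evaluation.evaluation_datamodel import ClassificationMetrics


    class ExactMatchEvaluator(BaseEvaluator):
        """Hypothetical evaluator that checks a single field for an exact match."""

        def evaluate_object(self, obj, **kwargs) -> ClassificationMetrics:
            # Illustrative only: assumes obj is a dict carrying both the expected
            # and the predicted label; a real evaluator would query self.agent.
            expected = obj.get("label")
            predicted = obj.get("predicted_label")
            correct = predicted is not None and predicted == expected
            score = 1.0 if correct else 0.0
            return ClassificationMetrics(
                accuracy=score,
                precision=score,
                recall=score,
                f1_score=score,
                specificity=score,
                true_positives=1 if correct else 0,
                false_positives=0 if correct else 1,
                false_negatives=0 if correct else 1,
                true_negatives=0,
            )


    evaluator = ExactMatchEvaluator()
    print(evaluator.evaluate_object({"label": "nucleus", "predicted_label": "nucleus"}))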

curategpt.evaluation.calc_statistics module

curategpt.evaluation.calc_statistics.aggregate_metrics(metrics_list, method=AggregationMethod.MACRO)

Aggregate a list of metrics.

Note that if the evaluation task is for single labels rather than lists, then this is trivially just the proportion of correct predictions.

Parameters:
  • metrics_list

  • method

Returns:

the aggregated metrics

curategpt.evaluation.calc_statistics.calculate_metrics(outcomes)
Return type:

ClassificationMetrics

curategpt.evaluation.calc_statistics.evaluate_predictions(obj1, obj2)

Evaluate a prediction compared to an expected value.

Where the prediction and the expected value are lists, an outcome is yielded for each element: a true positive for an element present in both, a false positive for an element only in the prediction, and a false negative for an element only in the expected value.

Where the prediction and the expected value are scalars, these are treated as if they are lists, thus a correct prediction is a true positive, and no false positives or negatives; an incorrect prediction is a false positive and a false negative.

Parameters:
  • obj1 (Any)

  • obj2 (Any)

Return type:

Iterator[Tuple[ClassificationOutcome, str]]

Returns:

an iterator of (ClassificationOutcome, str) pairs
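
Together, the three functions in this module support a simple scoring pipeline: compare each prediction against its expected value, turn the resulting outcomes into per-object metrics, then aggregate across objects. A sketch under the documented signatures (the example data is invented; the prediction-then-expected argument order and the use of bare outcomes with calculate_metrics are assumptions):

    from curategpt.evaluation.calc_statistics import (
        aggregate_metrics,
        calculate_metrics,
        evaluate_predictions,
    )
    from curategpt.evaluation.evaluation_datamodel import AggregationMethod

    # Invented (prediction, expected) pairs for two objects.
    cases = [
        (["nucleus", "cytoplasm"], ["nucleus"]),  # list vs. list
        ("membrane", "membrane"),                 # scalar vs. scalar
    ]

    per_object = []
    for predicted, expected in cases:
        # evaluate_predictions yields (ClassificationOutcome, str) pairs
        outcomes = [outcome for outcome, _ in evaluate_predictions(predicted, expected)]
        per_object.append(calculate_metrics(outcomes))

    # MACRO is the documented default aggregation method.
    summary = aggregate_metrics(per_object, method=AggregationMethod.MACRO)
    print(summary.precision, summary.recall, summary.f1_score)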

curategpt.evaluation.dae_evaluator module

class curategpt.evaluation.dae_evaluator.DatabaseAugmentedCompletionEvaluator(agent=None, fields_to_predict=<factory>, fields_to_mask=<factory>)

Bases: BaseEvaluator

Evaluates an agent that retrieves objects in response to a query using a structured knowledge source.

agent: DragonAgent = None
evaluate(test_collection, num_tests=None, report_file=None, report_tsv_file=None, working_directory=None, **kwargs)

Evaluate the agent on a test collection.

Note: the main collection used for few-shot learning is passed in kwargs (this may change)

Parameters:
  • test_collection (str)

  • num_tests (Optional[int])

  • report_file (Optional[TextIO])

  • report_tsv_file

  • working_directory

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the test collection

evaluate_object(obj, **kwargs)

Evaluate the agent on a single object.

Parameters:
  • obj

  • kwargs

Return type:

ClassificationMetrics

Returns:

classification metrics for the single object

fields_to_mask: List[str]
fields_to_predict: List[str]
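
Typical usage, sketched as a fragment: the variable agent is assumed to be an already-configured DragonAgent (its construction belongs to other modules), and the collection name and field names are invented:

    from curategpt.evaluation.dae_evaluator import DatabaseAugmentedCompletionEvaluator

    # `agent` is assumed to be a configured DragonAgent (not shown here).
    evaluator = DatabaseAugmentedCompletionEvaluator(
        agent=agent,
        fields_to_predict=["label"],  # invented field names
        fields_to_mask=["id"],
    )

    with open("dae_report.txt", "w") as report_file:
        metrics = evaluator.evaluate(
            test_collection="cl_test",  # invented collection name
            num_tests=50,
            report_file=report_file,
            # the collection used for few-shot learning is passed via **kwargs
            # (see the note above); its keyword name is not shown here
        )
    print(metrics.accuracy, metrics.f1_score)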

curategpt.evaluation.evaluation_datamodel module

class curategpt.evaluation.evaluation_datamodel.AggregationMethod(value)

Bases: str, Enum

An enumeration of methods for aggregating metrics across objects.

MACRO = 'macro'
MICRO = 'micro'
WEIGHTED = 'weighted'
class curategpt.evaluation.evaluation_datamodel.ClassificationMetrics(**data)

Bases: BaseModel

accuracy: float
f1_score: float
false_negatives: int
false_positives: int
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'accuracy': FieldInfo(annotation=float, required=True),
    'f1_score': FieldInfo(annotation=float, required=True),
    'false_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'false_positives': FieldInfo(annotation=int, required=False, default=(None,)),
    'precision': FieldInfo(annotation=float, required=True),
    'recall': FieldInfo(annotation=float, required=True),
    'specificity': FieldInfo(annotation=float, required=True),
    'true_negatives': FieldInfo(annotation=int, required=False, default=(None,)),
    'true_positives': FieldInfo(annotation=int, required=False, default=(None,))}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

precision: float
recall: float
specificity: float
true_negatives: int
true_positives: int
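
ClassificationMetrics is a plain Pydantic model, so it can also be constructed directly, for example when importing results computed elsewhere (the numbers below are invented; the five float fields are required, the count fields are optional):

    from curategpt.evaluation.evaluation_datamodel import ClassificationMetrics

    metrics = ClassificationMetrics(
        accuracy=0.90,
        precision=0.88,
        recall=0.92,
        f1_score=0.90,
        specificity=0.85,
        true_positives=46,
        false_positives=6,
        false_negatives=4,
        true_negatives=44,
    )
    print(metrics.model_dump())
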
class curategpt.evaluation.evaluation_datamodel.ClassificationOutcome(value)

Bases: str, Enum

An enumeration of classification outcome types.

FALSE_NEGATIVE = 'False Negative'
FALSE_POSITIVE = 'False Positive'
TRUE_NEGATIVE = 'True Negative'
TRUE_POSITIVE = 'True Positive'
class curategpt.evaluation.evaluation_datamodel.StratifiedCollection(**data)

Bases: BaseModel

A collection of objects that have been split into training, test, and validation sets.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'source': FieldInfo(annotation=str, required=False, default=None),
    'testing_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'testing_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'training_set': FieldInfo(annotation=List[Dict], required=False, default=None),
    'training_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'validation_set': FieldInfo(annotation=Union[List[Dict], NoneType], required=False, default=None),
    'validation_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

source: str
testing_set: List[Dict]
testing_set_collection: Optional[str]
training_set: List[Dict]
training_set_collection: Optional[str]
validation_set: Optional[List[Dict]]
validation_set_collection: Optional[str]
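
The splits can be held inline (as lists of dicts) or referenced by collection name via the *_collection fields. A minimal inline example with invented records:

    from curategpt.evaluation.evaluation_datamodel import StratifiedCollection

    sc = StratifiedCollection(
        source="cl",  # invented source name
        training_set=[{"id": "CL:0000540", "label": "neuron"}],
        testing_set=[{"id": "CL:0000066", "label": "epithelial cell"}],
        # validation_set is optional and omitted here
    )
    print(len(sc.training_set), len(sc.testing_set))
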
class curategpt.evaluation.evaluation_datamodel.Task(**data)

Bases: BaseModel

A task to be run by the evaluation runner.

additional_collections: Optional[List[str]]
agent: Optional[str]
embedding_model_name: str
executed_on: Optional[str]
extractor: Optional[str]
fields_to_mask: Optional[List[str]]
fields_to_predict: Optional[List[str]]
generate_background: bool
property id: str
method: Optional[str]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {
    'additional_collections': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'agent': FieldInfo(annotation=Union[str, NoneType], required=False, default='dae'),
    'embedding_model_name': FieldInfo(annotation=str, required=False, default='openai:'),
    'executed_on': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'extractor': FieldInfo(annotation=Union[str, NoneType], required=False, default='BasicExtractor'),
    'fields_to_mask': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'fields_to_predict': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None),
    'generate_background': FieldInfo(annotation=bool, required=False, default=False),
    'method': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'model_name': FieldInfo(annotation=str, required=False, default='gpt-3.5-turbo'),
    'num_testing': FieldInfo(annotation=int, required=False, default=None),
    'num_training': FieldInfo(annotation=int, required=False, default=None),
    'num_validation': FieldInfo(annotation=int, required=False, default=0),
    'report_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'results': FieldInfo(annotation=Union[ClassificationMetrics, NoneType], required=False, default=None),
    'source_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'source_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'stratified_collection': FieldInfo(annotation=Union[StratifiedCollection, NoneType], required=False, default=None),
    'target_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_finished': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'task_started': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
    'working_directory': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

model_name: str
num_testing: int
num_training: int
num_validation: int
report_path: Optional[str]
results: Optional[ClassificationMetrics]
source_collection: Optional[str]
source_db_path: Optional[str]
stratified_collection: Optional[StratifiedCollection]
target_db_path: Optional[str]
task_finished: Optional[str]
task_started: Optional[str]
working_directory: Union[str, Path, None]
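
A Task bundles everything the runner needs; unspecified fields fall back to the documented defaults (agent 'dae', model_name 'gpt-3.5-turbo', embedding_model_name 'openai:', extractor 'BasicExtractor'). A sketch with invented paths and collection names:

    from curategpt.evaluation.evaluation_datamodel import Task

    task = Task(
        source_db_path="db/cl",             # invented path
        source_collection="cl",             # invented collection name
        target_db_path="db/cl_eval",        # invented path
        working_directory="results/cl_run", # invented path
        num_training=100,
        num_testing=20,
        fields_to_predict=["label"],
    )
    print(task.agent, task.model_name)  # -> dae gpt-3.5-turbo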

curategpt.evaluation.runner module

curategpt.evaluation.runner.run_task(task, report_path=None, report_file=None, report_tsv_file=None, fresh=False, **kwargs)

Evaluate the agent on a test collection.

Parameters:
  • task (Task)

  • report_path

  • report_file (Optional[TextIO])

  • report_tsv_file

  • fresh – if True, overwrite existing results file

  • kwargs – passed to the evaluator

Return type:

Task

Returns:

the task with results
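
A sketch of a full run, assuming the Task built in the example above (paths invented):

    from curategpt.evaluation.runner import run_task

    # `task` is assumed to be a configured Task (see the Task example above).
    completed = run_task(
        task,
        report_path="results/cl_run",  # invented path
        fresh=True,                    # overwrite existing results, per the docs
    )
    # The returned Task carries the aggregated metrics in its results field.
    if completed.results is not None:
        print(completed.results.precision, completed.results.recall)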

curategpt.evaluation.splitter module

curategpt.evaluation.splitter.stratify_collection(store, collection, num_training=None, num_testing=None, num_validation=0, testing_identifiers=None, fields_to_predict=None, ratio=0.7, where=None)

Stratifies a collection into training, testing, and validation sets.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • num_training (Optional[int])

  • num_testing (Optional[int])

  • num_validation

  • testing_identifiers

  • fields_to_predict (Union[str, List[str], None])

  • ratio

  • where (Optional[Dict[str, Any]])

Return type:

StratifiedCollection

Returns:

the stratified collection
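
For example, assuming store is an already-initialised DBAdapter holding the collection to split (its construction is outside this module; the collection name is invented):

    from curategpt.evaluation.splitter import stratify_collection

    # `store` is assumed to be a configured DBAdapter (not shown here).
    sc = stratify_collection(
        store,
        "cl",
        num_training=100,
        num_testing=20,
        num_validation=0,
    )
    print(len(sc.training_set), len(sc.testing_set))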

curategpt.evaluation.splitter.stratify_collection_to_store(store, collection, output_path, embedding_model=None, force=False, **kwargs)

Stratifies a collection into training, testing, and validation sets.

Each collection is persisted to a separate collection in the output_path.

Parameters:
  • store (DBAdapter)

  • collection (str)

  • output_path (str)

  • embedding_model

  • force

  • kwargs

Return type:

Dict[str, str]

Returns:

a dictionary mapping each split to the collection it was persisted to
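
The persisted variant writes each split to its own collection under output_path and returns their names. Sketched under the same assumption about store (path invented; the meaning of force is assumed to be rebuilding existing output):

    from curategpt.evaluation.splitter import stratify_collection_to_store

    # `store` is assumed to be a configured DBAdapter (not shown here).
    split_names = stratify_collection_to_store(
        store,
        "cl",
        output_path="db/cl_splits",  # invented path
        force=True,                  # assumed: rebuild even if output exists
    )
    print(split_names)  # Dict[str, str] of split -> collection name (assumed)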

Module contents