curategpt.evaluation package
Submodules
curategpt.evaluation.base_evaluator module
- class curategpt.evaluation.base_evaluator.BaseEvaluator(agent=None)
Bases: ABC
Base class for evaluators.
- evaluate(test_collection, num_tests=10000, report_file=None, **kwargs)
Evaluate the agent on a test collection.
- Parameters:
  - test_collection (str)
  - num_tests
  - report_file (Optional[TextIO])
  - kwargs
- Return type:
- Returns:
- abstract evaluate_object(obj, **kwargs)
Evaluate the agent on a single object.
- Parameters:
  - obj
  - kwargs
- Return type:
- Returns:
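BaseEvaluator leaves only evaluate_object abstract, so a custom evaluator can be sketched by subclassing it. The example below is illustrative only: ExactMatchEvaluator and its expected keyword argument are not part of the package, and it assumes calculate_metrics accepts a list of ClassificationOutcome values (see the calc_statistics module below).

```python
from curategpt.evaluation.base_evaluator import BaseEvaluator
from curategpt.evaluation.calc_statistics import calculate_metrics, evaluate_predictions


class ExactMatchEvaluator(BaseEvaluator):
    """Illustrative evaluator that scores a predicted object against an expected one."""

    def evaluate_object(self, obj, expected=None, **kwargs):
        # Compare prediction vs. expectation element by element, then roll the
        # outcomes up into a single metrics object (an assumed contract; the
        # return type of evaluate_object is not documented above).
        outcomes = [outcome for outcome, _msg in evaluate_predictions(obj, expected)]
        return calculate_metrics(outcomes)
```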
curategpt.evaluation.calc_statistics module
- curategpt.evaluation.calc_statistics.aggregate_metrics(metrics_list, method=AggregationMethod.MACRO)
Aggregate a list of metrics.
Note that if the evaluation task is for a single label rather than a list, then this is trivially just the proportion of correct predictions.
- Parameters:
  - metrics_list (List[ClassificationMetrics])
  - method (AggregationMethod)
- Returns:
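For illustration, a couple of per-object results might be aggregated as follows; the metric values are invented, and only the call pattern follows the signature above.

```python
from curategpt.evaluation.calc_statistics import aggregate_metrics
from curategpt.evaluation.evaluation_datamodel import AggregationMethod, ClassificationMetrics

# Two per-object results with made-up values.
m1 = ClassificationMetrics(
    precision=1.0, recall=0.5, f1_score=0.667, accuracy=0.5, specificity=1.0,
    true_positives=1, false_positives=0, false_negatives=1, true_negatives=1,
)
m2 = ClassificationMetrics(
    precision=0.5, recall=1.0, f1_score=0.667, accuracy=0.5, specificity=0.0,
    true_positives=1, false_positives=1, false_negatives=0, true_negatives=0,
)

macro = aggregate_metrics([m1, m2])  # defaults to AggregationMethod.MACRO
micro = aggregate_metrics([m1, m2], method=AggregationMethod.MICRO)
```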
- curategpt.evaluation.calc_statistics.calculate_metrics(outcomes)
- Return type:
- curategpt.evaluation.calc_statistics.evaluate_predictions(obj1, obj2)
Evaluate a prediction compared to an expected value.
Where the prediction and the expected value are lists, an outcome is reported for each element comparison individually (true positives, false positives, and false negatives).
Where the prediction and the expected value are scalars, they are treated as if they were single-element lists: a correct prediction yields a true positive and no false positives or negatives, while an incorrect prediction yields both a false positive and a false negative.
- Parameters:
  - obj1 (Any)
  - obj2 (Any)
- Return type: Iterator[Tuple[ClassificationOutcome, str]]
- Returns:
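A sketch combining evaluate_predictions with calculate_metrics (above). The example IDs are invented; treating obj1 as the prediction and obj2 as the expected value, and passing bare outcomes to calculate_metrics, are assumptions that should be checked against the source.

```python
from curategpt.evaluation.calc_statistics import calculate_metrics, evaluate_predictions

predicted = ["HP:0001250", "HP:0012345"]   # the agent's prediction (illustrative IDs)
expected = ["HP:0001250", "HP:0004322"]    # the expected value

outcomes = []
for outcome, message in evaluate_predictions(predicted, expected):
    print(outcome.value, message)          # each outcome comes with a descriptive message
    outcomes.append(outcome)

metrics = calculate_metrics(outcomes)
print(metrics.precision, metrics.recall, metrics.f1_score)
```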
curategpt.evaluation.dae_evaluator module
- class curategpt.evaluation.dae_evaluator.DatabaseAugmentedCompletionEvaluator(agent=None, fields_to_predict=<factory>, fields_to_mask=<factory>)
Bases: BaseEvaluator
Retrieves objects in response to a query using a structured knowledge source.
- agent: DragonAgent = None
- evaluate(test_collection, num_tests=None, report_file=None, report_tsv_file=None, working_directory=None, **kwargs)
Evaluate the agent on a test collection.
Note: the main collection used for few-shot learning is passed in kwargs (this may change)
- Parameters:
  - test_collection (str)
  - num_tests (Optional[int])
  - report_file (Optional[TextIO])
  - kwargs
- Return type:
- Returns:
- evaluate_object(obj, **kwargs)
Evaluate the agent on a single object.
- Parameters:
  - obj
  - kwargs
- Return type:
- Returns:
- fields_to_mask: List[str]
- fields_to_predict: List[str]
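A hedged usage sketch for the evaluator. The agent construction is omitted (DragonAgent lives elsewhere in curategpt), the collection names and field lists are placeholders, and the kwarg used to pass the few-shot collection is an assumption based on the note above.

```python
from curategpt.evaluation.dae_evaluator import DatabaseAugmentedCompletionEvaluator

agent = ...  # a configured DragonAgent wired to a store and extractor; not shown here

evaluator = DatabaseAugmentedCompletionEvaluator(
    agent=agent,
    fields_to_predict=["definition"],
    fields_to_mask=["id", "original_id"],
)

with open("report.yaml", "w") as report_file:
    metrics = evaluator.evaluate(
        test_collection="ont_hp_test",   # placeholder collection name
        num_tests=50,
        report_file=report_file,
        collection="ont_hp_train",       # assumed kwarg for the few-shot collection
    )
print(metrics)
```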
curategpt.evaluation.evaluation_datamodel module
- class curategpt.evaluation.evaluation_datamodel.AggregationMethod(value)
Bases: str, Enum
An enumeration.
- MACRO = 'macro'
- MICRO = 'micro'
- WEIGHTED = 'weighted'
- class curategpt.evaluation.evaluation_datamodel.ClassificationMetrics(**data)
Bases: BaseModel
- accuracy: float
- f1_score: float
- false_negatives: int
- false_positives: int
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'accuracy': FieldInfo(annotation=float, required=True), 'f1_score': FieldInfo(annotation=float, required=True), 'false_negatives': FieldInfo(annotation=int, required=False, default=(None,)), 'false_positives': FieldInfo(annotation=int, required=False, default=(None,)), 'precision': FieldInfo(annotation=float, required=True), 'recall': FieldInfo(annotation=float, required=True), 'specificity': FieldInfo(annotation=float, required=True), 'true_negatives': FieldInfo(annotation=int, required=False, default=(None,)), 'true_positives': FieldInfo(annotation=int, required=False, default=(None,))}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- precision: float
- recall: float
- specificity: float
- true_negatives: int
- true_positives: int
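Because ClassificationMetrics is a Pydantic model, results can be constructed and serialized with the standard Pydantic v2 API; the values below are invented for illustration.

```python
from curategpt.evaluation.evaluation_datamodel import ClassificationMetrics

metrics = ClassificationMetrics(
    precision=0.75, recall=0.6, f1_score=0.667, accuracy=0.7, specificity=0.8,
    true_positives=3, false_positives=1, false_negatives=2, true_negatives=4,
)

# Standard Pydantic v2 serialization, convenient when writing evaluation reports.
print(metrics.model_dump_json(indent=2))
```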
- class curategpt.evaluation.evaluation_datamodel.ClassificationOutcome(value)
Bases: str, Enum
An enumeration.
- FALSE_NEGATIVE = 'False Negative'
- FALSE_POSITIVE = 'False Positive'
- TRUE_NEGATIVE = 'True Negative'
- TRUE_POSITIVE = 'True Positive'
- class curategpt.evaluation.evaluation_datamodel.StratifiedCollection(**data)
Bases: BaseModel
A collection of objects that have been split into training, test, and validation sets.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'source': FieldInfo(annotation=str, required=False, default=None), 'testing_set': FieldInfo(annotation=List[Dict], required=False, default=None), 'testing_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'training_set': FieldInfo(annotation=List[Dict], required=False, default=None), 'training_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'validation_set': FieldInfo(annotation=Union[List[Dict], NoneType], required=False, default=None), 'validation_set_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- source: str
- testing_set: List[Dict]
- testing_set_collection: Optional[str]
- training_set: List[Dict]
- training_set_collection: Optional[str]
- validation_set: Optional[List[Dict]]
- validation_set_collection: Optional[str]
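As a plain data model, a stratified collection can also be built or inspected directly; the toy objects below are illustrative, and stratify_collection (documented under the splitter module below) is the usual way to produce one.

```python
from curategpt.evaluation.evaluation_datamodel import StratifiedCollection

split = StratifiedCollection(
    source="ont_hp",  # illustrative source collection name
    training_set=[{"id": "X:1", "label": "a"}, {"id": "X:2", "label": "b"}],
    testing_set=[{"id": "X:3", "label": "c"}],
    validation_set=[],
)
print(len(split.training_set), len(split.testing_set))
```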
- class curategpt.evaluation.evaluation_datamodel.Task(**data)
Bases: BaseModel
A task to be run by the evaluation runner.
- additional_collections: Optional[List[str]]
- agent: Optional[str]
- embedding_model_name: str
- executed_on: Optional[str]
- extractor: Optional[str]
- fields_to_mask: Optional[List[str]]
- fields_to_predict: Optional[List[str]]
- generate_background: bool
- property id: str
- method: Optional[str]
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'additional_collections': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'agent': FieldInfo(annotation=Union[str, NoneType], required=False, default='dae'), 'embedding_model_name': FieldInfo(annotation=str, required=False, default='openai:'), 'executed_on': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'extractor': FieldInfo(annotation=Union[str, NoneType], required=False, default='BasicExtractor'), 'fields_to_mask': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'fields_to_predict': FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None), 'generate_background': FieldInfo(annotation=bool, required=False, default=False), 'method': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'model_name': FieldInfo(annotation=str, required=False, default='gpt-3.5-turbo'), 'num_testing': FieldInfo(annotation=int, required=False, default=None), 'num_training': FieldInfo(annotation=int, required=False, default=None), 'num_validation': FieldInfo(annotation=int, required=False, default=0), 'report_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'results': FieldInfo(annotation=Union[ClassificationMetrics, NoneType], required=False, default=None), 'source_collection': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'source_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'stratified_collection': FieldInfo(annotation=Union[StratifiedCollection, NoneType], required=False, default=None), 'target_db_path': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'task_finished': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'task_started': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'working_directory': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- model_name: str
- num_testing: int
- num_training: int
- num_validation: int
- report_path: Optional[str]
- results: Optional[ClassificationMetrics]
- source_collection: Optional[str]
- source_db_path: Optional[str]
- stratified_collection: Optional[StratifiedCollection]
- target_db_path: Optional[str]
- task_finished: Optional[str]
- task_started: Optional[str]
- working_directory: Union[str, Path, None]
curategpt.evaluation.runner module
- curategpt.evaluation.runner.run_task(task, report_path=None, report_file=None, report_tsv_file=None, fresh=False, **kwargs)
Evaluate the agent on a test collection.
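A hedged end-to-end sketch: a Task (see the data model above) bundles the configuration and run_task executes it. The paths and collection names are placeholders, and the assumption that run_task returns the completed Task (with results populated) should be checked against the source.

```python
from curategpt.evaluation.evaluation_datamodel import Task
from curategpt.evaluation.runner import run_task

task = Task(
    source_db_path="db",                 # placeholder paths and names
    source_collection="ont_hp",
    target_db_path="stratified_db",
    model_name="gpt-3.5-turbo",
    embedding_model_name="openai:",
    num_training=100,
    num_testing=20,
    fields_to_predict=["definition"],
    working_directory="results/ont_hp",
)

completed = run_task(task, report_path="results/ont_hp/report.yaml")
print(completed)
```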
curategpt.evaluation.splitter module
- curategpt.evaluation.splitter.stratify_collection(store, collection, num_training=None, num_testing=None, num_validation=0, testing_identifiers=None, fields_to_predict=None, ratio=0.7, where=None)
Stratifies a collection into training, testing, and validation sets.
- Parameters:
  - store (DBAdapter)
  - collection (str)
  - num_training (Optional[int])
  - num_testing (Optional[int])
  - num_validation
  - fields_to_predict (Union[str, List[str], None])
  - ratio
  - where (Optional[Dict[str, Any]])
- Return type:
- Returns:
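A sketch of splitting an existing collection. The store construction is omitted (any DBAdapter implementation should do, e.g. a Chroma-backed store), the collection name is a placeholder, and the assumption is that the function returns a StratifiedCollection as described above.

```python
from curategpt.evaluation.splitter import stratify_collection

store = ...  # a DBAdapter attached to the database holding the collection; not shown here

split = stratify_collection(
    store,
    "ont_hp",                        # placeholder collection name
    num_training=100,
    num_testing=20,
    num_validation=0,
    fields_to_predict=["definition"],
)
print(len(split.training_set), len(split.testing_set))
```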
- curategpt.evaluation.splitter.stratify_collection_to_store(store, collection, output_path, embedding_model=None, force=False, **kwargs)
Stratifies a collection into training, testing, and validation sets.
Each collection is persisted to a separate collection in the output_path.
- Parameters:
  - store (DBAdapter)
  - collection (str)
  - output_path (str)
  - embedding_model
  - force
  - kwargs
- Return type: Dict[str, str]
- Returns:
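And a sketch of persisting the splits: each set is written to its own collection under output_path, and the Dict[str, str] return value is assumed to map split names to the collections they were written to (the exact keys and naming scheme are assumptions). Extra splitting parameters such as num_training are forwarded via **kwargs.

```python
from curategpt.evaluation.splitter import stratify_collection_to_store

store = ...  # a DBAdapter holding the source collection; construction not shown

collections = stratify_collection_to_store(
    store,
    "ont_hp",                       # placeholder source collection
    output_path="stratified_db",    # placeholder target database path
    num_training=100,               # splitting parameters forwarded via **kwargs
    num_testing=20,
)
print(collections)
```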