curategpt package
Subpackages
- curategpt.adhoc package
- Submodules
- curategpt.adhoc.gocam_predictor module
  - GOCAMPredictor: collection_name, database_path, database_type, extractor, fix_yaml(), gocam_by_id(), gocam_wrapper, include_standard_annotations, model_name, predict_activity_unit(), store, strict
- Module contents
- curategpt.agents package
- Submodules
- curategpt.agents.agent_utils module
- curategpt.agents.base_agent module
- curategpt.agents.bootstrap_agent module
- curategpt.agents.chat_agent module
- curategpt.agents.concept_recognition_agent module
  - AnnotatedText
  - AnnotationMethod
  - ConceptRecognitionAgent: annotate(), annotate_concept_list(), annotate_inline(), annotate_two_pass(), ground_concept(), identifier_field, label_field, prefixes, relevance_factor, split_input_text
  - GroundingResult
  - Span
  - parse_annotations()
  - parse_spans()
- curategpt.agents.dase_agent module
  - DatabaseAugmentedStructuredExtraction: background_document_limit, conversation, conversation_mode, default_masked_fields, default_target_class, document_adapter, document_adapter_collection, extract(), max_background_document_size, relevance_factor
  - PredictedFieldValue
- curategpt.agents.dragon_agent module
  - DragonAgent: background_document_limit, complete(), conversation, conversation_mode, default_masked_fields, default_target_class, document_adapter, document_adapter_collection, generate_all(), generate_queries(), max_background_document_size, relevance_factor, review()
  - PredictedFieldValue
- curategpt.agents.evidence_agent module
- curategpt.agents.huggingface_agent module
- curategpt.agents.mapping_agent module
- curategpt.agents.summarization_agent module
- Module contents
  - ChatAgent
  - DragonAgent: background_document_limit, complete(), conversation, conversation_mode, default_masked_fields, default_target_class, document_adapter, document_adapter_collection, generate_all(), generate_queries(), max_background_document_size, relevance_factor, review()
  - EvidenceAgent
  - MappingAgent
- curategpt.app package
- curategpt.conf package
- curategpt.evaluation package
- Submodules
- curategpt.evaluation.base_evaluator module
- curategpt.evaluation.calc_statistics module
- curategpt.evaluation.dae_evaluator module
- curategpt.evaluation.evaluation_datamodel module
  - AggregationMethod
  - ClassificationMetrics: accuracy, f1_score, false_negatives, false_positives, model_config, precision, recall, specificity, true_negatives, true_positives
  - ClassificationOutcome
  - StratifiedCollection: model_config, source, testing_set, testing_set_collection, training_set, training_set_collection, validation_set, validation_set_collection
  - Task: additional_collections, agent, embedding_model_name, executed_on, extractor, fields_to_mask, fields_to_predict, generate_background, id, method, model_config, model_name, num_testing, num_training, num_validation, report_path, results, source_collection, source_db_path, stratified_collection, target_db_path, task_finished, task_started, working_directory
- curategpt.evaluation.runner module
- curategpt.evaluation.splitter module
- Module contents
- curategpt.extract package
- curategpt.formatters package
- curategpt.store package
- Submodules
- curategpt.store.chromadb_adapter module
  - ChromaDBAdapter: client, collection_metadata(), collections(), default_max_document_length, default_model, diversified_search(), dump_then_load(), fetch_all_objects_memory_safe(), find(), id_field, id_to_object, insert(), insert_from_huggingface(), list_collection_names(), lookup(), matches(), name, normalize_metadata(), nparray_to_list(), peek(), populate_venomx(), remove_collection(), reset(), search(), set_collection_metadata(), text_lookup, update(), update_collection_metadata(), upsert()
- curategpt.store.db_adapter module
  - DBAdapter: collection, collection_metadata(), create_view(), delete(), dump(), dump_then_load(), fetch_all_objects_memory_safe(), field_names(), find(), identifier_field(), insert(), insert_from_huggingface(), label_field(), list_collection_names(), lookup(), lookup_multiple(), matches(), name, path, peek(), remove_collection(), schema_proxy, search(), set_collection(), set_collection_metadata(), update(), update_collection_metadata(), upsert()
- curategpt.store.db_metadata module
- curategpt.store.duckdb_adapter module
  - DuckDBAdapter: M, collection_metadata(), conn, create_index(), default_max_document_length, default_model, determine_fields_to_include(), distance_metric, dump_then_load(), ef_construction, ef_search, fetch_all_objects_memory_safe(), find(), get_raw_objects(), id_field, id_to_object, identifier_field(), insert(), insert_from_huggingface(), kill_process(), list_collection_names(), lookup(), matches(), name, openai_client, parse_duckdb_result(), peek(), populate_venomx(), remove_collection(), search(), set_collection_metadata(), text_lookup, update(), update_collection_metadata(), update_or_create_venomx(), upsert(), vec_dimension
- curategpt.store.duckdb_connection_handler module
- curategpt.store.duckdb_result module
- curategpt.store.in_memory_adapter module
  - Collection
  - CollectionIndex
  - InMemoryAdapter: collection_index, collection_metadata(), delete(), fetch_all_objects_memory_safe(), find(), insert(), list_collection_names(), lookup(), matches(), name, peek(), populate_venomx(), remove_collection(), search(), set_collection_metadata(), update(), update_collection_metadata(), upsert()
- curategpt.store.metadata module
- curategpt.store.schema_proxy module
- curategpt.store.vocab module
- Module contents
  - ChromaDBAdapter: client, collection_metadata(), collections(), default_max_document_length, default_model, diversified_search(), dump_then_load(), fetch_all_objects_memory_safe(), find(), id_field, id_to_object, insert(), insert_from_huggingface(), list_collection_names(), lookup(), matches(), name, normalize_metadata(), nparray_to_list(), peek(), populate_venomx(), remove_collection(), reset(), search(), set_collection_metadata(), text_lookup, update(), update_collection_metadata(), upsert()
  - DBAdapter: collection, collection_metadata(), create_view(), delete(), dump(), dump_then_load(), fetch_all_objects_memory_safe(), field_names(), find(), identifier_field(), insert(), insert_from_huggingface(), label_field(), list_collection_names(), lookup(), lookup_multiple(), matches(), name, path, peek(), remove_collection(), schema_proxy, search(), set_collection(), set_collection_metadata(), update(), update_collection_metadata(), upsert()
  - DuckDBAdapter: M, collection_metadata(), conn, create_index(), default_max_document_length, default_model, determine_fields_to_include(), distance_metric, dump_then_load(), ef_construction, ef_search, fetch_all_objects_memory_safe(), find(), get_raw_objects(), id_field, id_to_object, identifier_field(), insert(), insert_from_huggingface(), kill_process(), list_collection_names(), lookup(), matches(), name, openai_client, parse_duckdb_result(), peek(), populate_venomx(), remove_collection(), search(), set_collection_metadata(), text_lookup, update(), update_collection_metadata(), update_or_create_venomx(), upsert(), vec_dimension
  - Metadata
  - SchemaProxy
  - get_store()
- curategpt.utils package
- Submodules
- curategpt.utils.eval_utils module
- curategpt.utils.llm_utils module
- curategpt.utils.patch_utils module
- curategpt.utils.search module
- curategpt.utils.tokens module
- curategpt.utils.vector_algorithms module
- curategpt.utils.vectordb_operations module
- Module contents
- curategpt.views package
- curategpt.wrappers package
- Subpackages
- curategpt.wrappers.bio package
- Submodules
- curategpt.wrappers.bio.alliance_gene_wrapper module
- curategpt.wrappers.bio.bacdive_wrapper module
- curategpt.wrappers.bio.gocam_wrapper module
- curategpt.wrappers.bio.mediadive_wrapper module
- curategpt.wrappers.bio.omicsdi_wrapper module
- curategpt.wrappers.bio.reactome_wrapper module
- curategpt.wrappers.bio.uniprot_wrapper module
- Module contents
- curategpt.wrappers.clinical package
- curategpt.wrappers.general package
- Submodules
- curategpt.wrappers.general.filesystem_wrapper module
- curategpt.wrappers.general.github_wrapper module
- curategpt.wrappers.general.google_drive_wrapper module
- curategpt.wrappers.general.gspread_wrapper module
- curategpt.wrappers.general.json_wrapper module
- curategpt.wrappers.general.linkml_schema_wrapper module
- Module contents
- curategpt.wrappers.investigation package
- Submodules
- curategpt.wrappers.investigation.ess_deepdive_wrapper module
- curategpt.wrappers.investigation.fairsharing_wrapper module
- curategpt.wrappers.investigation.jgi_wrapper module
- curategpt.wrappers.investigation.ncbi_bioproject_wrapper module
- curategpt.wrappers.investigation.ncbi_biosample_wrapper module
- curategpt.wrappers.investigation.nmdc_wrapper module
- Module contents
- curategpt.wrappers.legal package
- curategpt.wrappers.literature package
- curategpt.wrappers.ontology package
- curategpt.wrappers.paperqa package
- Submodules
- curategpt.wrappers.base_wrapper module
  - BaseWrapper: chat(), create_curie(), default_embedding_model, default_object_type, external_search(), extract_concepts_from_text(), extractor, local_store, max_text_length, name, objects(), objects_by_ids(), prefix, search(), search_limit_multiplier, source_locator, split_objects(), text_overlap, unwrap_object(), wrap_object()
- Module contents
  - BaseWrapper: chat(), create_curie(), default_embedding_model, default_object_type, external_search(), extract_concepts_from_text(), extractor, local_store, max_text_length, name, objects(), objects_by_ids(), prefix, search(), search_limit_multiplier, source_locator, split_objects(), text_overlap, unwrap_object(), wrap_object()
  - get_wrapper()
Submodules
curategpt.cli module
Command line interface for curategpt.
Module contents
CurateGPT: A framework for semi-assisted curation of knowledge bases.
Architecture
- store: JSON object stores that allow for embedding-based search
- wrappers: wraps external APIs and data sources for ingest
- extract: extraction of JSON objects from LLMs
- agents: agents that chain together search and generate components
- formatters: formats data objects for presentation to humans and machine agents
- app: Streamlit application
- class curategpt.BasicExtractor(schema_proxy=None, model_name='gpt-4o', api_key=None, raise_error_if_unparsable=False, serialization_format='json')
Bases: Extractor
Extractor that is purely example driven.
- deserialize(text, format=None, **kwargs)
Deserialize text into an annotated object
- Parameters: text (str)
- Return type:
- Returns:
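The constructor's serialization_format='json' and raise_error_if_unparsable=False arguments suggest that deserialization parses model output and can either fail loudly or degrade quietly. The following is a minimal sketch of that idea only. The helper name deserialize_json and the choice to return an empty dict on failure are assumptions for illustration, not the curategpt implementation.

```python
import json
from typing import Any, Dict


def deserialize_json(text: str, raise_error_if_unparsable: bool = False) -> Dict[str, Any]:
    """Parse a JSON object out of model output (hypothetical helper)."""
    # Model output often wraps the JSON in prose, so keep only the
    # span between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    try:
        if start == -1 or end < start:
            raise ValueError("no JSON object found in text")
        # json.JSONDecodeError is a subclass of ValueError.
        return json.loads(text[start : end + 1])
    except ValueError:
        if raise_error_if_unparsable:
            raise
        # Assumption: unparsable output degrades to an empty object.
        return {}
```

With the flag left at its default, callers get an empty object for unparsable text; with the flag set, the ValueError propagates.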
- deserialize_yaml(text, multiple=False)
- Return type:
- extract(text, target_class, examples=None, background_text=None, rules=None, min_examples=1, **kwargs)
Schema-guided extraction
- Parameters:
  - text (str)
  - kwargs
- Return type:
- Returns:
- model_config = {'protected_namespaces': ()}
- model_name: str = 'gpt-4o'
- serialization_format: str = 'json'
- serialize(ao)
- Return type: str
- class curategpt.ChromaDBAdapter(path=None, schema_proxy=None, collection=None, _field_names_by_collection=None, default_model='all-MiniLM-L6-v2', client=None, id_field='id', text_lookup='text', id_to_object=<factory>)
Bases: DBAdapter
An Adapter that wraps a ChromaDB client
- client: ClientAPI = None
- collection_metadata(collection_name=None, include_derived=False, **kwargs)
Get the metadata for a collection.
- Parameters: collection_name (Optional[str])
- Return type: Optional[Metadata]
- Returns:
Parameters
- collections()
Return the names of all collections in the database.
- Return type: Iterator[str]
- Returns:
- default_max_document_length: ClassVar[int] = 6000
- default_model: str = 'all-MiniLM-L6-v2'
- diversified_search(text=None, limit=None, relevance_factor=0.5, collection=None, **kwargs)
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
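The relevance_factor argument suggests a trade-off between relevance to the query and diversity among the results. One common technique for that trade-off is maximal marginal relevance (MMR). Whether diversified_search uses exactly this scoring is an assumption, and the function names below are hypothetical. The sketch works on plain embedding vectors and returns the indices of the selected candidates.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    limit: int,
    relevance_factor: float = 0.5,
) -> List[int]:
    """Greedy maximal marginal relevance over candidate embeddings."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < limit:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return relevance_factor * relevance - (1 - relevance_factor) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Two near-duplicate vectors and one that points elsewhere.
QUERY = [1.0, 0.3]
CANDIDATES = [[1.0, 0.0], [1.0, 0.05], [0.5, 1.0]]
pure_relevance = mmr(QUERY, CANDIDATES, limit=2, relevance_factor=1.0)
diversified = mmr(QUERY, CANDIDATES, limit=2, relevance_factor=0.5)
```

With relevance_factor=1.0 the two near-duplicates are returned; at 0.5 the second pick is the dissimilar vector, because the near-duplicate is penalized for redundancy.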
- dump_then_load(collection=None, target=None)
Dump a collection to a file, then load it into another database.
- Parameters:
  - collection (str)
  - target (DBAdapter)
- Returns:
- fetch_all_objects_memory_safe(collection=None, batch_size=100, **kwargs)
Fetch all objects from a collection, in batches to avoid memory overload.
- Return type: Iterator[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]
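Fetching in batches can be sketched as a generator that requests one page at a time and yields objects as they arrive, so that at most batch_size objects are held at once. The names fetch_all_in_batches and fetch_page are hypothetical, and the offset/limit paging scheme is an assumption rather than a description of how the adapters page internally.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = List[Dict[str, Any]]


def fetch_all_in_batches(
    fetch_page: Callable[[int, int], Page], batch_size: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield every object, requesting at most batch_size objects per call."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            # An empty page signals the end of the collection.
            return
        yield from page
        offset += len(page)


# Stand-in for a backend: a list sliced by offset and limit.
ROWS = [{"id": i} for i in range(250)]


def fetch_page(offset: int, limit: int) -> Page:
    return ROWS[offset : offset + limit]


all_rows = list(fetch_all_in_batches(fetch_page, batch_size=100))
```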
- find(where=None, projection=None, collection=None, **kwargs)
Query the database.
>>> from curategpt.store import get_store
>>> store = get_store("chromadb", "db")
>>> objs = list(store.find({"name": "NeuronOfTheForebrain"}, collection="ont_cl"))
- Parameters:
  - collection (str)
  - where (Union[str, YAMLRoot, BaseModel, Dict, DuckDBSearchResult])
  - projection (Union[str, List[str]])
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns:
- id_field: str = 'id'
- id_to_object: Mapping[str, Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]] = <dataclasses._MISSING_TYPE object>
- insert(objs, **kwargs)
Insert an object or list of objects into the store.
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.insert([{"name": "John", "age": 42}], collection="people")
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, Iterable[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection
- Returns:
- insert_from_huggingface(objs, collection=None, batch_size=None, text_field=None, venomx=None, method_name='add', **kwargs)
- list_collection_names()
List all collections in the database.
- Return type: List[str]
- Returns:
- lookup(id, collection=None, **kwargs)
Lookup an object by its ID.
- Parameters:
  - id (str)
  - collection (str)
- Return type: Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]
- Returns:
- matches(obj, **kwargs)
Query the database for matches to an object.
- Parameters:
  - obj (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult])
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns:
- name: ClassVar[str] = 'chromadb'
- normalize_metadata(metadata)
Normalize metadata downloaded from huggingface. Transformation to parquet forces nested lists to be turned into array type so we flatten those again.
- Return type: Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]
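The normalization described above can be illustrated by recursively converting array-like values back into plain lists. This is a sketch under assumptions: the helper name to_plain is hypothetical, and the standard-library array type stands in for the array values a Parquet round trip would produce (NumPy arrays expose the same tolist() method).

```python
from array import array
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively replace array-like values with plain Python lists."""
    if hasattr(value, "tolist"):
        # array.array and numpy.ndarray both provide tolist().
        return to_plain(value.tolist())
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


raw = {"id": "X:1", "scores": array("d", [1.0, 2.0]), "nested": {"xs": (1, 2)}}
normalized = to_plain(raw)
```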
- nparray_to_list(obj)
- peek(collection=None, limit=5, offset=0, **kwargs)
Peek at first N objects in a collection.
- Parameters:
  - collection (str)
  - limit
- Return type: Iterator[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]
- Returns:
- static populate_venomx(collection, model, existing_venomx)
- Return type: Index
- remove_collection(collection=None, exists_ok=False, **kwargs)
Remove a collection from the database.
- Parameters:
  - collection (str)
  - exists_ok
- Returns:
- reset()
Reset/delete the database.
- search(text, **kwargs)
Query the database for a text string.
>>> from curategpt.store import get_store
>>> store = get_store("chromadb", "db")
>>> for obj, distance, info in store.search("forebrain neurons", collection="ont_cl"):
...     obj_id = obj["id"]
...     # print at precision of 2 decimal places
...     print(f"{obj_id} {distance:.2f}")
...
NeuronOfTheForebrain 0.28
...
- Parameters:
  - text (str)
  - collection
  - where
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns: tuple of object, distance, metadata
- set_collection_metadata(collection_name, metadata, **kwargs)
Set the metadata for a collection.
- text_lookup: Union[str, Callable, None] = 'text'
- update(objs, **kwargs)
Update an object or list of objects in the store.
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, List[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection
- Returns:
- update_collection_metadata(collection_name, **kwargs)
Update the metadata for a collection based on the adapter.
- Parameters:
  - collection_name (str) – Name of the collection.
  - kwargs – Additional metadata fields.
- Return type:
- Returns: Updated Metadata instance.
- upsert(objs, **kwargs)
Upsert an object or list of objects in the store.
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, List[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection
- Returns:
- class curategpt.DBAdapter(path=None, schema_proxy=None, collection=None, _field_names_by_collection=None)
Bases: ABC
Base class for stores.
This base class provides a common interface for a wide variety of document or object stores. The interface is intended to closely mimic the kind of interface found in document stores such as MongoDB or vector databases such as ChromaDB, but the intention is that it can also be used for SQL databases, SPARQL endpoints, or even file systems.
The store allows for storage and retrieval of objects, which are arbitrary dictionary objects, each equivalent to a JSON object.
Objects are partitioned into collections, which map to the equivalent concept in MongoDB and ChromaDB.
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.insert({"name": "John", "age": 42}, collection="people")
If you are used to working with the MongoDB and ChromaDB APIs directly, one difference is that here we do not provide a separate Collection object; everything is handled through the store object. You can optionally bind a store object to a collection, which effectively gives you a collection object:
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.set_collection("people")
>>> store.insert({"name": "John", "age": 42})
TODO: decide if this is the final interface
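The store-level interface described above, where an explicit collection argument overrides an optionally bound default, can be sketched with a toy class. ToyStore is hypothetical and dictionary-backed. It only illustrates the binding behaviour and is not the curategpt InMemoryAdapter.

```python
from typing import Any, Dict, List, Optional, Union

Obj = Dict[str, Any]


class ToyStore:
    """Toy object store partitioned into named collections."""

    def __init__(self) -> None:
        self.collection: Optional[str] = None  # bound default collection
        self._objects: Dict[str, List[Obj]] = {}

    def set_collection(self, collection: str) -> None:
        """Bind the store to a collection for subsequent calls."""
        self.collection = collection

    def _resolve(self, collection: Optional[str]) -> str:
        # An explicit argument takes precedence over the bound collection.
        name = collection or self.collection
        if name is None:
            raise ValueError("no collection given and none bound")
        return name

    def insert(self, objs: Union[Obj, List[Obj]], collection: Optional[str] = None) -> None:
        """Insert one object or a list of objects."""
        if isinstance(objs, dict):
            objs = [objs]
        self._objects.setdefault(self._resolve(collection), []).extend(objs)

    def peek(self, collection: Optional[str] = None, limit: int = 5) -> List[Obj]:
        """Return the first `limit` objects of a collection."""
        return self._objects.get(self._resolve(collection), [])[:limit]


store = ToyStore()
store.insert({"name": "John", "age": 42}, collection="people")
store.set_collection("people")
store.insert([{"name": "Jane", "age": 37}])
```

After set_collection("people"), the second insert lands in the same collection as the first without naming it.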
- collection: Optional[str] = None
  Default collection
- abstractmethod collection_metadata(collection_name=None, include_derived=False, **kwargs)
Get the metadata for a collection.
- Parameters:
  - collection_name (Optional[str])
  - include_derived – Include derived metadata, e.g. counts
- Return type: Optional[Metadata]
- Returns:
- create_view(view_name, collection, expression, **kwargs)
Create a view in the database.
Todo:
- param view:
- return:
- delete(id, collection=None, **kwargs)
Delete an object by its ID.
- Parameters:
  - id (str)
  - collection (str)
- Returns:
- dump(collection=None, to_file=None, metadata_to_file=None, format=None, include=None, **kwargs)
Dump the database to a file.
- Parameters:
  - collection (str)
  - kwargs
- Returns:
- dump_then_load(collection=None, target=None)
Dump a collection to a file, then load it into another database.
- Parameters:
  - collection (str)
  - target (DBAdapter)
- Returns:
- abstractmethod fetch_all_objects_memory_safe(collection=None, batch_size=100, **kwargs)
Fetch all objects from a collection, in batches to avoid memory overload.
- Return type: Iterator[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]
- field_names(collection=None)
Return the names of all top level fields in the database for a collection.
- Parameters: collection (str)
- Return type: List[str]
- Returns:
- find(where=None, projection=None, collection=None, **kwargs)
Query the database.
>>> from curategpt.store import get_store
>>> store = get_store("chromadb", "db")
>>> objs = list(store.find({"name": "NeuronOfTheForebrain"}, collection="ont_cl"))
- Parameters:
  - collection (str)
  - where (Union[str, YAMLRoot, BaseModel, Dict, DuckDBSearchResult])
  - projection (Union[str, List[str]])
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns:
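The where and projection parameters can be illustrated on a plain list of dictionaries. This is a sketch under assumptions: find_objects is a hypothetical helper, where is treated as exact-match on every listed field, and the results are bare objects rather than the (object, distance, metadata) tuples the adapters return.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Union

Obj = Dict[str, Any]


def find_objects(
    objects: Iterable[Obj],
    where: Optional[Obj] = None,
    projection: Optional[Union[str, List[str]]] = None,
) -> Iterator[Obj]:
    """Yield objects matching `where`, reduced to the `projection` fields."""
    fields = [projection] if isinstance(projection, str) else projection
    for obj in objects:
        # Assumption: every key in `where` must match exactly.
        if where and any(obj.get(key) != value for key, value in where.items()):
            continue
        if fields is None:
            yield obj
        else:
            yield {field: obj[field] for field in fields if field in obj}


CELLS = [
    {"id": "CL:1", "name": "NeuronOfTheForebrain", "rank": 1},
    {"id": "CL:2", "name": "Astrocyte", "rank": 2},
]
matches = list(find_objects(CELLS, where={"name": "NeuronOfTheForebrain"}))
ids_only = list(find_objects(CELLS, projection="id"))
```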
- identifier_field(collection=None)
- Return type: str
- abstractmethod insert(objs, collection=None, **kwargs)
Insert an object or list of objects into the store.
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.insert([{"name": "John", "age": 42}], collection="people")
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, Iterable[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection (str)
- Returns:
- insert_from_huggingface(objs, collection=None, **kwargs)
- label_field(collection=None)
- Return type: str
- abstractmethod list_collection_names()
List all collections in the database.
- Return type: List[str]
- Returns: names of collections
- abstractmethod lookup(id, collection=None, **kwargs)
Lookup an object by its ID.
- Parameters:
  - id (str)
  - collection (str)
- Return type: Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]
- Returns:
- lookup_multiple(ids, **kwargs)
Look up multiple objects by their IDs.
- Parameters:
  - ids
  - collection
- Return type: Iterator[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]
- Returns:
- abstractmethod matches(obj, **kwargs)
Query the database for matches to an object.
- Parameters:
  - obj (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult])
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns:
- name: ClassVar[str] = 'base'
- path: str = None
  Path to a location where the database is stored on disk or the network.
- abstractmethod peek(collection=None, limit=5, **kwargs)
Peek at first N objects in a collection.
- Parameters:
  - collection (str)
  - limit
- Return type: Iterator[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]
- Returns:
- remove_collection(collection=None, exists_ok=False, **kwargs)
Remove a collection from the database.
- Parameters: collection (str)
- Returns:
- schema_proxy: Optional[SchemaProxy] = None
  Schema manager
- abstractmethod search(text, where=None, collection=None, **kwargs)
Query the database for a text string.
>>> from curategpt.store import get_store
>>> store = get_store("chromadb", "db")
>>> for obj, distance, info in store.search("forebrain neurons", collection="ont_cl"):
...     obj_id = obj["id"]
...     # print at precision of 2 decimal places
...     print(f"{obj_id} {distance:.2f}")
...
NeuronOfTheForebrain 0.28
...
- Parameters:
  - text (str)
  - collection (str)
  - where (Union[str, YAMLRoot, BaseModel, Dict, DuckDBSearchResult])
  - kwargs
- Return type: Iterator[Tuple[DuckDBSearchResult, Dict, float, Optional[Dict]]]
- Returns: tuple of object, distance, metadata
- set_collection(collection)
Set the current collection.
If this is set, then all subsequent operations will be performed on this collection, unless overridden.
This allows the following
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.set_collection("people")
>>> store.insert([{"name": "John", "age": 42}])
to be written in place of
>>> from curategpt.store import get_store
>>> store = get_store("in_memory")
>>> store.insert([{"name": "John", "age": 42}], collection="people")
- Parameters: collection (str)
- Returns:
- set_collection_metadata(collection_name, metadata, **kwargs)
Set the metadata for a collection.
>>> from curategpt.store import get_store
>>> from curategpt.store import Metadata
>>> store = get_store("in_memory")
>>> md = store.collection_metadata("people")
>>> md.venomx.id = "People"
>>> md.venomx.embedding_model.name = "openai:"
>>> store.set_collection_metadata("people", md)
- Parameters: collection_name (Optional[str])
- Returns:
- update(objs, collection=None, **kwargs)
Update an object or list of objects in the store.
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, List[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection (str)
- Returns:
- update_collection_metadata(collection_name, **kwargs)
Update the metadata for a collection.
- Parameters:
  - collection_name (str)
  - kwargs
- Return type:
- Returns:
- upsert(objs, collection=None, **kwargs)
Upsert an object or list of objects in the store.
- Parameters:
  - objs (Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult, List[Union[YAMLRoot, BaseModel, Dict, DuckDBSearchResult]]])
  - collection (str)
- Returns:
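Upsert conventionally means "update the object if its identifier already exists, otherwise insert it". The sketch below shows that behaviour on a dictionary keyed by identifier. The function name, the id_field default, the choice to replace the stored object wholesale rather than merge fields, and the returned counts are all assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple, Union

Obj = Dict[str, Any]


def upsert(
    store: Dict[str, Obj], objs: Union[Obj, List[Obj]], id_field: str = "id"
) -> Tuple[int, int]:
    """Insert new objects and replace existing ones; return (inserted, updated)."""
    if isinstance(objs, dict):
        objs = [objs]
    inserted = updated = 0
    for obj in objs:
        key = obj[id_field]
        if key in store:
            updated += 1
        else:
            inserted += 1
        # Assumption: an existing object is replaced, not merged field by field.
        store[key] = dict(obj)
    return inserted, updated


people: Dict[str, Obj] = {}
first = upsert(people, {"id": "1", "name": "John"})
second = upsert(people, [{"id": "1", "name": "Johnny"}, {"id": "2", "name": "Jane"}])
```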
- class curategpt.Extractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False)
Bases: ABC
- api_key: str = None
- deserialize(text, **kwargs)
Deserialize text into an annotated object
- Parameters: text (str)
- Return type:
- Returns:
- abstractmethod extract(text, target_class, examples=None, **kwargs)
Schema-guided extraction
- Parameters:
  - text (str)
  - kwargs
- Return type:
- Returns:
- property model
Get the model
- Parameters: model_name
- Returns:
- model_name: str = None
- property pydantic_root_model: BaseModel
- raise_error_if_unparsable: bool = False
- schema_proxy: SchemaProxy = None
- property schemaview: SchemaView