curategpt.extract package
Submodules
curategpt.extract.basic_extractor module
Basic Extractor that is purely example driven.
- class curategpt.extract.basic_extractor.BasicExtractor(schema_proxy=None, model_name='gpt-4o', api_key=None, raise_error_if_unparsable=False, serialization_format='json')
Bases: Extractor
Extractor that is purely example driven.
- deserialize(text, format=None, **kwargs)
Deserialize text into an annotated object.
- Parameters: text (str)
- deserialize_yaml(text, multiple=False)
- extract(text, target_class, examples=None, background_text=None, rules=None, min_examples=1, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- model_config = {'protected_namespaces': ()}
- model_name: str = 'gpt-4o'
- serialization_format: str = 'json'
- serialize(ao)
- Return type: str
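The interaction between serialize, deserialize, and the raise_error_if_unparsable flag can be sketched as follows. This is a minimal illustration using only the standard library, not the library's implementation: the function names serialize_json and deserialize_json are hypothetical, and falling back to an empty dictionary when the flag is off is an assumption about what "unparsable" handling might look like.

```python
import json
from typing import Any, Dict


def serialize_json(obj: Dict[str, Any]) -> str:
    """Hypothetical helper: render an object in the 'json' serialization format."""
    return json.dumps(obj, sort_keys=True)


def deserialize_json(text: str, raise_error_if_unparsable: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: parse model output into a dictionary.

    Assumption: unparsable output yields an empty dict unless the flag is set.
    """
    try:
        parsed = json.loads(text)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        if raise_error_if_unparsable:
            raise ValueError(f"Could not parse model output: {text!r}") from err
        return {}
    return parsed
```

With the flag left at its default, a malformed response degrades to an empty object; setting it to True surfaces the failure to the caller instead.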
curategpt.extract.extractor module
Retrieval Augmented Generation (RAG) Base Class.
- class curategpt.extract.extractor.AnnotatedObject(**data)
Bases: BaseModel
Annotated object that shadows a basic dictionary object.
- annotations: Dict[str, Any]
- as_single_object()
Return as a standard dictionary object.
Each annotation is prefixed with an underscore.
- Return type: Dict[str, Any]
- Returns: dictionary object
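The underscore-prefixing behaviour described for as_single_object can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the library's code: AnnotatedObjectSketch is a hypothetical dataclass (the real class is a Pydantic BaseModel), and copying the object rather than updating it in place is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AnnotatedObjectSketch:
    """Hypothetical stand-in: a plain object plus annotations about it."""

    object: Dict[str, Any] = field(default_factory=dict)
    annotations: Dict[str, Any] = field(default_factory=dict)

    def as_single_object(self) -> Dict[str, Any]:
        # Copy so the wrapped object is left untouched (an assumption).
        merged = dict(self.object)
        for key, value in self.annotations.items():
            merged[f"_{key}"] = value  # each annotation key gains a leading underscore
        return merged

    @property
    def text(self) -> Optional[str]:
        # The text annotation, or None when it is absent.
        return self.annotations.get("text")


ao = AnnotatedObjectSketch(
    object={"name": "aspirin"},
    annotations={"text": "Aspirin is an analgesic."},
)
flat = ao.as_single_object()
```

Here flat carries both the original key name and the annotation under _text, so annotations cannot collide with ordinary keys that lack the underscore prefix.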
- key_values: Dict[str, AnnotatedObject]
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'annotations': FieldInfo(annotation=Dict[str, Any], required=False, default={}), 'key_values': FieldInfo(annotation=Dict[str, curategpt.extract.extractor.AnnotatedObject], required=False, default={}), 'object': FieldInfo(annotation=Any, required=False, default={})}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- object: Any
- property text: str | None
Get the text annotation of the object.
- class curategpt.extract.extractor.Extractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False)
Bases: ABC
- api_key: str = None
- deserialize(text, **kwargs)
Deserialize text into an annotated object.
- Parameters: text (str)
- abstract extract(text, target_class, examples=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- property model
Get the model.
- Parameters: model_name
- model_name: str = None
- property pydantic_root_model: BaseModel
- raise_error_if_unparsable: bool = False
- schema_proxy: SchemaProxy = None
- property schemaview: SchemaView
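Because extract is abstract, a concrete extractor has to override it before it can be instantiated. The sketch below shows that pattern with stand-in classes only: ExtractorSketch and KeyValueExtractor are hypothetical names, the toy subclass parses "key: value" lines instead of calling an LLM, and returning a plain dictionary is a simplification.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ExtractorSketch(ABC):
    """Hypothetical stand-in for an abstract extractor base; not the library class."""

    def __init__(self, model_name: Optional[str] = None,
                 raise_error_if_unparsable: bool = False) -> None:
        self.model_name = model_name
        self.raise_error_if_unparsable = raise_error_if_unparsable

    @abstractmethod
    def extract(self, text: str, target_class: str,
                examples: Optional[List[Dict[str, Any]]] = None,
                **kwargs: Any) -> Dict[str, Any]:
        """Schema-guided extraction; subclasses must implement this."""


class KeyValueExtractor(ExtractorSketch):
    """Toy subclass: reads 'key: value' lines rather than prompting a model."""

    def extract(self, text: str, target_class: str,
                examples: Optional[List[Dict[str, Any]]] = None,
                **kwargs: Any) -> Dict[str, Any]:
        result: Dict[str, Any] = {"type": target_class}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a colon
                result[key.strip()] = value.strip()
        return result


record = KeyValueExtractor().extract("name: aspirin\nrole: analgesic", "Drug")
```

Instantiating ExtractorSketch directly raises TypeError, which is the same guarantee an abstract extract method gives for any base class built on ABC.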
curategpt.extract.openai_extractor module
Extractor that uses OpenAI functions.
- class curategpt.extract.openai_extractor.OpenAIExtractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False, max_tokens=3000, model='gpt-4')
Bases: Extractor
Extractor that uses OpenAI functions.
- extract(text, target_class, examples=None, examples_as_functions=False, conversation=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- functions()
- max_tokens: int = 3000
- model: str = 'gpt-4'
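What functions() returns is not documented here. For orientation, the sketch below shows the general shape of an OpenAI function definition (a name, a description, and a JSON Schema under "parameters") built from a target class and its fields. The helper function_definition, the extract_ naming scheme, and the slot-to-type mapping are all hypothetical and chosen only for illustration.

```python
import json
from typing import Any, Dict


def function_definition(target_class: str, slots: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: describe a target class as an OpenAI function definition.

    `slots` maps field names to JSON Schema type names such as "string" or "number".
    """
    return {
        "name": f"extract_{target_class}",
        "description": f"Extract a {target_class} object from the given text.",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": json_type} for name, json_type in slots.items()},
            "required": [],  # assumption: every field is optional
        },
    }


definition = function_definition("Drug", {"name": "string", "dose_mg": "number"})
payload = json.dumps([definition])  # a JSON-serializable list of function definitions
```

A model that supports function calling would then answer with arguments that conform to the "parameters" schema, which is what makes the output machine-parsable.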
curategpt.extract.recursive_extractor module
Extractor that recursively extracts objects from text.
- class curategpt.extract.recursive_extractor.RecursiveExtractor(schema_proxy=None, model_name='gpt-3.5-turbo', api_key=None, raise_error_if_unparsable=False, serialization_format='json')
Bases: Extractor
Extractor that recursively extracts objects from text.
See SPIRES.
- deserialize(text)
Deserialize text into an annotated object.
- Parameters: text (str)
- extract(text, target_class, examples=None, path=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- model_name: str = 'gpt-3.5-turbo'
- partially_serialize(object, path)
- Return type: str
- serialization_format: str = 'json'
Module contents
CurateGPT Extractors.
These handle connections to (remote or local) LLMs, and can also extract structured objects from text.
Base class: Extractor
- class curategpt.extract.AnnotatedObject(**data)
Bases: BaseModel
Annotated object that shadows a basic dictionary object.
- annotations: Dict[str, Any]
- as_single_object()
Return as a standard dictionary object.
Each annotation is prefixed with an underscore.
- Return type: Dict[str, Any]
- Returns: dictionary object
- key_values: Dict[str, AnnotatedObject]
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'annotations': FieldInfo(annotation=Dict[str, Any], required=False, default={}), 'key_values': FieldInfo(annotation=Dict[str, curategpt.extract.extractor.AnnotatedObject], required=False, default={}), 'object': FieldInfo(annotation=Any, required=False, default={})}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- object: Any
- property text: str | None
Get the text annotation of the object.
- class curategpt.extract.BasicExtractor(schema_proxy=None, model_name='gpt-4o', api_key=None, raise_error_if_unparsable=False, serialization_format='json')
Bases: Extractor
Extractor that is purely example driven.
- deserialize(text, format=None, **kwargs)
Deserialize text into an annotated object.
- Parameters: text (str)
- deserialize_yaml(text, multiple=False)
- extract(text, target_class, examples=None, background_text=None, rules=None, min_examples=1, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- model_config = {'protected_namespaces': ()}
- model_name: str = 'gpt-4o'
- serialization_format: str = 'json'
- serialize(ao)
- Return type: str
- class curategpt.extract.Extractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False)
Bases: ABC
- api_key: str = None
- deserialize(text, **kwargs)
Deserialize text into an annotated object.
- Parameters: text (str)
- abstract extract(text, target_class, examples=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- property model
Get the model.
- Parameters: model_name
- model_name: str = None
- property pydantic_root_model: BaseModel
- raise_error_if_unparsable: bool = False
- schema_proxy: SchemaProxy = None
- property schemaview: SchemaView
- class curategpt.extract.OpenAIExtractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False, max_tokens=3000, model='gpt-4')
Bases: Extractor
Extractor that uses OpenAI functions.
- extract(text, target_class, examples=None, examples_as_functions=False, conversation=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- functions()
- max_tokens: int = 3000
- model: str = 'gpt-4'
- class curategpt.extract.RecursiveExtractor(schema_proxy=None, model_name='gpt-3.5-turbo', api_key=None, raise_error_if_unparsable=False, serialization_format='json')
Bases: Extractor
Extractor that recursively extracts objects from text.
See SPIRES.
- deserialize(text)
Deserialize text into an annotated object.
- Parameters: text (str)
- extract(text, target_class, examples=None, path=None, **kwargs)
Schema-guided extraction.
- Parameters: text (str), kwargs
- model_name: str = 'gpt-3.5-turbo'
- partially_serialize(object, path)
- Return type: str
- serialization_format: str = 'json'