curategpt.extract package

Submodules

curategpt.extract.basic_extractor module

Basic Extractor that is purely example driven.

class curategpt.extract.basic_extractor.BasicExtractor(schema_proxy=None, model_name='gpt-4o', api_key=None, raise_error_if_unparsable=False, serialization_format='json')

Bases: Extractor

Extractor that is purely example driven.

deserialize(text, format=None, **kwargs)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject
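The interaction between the serialized text and the raise_error_if_unparsable flag can be illustrated with a small standalone sketch. It uses only the standard library and is not the BasicExtractor implementation: the function name deserialize_sketch, the (object, annotations) return shape, and the lenient fallback of returning an empty object are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, Tuple


def deserialize_sketch(
    text: str, raise_error_if_unparsable: bool = False
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Parse JSON text into an (object, annotations) pair.

    Hypothetical helper: the fallback behaviour below is an assumption,
    not the library's documented contract.
    """
    try:
        obj = json.loads(text)
        if not isinstance(obj, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        if raise_error_if_unparsable:
            raise
        # Lenient mode: empty object, but keep the raw text as an annotation.
        return {}, {"text": text}
    return obj, {"text": text}


parsed, notes = deserialize_sketch('{"name": "aspirin"}')
fallback, fallback_notes = deserialize_sketch("not json")
```

In strict mode (raise_error_if_unparsable=True) the same unparsable input raises ValueError instead of returning the empty fallback.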

deserialize_yaml(text, multiple=False)

Return type: AnnotatedObject

extract(text, target_class, examples=None, background_text=None, rules=None, min_examples=1, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

model_config = {'protected_namespaces': ()}

model_name: str = 'gpt-4o'

serialization_format: str = 'json'

serialize(ao)

Return type: str

curategpt.extract.extractor module

Retrieval Augmented Generation (RAG) Base Class.

class curategpt.extract.extractor.AnnotatedObject(**data)

Bases: BaseModel

Annotated object shadows a basic dictionary object

annotations: Dict[str, Any]

as_single_object()

Return as a standard dictionary object.

Each annotation is prefixed with an underscore.

Return type: Dict[str, Any]
Returns: dictionary object
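The underscore-prefix convention described for as_single_object() can be sketched with a plain dataclass. This is a hypothetical stand-in, not the pydantic model itself: the class name AnnotatedObjectSketch and the field name obj (used here instead of object to avoid shadowing the builtin) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AnnotatedObjectSketch:
    """Hypothetical stand-in for AnnotatedObject (not the library class)."""

    obj: Dict[str, Any]
    annotations: Dict[str, Any] = field(default_factory=dict)

    def as_single_object(self) -> Dict[str, Any]:
        # Copy the payload, then add each annotation under an
        # underscore-prefixed key so it cannot clash with payload keys.
        merged = dict(self.obj)
        for key, value in self.annotations.items():
            merged[f"_{key}"] = value
        return merged


ao = AnnotatedObjectSketch(
    obj={"name": "aspirin"},
    annotations={"text": "raw model output"},
)
flat = ao.as_single_object()
print(flat)  # {'name': 'aspirin', '_text': 'raw model output'}
```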

key_values: Dict[str, AnnotatedObject]

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

object: Any

property text: str | None

Get the text annotation of the object.

class curategpt.extract.extractor.Extractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False)

Bases: ABC

api_key: str = None

deserialize(text, **kwargs)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject

abstractmethod extract(text, target_class, examples=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

property model

Get the model

Parameters: model_name

model_name: str = None

property pydantic_root_model: BaseModel

raise_error_if_unparsable: bool = False

schema_proxy: SchemaProxy = None

property schemaview: SchemaView
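Because Extractor is an abstract base class with an abstract extract() method, concrete extractors are expected to subclass it and supply their own extraction logic. The pattern can be sketched as follows. Both classes here are hypothetical stand-ins that do not import curategpt: ExtractorSketch mirrors only some of the documented constructor arguments, and KeyValueExtractor parses "key: value" lines instead of calling an LLM and returns a plain dict rather than an AnnotatedObject.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ExtractorSketch(ABC):
    """Hypothetical stand-in for the abstract Extractor base class."""

    def __init__(
        self,
        model_name: Optional[str] = None,
        raise_error_if_unparsable: bool = False,
    ) -> None:
        self.model_name = model_name
        self.raise_error_if_unparsable = raise_error_if_unparsable

    @abstractmethod
    def extract(
        self,
        text: str,
        target_class: str,
        examples: Optional[List[Dict[str, Any]]] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Subclasses decide how text becomes a structured object."""


class KeyValueExtractor(ExtractorSketch):
    """Toy subclass: reads 'key: value' lines; no model is involved."""

    def extract(self, text, target_class, examples=None, **kwargs):
        result: Dict[str, Any] = {"type": target_class}
        for line in text.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                result[key.strip()] = value.strip()
        return result


extractor = KeyValueExtractor()
drug = extractor.extract("name: aspirin\nuse: analgesic", target_class="Drug")
```

Instantiating ExtractorSketch directly raises TypeError, which is the standard behaviour of Python abstract base classes with unimplemented abstract methods.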

curategpt.extract.openai_extractor module

Extractor that uses OpenAI functions.

class curategpt.extract.openai_extractor.OpenAIExtractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False, max_tokens=3000, model='gpt-4')

Bases: Extractor

Extractor that uses OpenAI functions.

extract(text, target_class, examples=None, examples_as_functions=False, conversation=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

functions()

max_tokens: int = 3000

model: str = 'gpt-4'
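For orientation, an OpenAI function definition is a dictionary with a name, a description, and a JSON Schema describing the parameters. The sketch below builds one from a hypothetical mapping of slot names to JSON types. It does not show what functions() actually returns: the helper function_definition, the extract_ naming scheme, and the slot mapping are assumptions for illustration only.

```python
from typing import Any, Dict


def function_definition(class_name: str, slots: Dict[str, str]) -> Dict[str, Any]:
    """Build a function-calling definition for a target class.

    Hypothetical helper: `slots` maps a slot name to a JSON Schema type.
    """
    return {
        "name": f"extract_{class_name.lower()}",
        "description": f"Extract a {class_name} object from text",
        "parameters": {
            "type": "object",
            "properties": {
                slot: {"type": json_type} for slot, json_type in slots.items()
            },
            "required": list(slots),
        },
    }


fn = function_definition("Drug", {"name": "string", "dose_mg": "number"})
```

Here every slot is marked as required; a real schema-driven extractor would more likely derive that from the schema's own cardinality constraints.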

curategpt.extract.recursive_extractor module

Extractor that recursively extracts objects from text.

class curategpt.extract.recursive_extractor.RecursiveExtractor(schema_proxy=None, model_name='gpt-3.5-turbo', api_key=None, raise_error_if_unparsable=False, serialization_format='json')

Bases: Extractor

Extractor that recursively extracts objects from text.

See SPIRES

deserialize(text)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject

extract(text, target_class, examples=None, path=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

model_name: str = 'gpt-3.5-turbo'

partially_serialize(object, path)

Return type: str

serialization_format: str = 'json'

Module contents

CurateGPT Extractors.

These handle connections to (remote or local) LLMs, and can also extract structured objects from text.

Base class: Extractor

class curategpt.extract.AnnotatedObject(**data)

Bases: BaseModel

Annotated object shadows a basic dictionary object

annotations: Dict[str, Any]

as_single_object()

Return as a standard dictionary object.

Each annotation is prefixed with an underscore.

Return type: Dict[str, Any]
Returns: dictionary object

key_values: Dict[str, AnnotatedObject]

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

object: Any

property text: str | None

Get the text annotation of the object.

class curategpt.extract.BasicExtractor(schema_proxy=None, model_name='gpt-4o', api_key=None, raise_error_if_unparsable=False, serialization_format='json')

Bases: Extractor

Extractor that is purely example driven.

deserialize(text, format=None, **kwargs)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject

deserialize_yaml(text, multiple=False)

Return type: AnnotatedObject

extract(text, target_class, examples=None, background_text=None, rules=None, min_examples=1, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

model_config = {'protected_namespaces': ()}

model_name: str = 'gpt-4o'

serialization_format: str = 'json'

serialize(ao)

Return type: str

class curategpt.extract.Extractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False)

Bases: ABC

api_key: str = None

deserialize(text, **kwargs)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject

abstractmethod extract(text, target_class, examples=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

property model

Get the model

Parameters: model_name

model_name: str = None

property pydantic_root_model: BaseModel

raise_error_if_unparsable: bool = False

schema_proxy: SchemaProxy = None

property schemaview: SchemaView

class curategpt.extract.OpenAIExtractor(schema_proxy=None, model_name=None, api_key=None, raise_error_if_unparsable=False, max_tokens=3000, model='gpt-4')

Bases: Extractor

Extractor that uses OpenAI functions.

extract(text, target_class, examples=None, examples_as_functions=False, conversation=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

functions()

max_tokens: int = 3000

model: str = 'gpt-4'

class curategpt.extract.RecursiveExtractor(schema_proxy=None, model_name='gpt-3.5-turbo', api_key=None, raise_error_if_unparsable=False, serialization_format='json')

Bases: Extractor

Extractor that recursively extracts objects from text.

See SPIRES

deserialize(text)

Deserialize text into an annotated object

Parameters: text (str)
Return type: AnnotatedObject

extract(text, target_class, examples=None, path=None, **kwargs)

Schema-guided extraction

Parameters: text (str), kwargs
Return type: AnnotatedObject

model_name: str = 'gpt-3.5-turbo'

partially_serialize(object, path)

Return type: str

serialization_format: str = 'json'