babelon package

Submodules

babelon.babelon_io module

babelon.io.

babelon.babelon_io.convert_file(input_path, output, drop_unknown_columns=True, output_format=None)

Convert a file from one format to another.

Parameters:
  • input_path (str) – The path to the input babelon tsv file

  • output (TextIO) – The path to the output file. If none is given, will default to using stdout.

  • drop_unknown_columns (bool) – If true, columns unknown to Babelon format are dropped prior to processing.

  • output_format (Optional[str]) – The format to which the Babelon TSV should be converted.

Return type:

None

babelon.babelon_io.parse_file(input_path, output_path)

Parse a Babelon metadata file and write to a table.

Return type:

None

Args:

input_path (str): The path to the input file in one of the legal formats, e.g. obographs, alignmentapi-xml.

output_path (TextIO): The path to the output file.

Raises:

ValueError: [description]

babelon.babelon_io.to_babelon_linkml_document(bdf)

Load a LinkML YAML representation from a BabelonDataFrame.

babelon.babelon_io.to_json(bdf)

Convert a mapping set dataframe to a JSON object.

Return type:

JsonObj

babelon.babelon_io.to_owl_graph(bdf)

Convert a mapping set dataframe to OWL in an RDF graph.

Return type:

Graph

babelon.babelon_io.to_rdf_graph(bdf)

Convert a mapping set dataframe to an RDF graph.

Return type:

Graph

babelon.babelon_io.write_json(bdf, output, serialisation='json')

Write a mapping set dataframe to the file as JSON.

Return type:

None

Args:

bdf (BabelonDataFrame): The mapping set dataframe to write.

output (TextIO): The path or stream of the output.

serialisation (str): The target serialisation (must be ‘json’).

Raises:

ValueError: [description]

babelon.babelon_io.write_owl(bdf, output, serialisation='owl')

Write a mapping set dataframe to the file as OWL.

Return type:

None

Args:

bdf (BabelonDataFrame): The mapping set dataframe to write.

output (TextIO): The path or stream of the output.

serialisation (str): The target serialisation (defaults to ‘owl’).

Raises:

ValueError: [description]

babelon.cli module

Command line interface for Babelon.

babelon.constants module

Constants for babelon toolkit.

babelon.dataclasses module

class babelon.dataclasses.EntityReference(v)

Bases: Uriorcurie

A reference to a mapped entity. This is represented internally as a string, and as a resource in RDF.

type_class_curie = 'rdfs:Resource'
type_class_uri = rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#Resource')
type_model_uri = rdflib.term.URIRef('https://w3id.org/babelon/EntityReference')
type_name = 'EntityReference'
class babelon.dataclasses.Profile(translations=<factory>, translation_provider=None, profile_id=None, profile_version=None, comment=None, **_kwargs)

Bases: YAMLRoot

Represents a set of translations that together compose a language profile.

class_class_curie: ClassVar[str] = 'babelon:Profile'
class_class_uri: ClassVar[URIRef] = rdflib.term.URIRef('https://w3id.org/babelon/Profile')
class_model_uri: ClassVar[URIRef] = rdflib.term.URIRef('https://w3id.org/babelon/Profile')
class_name: ClassVar[str] = 'profile'
comment: Optional[str] = None
profile_id: Optional[str] = None
profile_version: Optional[str] = None
translation_provider: Optional[str] = None
translations: Union[dict, Translation, List[Union[dict, Translation]], None]
class babelon.dataclasses.Translation(subject_id=None, predicate_id=None, source_value=None, source_language=None, translation_value=None, translation_language=None, source_version=None, translation_type=None, translator=None, translator_expertise=None, translation_date=None, translation_confidence=None, translation_precision=None, translation_status=None, source=None, comment=None, **_kwargs)

Bases: YAMLRoot

Represents an individual translation.

class_class_curie: ClassVar[str] = 'owl:Axiom'
class_class_uri: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/2002/07/owl#Axiom')
class_model_uri: ClassVar[URIRef] = rdflib.term.URIRef('https://w3id.org/babelon/Translation')
class_name: ClassVar[str] = 'translation'
comment: Optional[str] = None
predicate_id: Union[str, EntityReference] = None
source: Optional[str] = None
source_language: str = None
source_value: str = None
source_version: Optional[str] = None
subject_id: Union[str, EntityReference] = None
translation_confidence: Optional[float] = None
translation_date: Optional[str] = None
translation_language: Optional[str] = None
translation_precision: Union[str, TranslationPrecisionEnum, None] = None
translation_status: Union[str, TranslationStatusEnum, None] = None
translation_type: Union[str, TranslationTypeEnum, None] = None
translation_value: Optional[str] = None
translator: Optional[str] = None
translator_expertise: Union[str, TranslatorExpertiseEnum, None] = None
class babelon.dataclasses.TranslationPrecisionEnum(code)

Bases: EnumDefinitionImpl

BROADER = PermissibleValue({   'text': 'BROADER',   'description': 'The translation value has a somewhat broader meaning than the source value.' })
CLOSE = PermissibleValue({   'text': 'CLOSE',   'description': 'The translation value is close in meaning to the source value, but not exact.' })
EXACT = PermissibleValue({'text': 'EXACT', 'description': 'The translation is exact.'})
NARROWER = PermissibleValue({   'text': 'NARROWER',   'description': 'The translation value has a somewhat narrower meaning than the source value.' })
class babelon.dataclasses.TranslationStatusEnum(code)

Bases: EnumDefinitionImpl

CANDIDATE = PermissibleValue({   'text': 'CANDIDATE',   'description': ('The translation has been suggested from an entity (algorithm, person) '      'outside the core team managing the translation.') })
NOT_TRANSLATED = PermissibleValue({'text': 'NOT_TRANSLATED', 'description': 'This translation is incomplete.'})
OFFICIAL = PermissibleValue({   'text': 'OFFICIAL',   'description': ('The translation has been accepted by the core team managing the language '      'profile.') })
UNDER_REVIEW = PermissibleValue({   'text': 'UNDER_REVIEW',   'description': ('The translation has been suggested from an entity (algorithm, person) inside '      'the core team managing the translation, but not yet officially ratified.') })
class babelon.dataclasses.TranslationTypeEnum(code)

Bases: EnumDefinitionImpl

AUGMENTATION = PermissibleValue({   'text': 'AUGMENTATION',   'description': ('The record corresponds to an additional language specific terminological '      'element without a corresponding element in the source language.') })
CORRECTION = PermissibleValue({   'text': 'CORRECTION',   'description': ('The record corresponds to a translation of a source value into a translation '      'value, but rather than being an exact translation, it suggests a change to '      'the original source value.') })
TRANSLATION = PermissibleValue({   'text': 'TRANSLATION',   'description': ('The record corresponds to an actual translation of a source value into a '      'translation value.') })
class babelon.dataclasses.TranslatorExpertiseEnum(code)

Bases: EnumDefinitionImpl

ALGORITHM = PermissibleValue({'text': 'ALGORITHM', 'description': 'The translator is a machine, not a person.'})
DOMAIN_EXPERT = PermissibleValue({   'text': 'DOMAIN_EXPERT',   'description': ('The translator is an expert of the domain of the ontology, for example an '      'expert in anatomy when translating terms from an anatomy ontology such as '      'Uberon.') })
DOMAIN_STUDENT = PermissibleValue({   'text': 'DOMAIN_STUDENT',   'description': ('The translator is a student of the domain of the ontology, for example a '      'student of anatomy, when translating terms from an anatomy ontology such as '      'Uberon.') })
LAYPERSON = PermissibleValue({   'text': 'LAYPERSON',   'description': ('The translator is an interested lay person with no specific knowledge of the '      'domain.') })
PROFESSIONAL_TRANSLATOR = PermissibleValue({   'text': 'PROFESSIONAL_TRANSLATOR',   'description': 'The translator is a professional translator by trade.' })
class babelon.dataclasses.slots

Bases: object

comment = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/comment'), name='comment', curie='babelon:comment', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/comment'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
predicate_id = Slot(uri=rdflib.term.URIRef('http://www.w3.org/2002/07/owl#annotatedProperty'), name='predicate_id', curie='owl:annotatedProperty', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/predicate_id'), domain=None, range=typing.Union[str, babelon.dataclasses.EntityReference, NoneType], mappings=None, pattern=None)
profile_id = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/profile_id'), name='profile_id', curie='babelon:profile_id', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/profile_id'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
profile_version = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/profile_version'), name='profile_version', curie='babelon:profile_version', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/profile_version'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
source = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source'), name='source', curie='babelon:source', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/source'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
source_language = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source_language'), name='source_language', curie='babelon:source_language', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/source_language'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
source_value = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source_value'), name='source_value', curie='babelon:source_value', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/source_value'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
source_version = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source_version'), name='source_version', curie='babelon:source_version', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/source_version'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
subject_id = Slot(uri=rdflib.term.URIRef('http://www.w3.org/2002/07/owl#annotatedSource'), name='subject_id', curie='owl:annotatedSource', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/subject_id'), domain=None, range=typing.Union[str, babelon.dataclasses.EntityReference, NoneType], mappings=None, pattern=None)
translation_confidence = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_confidence'), name='translation_confidence', curie='babelon:translation_confidence', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_confidence'), domain=None, range=typing.Optional[float], mappings=None, pattern=None)
translation_date = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_date'), name='translation_date', curie='babelon:translation_date', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_date'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
translation_language = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_language'), name='translation_language', curie='babelon:translation_language', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_language'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
translation_precision = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_precision'), name='translation_precision', curie='babelon:translation_precision', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_precision'), domain=None, range=typing.Union[str, ForwardRef('TranslationPrecisionEnum'), NoneType], mappings=None, pattern=None)
translation_predicate_id = Slot(uri=rdflib.term.URIRef('http://www.w3.org/2002/07/owl#annotatedProperty'), name='translation_predicate_id', curie='owl:annotatedProperty', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_predicate_id'), domain=<class 'babelon.dataclasses.Translation'>, range=typing.Union[str, babelon.dataclasses.EntityReference], mappings=None, pattern=None)
translation_provider = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_provider'), name='translation_provider', curie='babelon:translation_provider', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_provider'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
translation_source_language = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source_language'), name='translation_source_language', curie='babelon:source_language', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_source_language'), domain=<class 'babelon.dataclasses.Translation'>, range=<class 'str'>, mappings=None, pattern=None)
translation_source_value = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/source_value'), name='translation_source_value', curie='babelon:source_value', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_source_value'), domain=<class 'babelon.dataclasses.Translation'>, range=<class 'str'>, mappings=None, pattern=None)
translation_status = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_status'), name='translation_status', curie='babelon:translation_status', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_status'), domain=None, range=typing.Union[str, ForwardRef('TranslationStatusEnum'), NoneType], mappings=None, pattern=None)
translation_subject_id = Slot(uri=rdflib.term.URIRef('http://www.w3.org/2002/07/owl#annotatedSource'), name='translation_subject_id', curie='owl:annotatedSource', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_subject_id'), domain=<class 'babelon.dataclasses.Translation'>, range=typing.Union[str, babelon.dataclasses.EntityReference], mappings=None, pattern=None)
translation_type = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_type'), name='translation_type', curie='babelon:translation_type', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_type'), domain=None, range=typing.Union[str, ForwardRef('TranslationTypeEnum'), NoneType], mappings=None, pattern=None)
translation_value = Slot(uri=rdflib.term.URIRef('http://www.w3.org/2002/07/owl#annotatedTarget'), name='translation_value', curie='owl:annotatedTarget', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translation_value'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
translations = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translations'), name='translations', curie='babelon:translations', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translations'), domain=None, range=typing.Union[dict, babelon.dataclasses.Translation, typing.List[typing.Union[dict, babelon.dataclasses.Translation]], NoneType], mappings=None, pattern=None)
translator = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translator'), name='translator', curie='babelon:translator', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translator'), domain=None, range=typing.Optional[str], mappings=None, pattern=None)
translator_expertise = Slot(uri=rdflib.term.URIRef('https://w3id.org/babelon/translator_expertise'), name='translator_expertise', curie='babelon:translator_expertise', model_uri=rdflib.term.URIRef('https://w3id.org/babelon/translator_expertise'), domain=None, range=typing.Union[str, ForwardRef('TranslatorExpertiseEnum'), NoneType], mappings=None, pattern=None)

babelon.translate module

Translate Babelon profiles.

class babelon.translate.OpenAITranslator(model='gpt-4-turbo-preview')

Bases: Translator

A specific translator class that uses GPT-4 for translation.

model_name()

Return the unique name of the model.

translate(text_to_translate, language_code)

Translate text using OpenAI’s GPT-4 API (hypothetical).

Args:

text_to_translate (str): The text to be translated.

language_code (str): The target language code (e.g., ‘de’ for German).

Returns:

str: The translated text.

class babelon.translate.Translator

Bases: object

A generic translator class.

model_name()

Return the unique name of the model.

Raises:

NotImplementedError: If the method is not implemented in the subclass

translate(text, target_language)

Translate the provided text into the target language.

Args:

text (str): The text to be translated.

target_language (str): The language to translate the text into.

Raises:

NotImplementedError: If the method is not implemented in the subclass.
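The base-class contract described above can be illustrated with a minimal sketch. The classes below are stand-ins that mirror the documented method names; they are not babelon's own code, and GlossaryTranslator and its model name are hypothetical.

```python
from typing import Dict, Tuple


class Translator:
    """Stand-in mirroring the documented base class (not babelon's code)."""

    def model_name(self) -> str:
        raise NotImplementedError("Subclasses must implement model_name().")

    def translate(self, text: str, target_language: str) -> str:
        raise NotImplementedError("Subclasses must implement translate().")


class GlossaryTranslator(Translator):
    """Hypothetical subclass that looks translations up in a fixed glossary."""

    def __init__(self, glossary: Dict[Tuple[str, str], str]) -> None:
        self._glossary = glossary

    def model_name(self) -> str:
        return "glossary-v0"

    def translate(self, text: str, target_language: str) -> str:
        # Fall back to the untranslated text when the glossary has no entry.
        return self._glossary.get((text, target_language), text)


translator = GlossaryTranslator({("heart", "de"): "Herz"})
print(translator.model_name())
print(translator.translate("heart", "de"))
```

A subclass only has to override the two methods; calling either one on the base class raises NotImplementedError.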

babelon.translate.get_translator_model(model='gpt-4')

Instantiate translator model based on string.

Args:

model (str): The model to be instantiated.

Raises:

ValueError: If the model does not exist.
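A string-to-class lookup of this kind is commonly written as a small registry. The sketch below assumes a hypothetical registry with a single made-up model name, "echo"; the names babelon actually accepts, and how it resolves them, may differ.

```python
class EchoTranslator:
    """Hypothetical translator used only to make this sketch runnable."""

    def __init__(self, model: str) -> None:
        self.model = model

    def translate(self, text: str, target_language: str) -> str:
        return f"[{target_language}] {text}"


# Hypothetical registry; not the set of models babelon supports.
_REGISTRY = {"echo": EchoTranslator}


def get_translator_model_sketch(model: str = "echo"):
    """Return a translator instance for a model name, or raise ValueError."""
    try:
        translator_class = _REGISTRY[model]
    except KeyError:
        raise ValueError(f"Unknown translation model: {model!r}") from None
    return translator_class(model)


translator = get_translator_model_sketch("echo")
print(translator.translate("heart", "de"))
```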

babelon.translate.prepare_translation_for_ontology(ontology, language_code, df_babelon, terms, fields, include_not_translated=False, update_translation_status=True)

Prepare a babelon translation table for an ontology.

babelon.translate.translate_profile(babelon_df, language_code='en', update_existing=False, model='gpt-4')

Iterate through DataFrame rows and translate values.
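The row-by-row behaviour can be sketched without pandas, using plain dictionaries in place of DataFrame rows. This is an illustration only: translate_rows and fake_translate are hypothetical names, and the reading of update_existing (rows that already have a translation_value are left alone unless it is True) is an assumption, not confirmed behaviour.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def translate_rows(
    rows: List[Row],
    language_code: str,
    translate: Callable[[str, str], str],
    update_existing: bool = False,
) -> List[Row]:
    """Return copies of the rows with translation_value filled in."""
    translated = []
    for row in rows:
        new_row = dict(row)
        has_value = bool(new_row.get("translation_value"))
        # Assumption: existing translations are kept unless update_existing is set.
        if update_existing or not has_value:
            new_row["translation_value"] = translate(new_row["source_value"], language_code)
            new_row["translation_language"] = language_code
        translated.append(new_row)
    return translated


def fake_translate(text: str, language_code: str) -> str:
    """Stand-in for a real translator call."""
    return f"{text}@{language_code}"


rows = [
    {"subject_id": "EX:1", "source_value": "heart", "translation_value": ""},
    {"subject_id": "EX:2", "source_value": "lung", "translation_value": "Lunge"},
]
kept = translate_rows(rows, "de", fake_translate)
overwritten = translate_rows(rows, "de", fake_translate, update_existing=True)
```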

babelon.translation_profile module

Translation Profile.

babelon.translation_profile.statistics_translation_profile(translation_profile)

Take a babelon profile (TSV) as input and report some basic stats:

  • number of translations by source_language, target_language

  • number of translations by source_language, target_language, predicate_id

  • number of translations by source_language, target_language, translation_status

Return type:

None

Args:

translation_profile (Path): translation profile
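The three groupings listed for this function amount to counting rows per combination of column values. The sketch below does this with the standard library on a small in-memory TSV. The sample data is invented, count_by is a hypothetical helper, and the target language is assumed to live in the translation_language column.

```python
import csv
import io
from collections import Counter

# Invented sample profile, for illustration only.
PROFILE_TSV = (
    "subject_id\tpredicate_id\tsource_language\ttranslation_language\ttranslation_status\n"
    "EX:1\trdfs:label\ten\tde\tOFFICIAL\n"
    "EX:2\trdfs:label\ten\tde\tCANDIDATE\n"
    "EX:2\tIAO:0000115\ten\tde\tOFFICIAL\n"
    "EX:3\trdfs:label\ten\tfr\tOFFICIAL\n"
)


def count_by(rows, columns):
    """Count rows per distinct combination of the given columns."""
    return Counter(tuple(row[column] for column in columns) for row in rows)


rows = list(csv.DictReader(io.StringIO(PROFILE_TSV), delimiter="\t"))
languages = ["source_language", "translation_language"]

by_language = count_by(rows, languages)
by_predicate = count_by(rows, languages + ["predicate_id"])
by_status = count_by(rows, languages + ["translation_status"])

for key, count in sorted(by_language.items()):
    print("\t".join(key), count, sep="\t")
```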

babelon.translation_profile.table_print(title, data)

Print grouped translation data.

Args:

title (str): Table title.

data (pd.DataFrame): Grouped translation data.

babelon.utils module

Utility methods for babelon processing.

class babelon.utils.BabelonDataFrame(df, converter=<factory>)

Bases: object

A collection of mappings represented as a DataFrame, together with additional metadata.

converter: Converter
df: DataFrame
property prefix_map

Get a simple, bijective prefix map.

classmethod with_converter(converter, df)

Instantiate with a converter instead of a vanilla prefix map.

Return type:

BabelonDataFrame

babelon.utils.assemble_xliff_file(translation_units)
babelon.utils.assemble_xliff_translation_unit(identifier, id_normalised, label, element, value)
babelon.utils.drop_unknown_columns_babelon(df)

Drop columns that are unknown to the Babelon format from a babelon DataFrame.
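What dropping unknown columns means for drop_unknown_columns_babelon can be sketched on plain dictionaries. The set of known columns below is taken from the fields of the Translation class documented above; treating exactly that set as "known" is an assumption, and drop_unknown_columns is a hypothetical helper rather than the library function.

```python
# Assumed set of known columns, copied from the documented Translation fields.
KNOWN_COLUMNS = frozenset(
    {
        "subject_id",
        "predicate_id",
        "source_value",
        "source_language",
        "source_version",
        "translation_value",
        "translation_language",
        "translation_type",
        "translation_status",
        "translation_precision",
        "translation_confidence",
        "translation_date",
        "translator",
        "translator_expertise",
        "source",
        "comment",
    }
)


def drop_unknown_columns(rows):
    """Return copies of the rows that keep only the known columns."""
    return [
        {column: value for column, value in row.items() if column in KNOWN_COLUMNS}
        for row in rows
    ]


rows = [{"subject_id": "EX:1", "source_value": "heart", "internal_note": "check me"}]
cleaned = drop_unknown_columns(rows)
print(cleaned)
```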

babelon.utils.generate_translation_units(identifier, label, definition, synonyms)
babelon.utils.get_converter()

Get default SSSOM converter.

babelon.utils.parse_babelon(input_path, drop_unknown_columns=False)

Parse a babelon TSV file into a BabelonDataFrame.

babelon.utils.raise_for_bad_path(file_path)

Throw exception if file path is invalid.

Return type:

None

Args:

file_path: The file path or URL to be validated.

Raises:

FileNotFoundError: If the provided file path is not a valid file or URL.
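A check of this shape can be sketched with the standard library. The version below is an assumption about what "valid file or URL" means: anything with an http or https scheme and a host is accepted without being fetched, and everything else must be an existing file. raise_for_bad_path_sketch is a hypothetical name and may not match the library's exact rules.

```python
from pathlib import Path
from urllib.parse import urlparse


def raise_for_bad_path_sketch(file_path: str) -> None:
    """Raise FileNotFoundError unless file_path is a URL or an existing file."""
    parsed = urlparse(str(file_path))
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Looks like a URL; this sketch does not try to fetch it.
        return
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"{file_path} is not a valid file path or URL.")


raise_for_bad_path_sketch("https://example.org/profile.babelon.tsv")
```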

babelon.utils.sort_babelon(df)

Sort a babelon DataFrame according to key columns.
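Sorting by key columns can be sketched on plain dictionaries as follows. The key columns are not listed in this reference, so the tuple below is purely an assumption chosen for illustration, and sort_rows is a hypothetical helper.

```python
# Assumed key columns; the columns babelon actually sorts on are not listed here.
KEY_COLUMNS = ("subject_id", "predicate_id", "translation_language")


def sort_rows(rows, key_columns=KEY_COLUMNS):
    """Return the rows ordered by the given columns, left to right."""
    return sorted(rows, key=lambda row: tuple(row.get(column, "") for column in key_columns))


rows = [
    {"subject_id": "EX:2", "predicate_id": "rdfs:label", "translation_language": "de"},
    {"subject_id": "EX:1", "predicate_id": "rdfs:label", "translation_language": "fr"},
    {"subject_id": "EX:1", "predicate_id": "rdfs:label", "translation_language": "de"},
]
sorted_rows = sort_rows(rows)
```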

Module contents

babelon package.