
Subset: Preprocessing-Cleaning-Labeling

The questions in this section are intended to provide dataset consumers with the information they need to determine whether the “raw” data has been processed in ways that are compatible with their chosen tasks.

URI: Preprocessing-Cleaning-Labeling

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/bridge2ai/data-sheets-schema

Classes in subset

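Class                  Description
CleaningStrategy       Was any cleaning of the data done (e.g., removal of instances, processing of missing values)?
LabelingStrategy       Was any preprocessing/cleaning/labeling of the data done (e.g., part-of-speech tagging)?
PreprocessingStrategy  Was any preprocessing of the data done (e.g., discretization or bucketing, tokenization, SIFT feature extraction)?
RawData                Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)?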
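As a rough illustration of how a dataset producer might record answers to these four questions, the sketch below models each class as a plain Python dataclass. The field names (`was_done`, `was_saved`, `description`, `access_url`) and the `PreprocessingSection` container are hypothetical and are not taken from the data-sheets-schema. Consult the schema at the URL above for the actual slots.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the four classes in this subset.
# Field names are illustrative assumptions, not the schema's actual slots.

@dataclass
class PreprocessingStrategy:
    was_done: bool
    description: List[str] = field(default_factory=list)  # e.g. tokenization steps

@dataclass
class CleaningStrategy:
    was_done: bool
    description: List[str] = field(default_factory=list)  # e.g. how missing values were handled

@dataclass
class LabelingStrategy:
    was_done: bool
    description: List[str] = field(default_factory=list)  # e.g. part-of-speech tagging

@dataclass
class RawData:
    was_saved: bool
    access_url: Optional[str] = None  # link or other access point to the raw data

@dataclass
class PreprocessingSection:
    preprocessing: PreprocessingStrategy
    cleaning: CleaningStrategy
    labeling: LabelingStrategy
    raw_data: RawData

    def summary(self) -> List[str]:
        """Return one yes/no line per question, for quick review by a dataset consumer."""
        def yn(flag: bool) -> str:
            return "yes" if flag else "no"
        return [
            f"preprocessing: {yn(self.preprocessing.was_done)}",
            f"cleaning: {yn(self.cleaning.was_done)}",
            f"labeling: {yn(self.labeling.was_done)}",
            f"raw data saved: {yn(self.raw_data.was_saved)}",
        ]

# Illustrative values only; example.org is a reserved example domain.
section = PreprocessingSection(
    preprocessing=PreprocessingStrategy(True, ["tokenization"]),
    cleaning=CleaningStrategy(True, ["removed instances with missing values"]),
    labeling=LabelingStrategy(False),
    raw_data=RawData(True, "https://example.org/raw"),
)
for line in section.summary():
    print(line)
```

Keeping the four answers in one container makes it easy for a consumer to scan whether the data they receive has been transformed before deciding whether it suits their task.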

CleaningStrategy

Was any cleaning of the data done (e.g., removal of instances, processing of missing values)?

LabelingStrategy

Was any preprocessing/cleaning/labeling of the data done (e.g., part-of-speech tagging)?

PreprocessingStrategy

Was any preprocessing of the data done (e.g., discretization or bucketing, tokenization, SIFT feature extraction)?

RawData

Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? If so, please provide a link or other access point to the “raw” data.
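The two parts of the RawData question imply a simple consistency rule: if the raw data was saved, an access point should be given. A minimal check, assuming a hypothetical record with `was_saved` and `access_point` fields (not the schema's actual slot names), might look like this:

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

@dataclass
class RawDataAnswer:
    # Hypothetical fields; the schema's real slots may differ.
    was_saved: bool
    access_point: Optional[str] = None  # URL or other pointer to the raw data

def check_raw_data(answer: RawDataAnswer) -> List[str]:
    """Return a list of problems; an empty list means the answer looks consistent."""
    problems = []
    if answer.was_saved and not answer.access_point:
        problems.append("raw data was saved but no access point is given")
    if answer.access_point and not answer.was_saved:
        problems.append("access point given although raw data is marked as not saved")
    if answer.access_point:
        parsed = urlparse(answer.access_point)
        # Only http(s) URLs are checked for a host; other access points
        # (e.g. "available on request") are accepted as free text.
        if parsed.scheme in ("http", "https") and not parsed.netloc:
            problems.append("access point URL has no host")
    return problems
```

Accepting non-URL text keeps the check compatible with the question's wording ("a link or other access point").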