Skip to content

Configure

  1. Make a directory for your ingest, using the source of the data as the name:

    mkdir src/monarch_ingest/ingests/<source> 
    
    For example:
    mkdir src/monarch_ingest/ingests/ncbi
    

  2. Add data sources to src/monarch_ingest/download.yaml:

    # <source>
    -
      url: https://<source>.com/downloads/somedata.txt 
      local_name: data/<source>/somedata.txt
      tag: <source>_<ingest>                             
    
    For example:
    # mgi
    -
      url: http://www.informatics.jax.org/downloads/reports/MRK_Reference.rpt
      local_name: data/mgi/MRK_Reference.rpt
      tag: mgi_publication_to_gene   
    

    Note: You can now use ingest download --tags <tag> or ingest download --all, and your data will be downloaded to the appropriate subdir in data/

  3. Add your ingest to src/monarch_ingest/ingests.yaml:

    <ingest_name>:
      config: 'ingests/<source>/<ingest>.yaml
    
    For example:
    ncbi_gene:
      config: 'ingests/ncbi/gene.yaml'
    

  4. Copy the template:

    cp ingest_template/* src/monarch_ingest/ingests/<source>
    

  5. Edit metadata.yaml:

    • Update the description, rights link, url, etc and then add your source_file
  6. Edit the source file yaml

    • Match the columns or required fields with what's available in the file to be ingested
      • If it's an ingest that exists in Dipper, check out what Dipper does.
      • Check the Biolink Model documentation to look at what you can capture
      • If what we need from an ingest can't be captured in the model yet, make a new Biolink issue
    • Set the header properties
      • If there is no header at all, set header: False
      • If there are comment lines before the header, count them and set skip_lines: {n}

--
Next step: Adding documentation