Datasets

AIBECS.jl ships a handful of small pedagogical circulations inside the package (built from scratch in pure Julia, no download needed) and downloads larger data products on demand: ocean circulation matrices, dust deposition fields, topography, and a few others. This page catalogs everything the package can build or fetch.

Why downloads are deferred to first use

The downloaded datasets are large (the OCIM2 transport matrices are ~29 MB each, OCIM2_48L is ~554 MB, ETOPO is ~400 MB compressed), and bundling them would bloat every install for users who only need a subset. AIBECS instead uses DataDeps.jl to declare each dataset as a data dependency: a named record with a URL, a checksum, and a citation.

The first time you call e.g. OCIM2.load(), DataDeps:

  1. Checks the local DataDeps cache (~/.julia/datadeps/AIBECS-OCIM2_CTL_He/ by default) for a copy.

  2. If absent, prompts you to accept the licence and citation (auto-accepted on CI when ENV["DATADEPS_ALWAYS_ACCEPT"] = true), then downloads the file from the source listed below.

  3. Verifies the recorded checksum, refusing to use a corrupted file.

  4. Caches the file so subsequent loads are instant and offline-friendly.

This pattern (described in White et al., 2019) is the same mechanism used by MLDatasets.jl, WordNet.jl, and other data-heavy Julia packages. The benefit for AIBECS users: the package stays small and pure-code, the data lives at a stable URL with a citation, and any change to the upstream file is caught by the checksum mismatch instead of silently changing model output.
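The four steps above can be sketched in a few lines of plain Julia. This is only an illustration of the check-cache, fetch, verify, reuse flow, not the actual DataDeps.jl implementation: cached_fetch, fake_download, and the file name toy.jld2 are hypothetical, and the licence prompt of step 2 is left out.

```julia
using SHA  # standard library, provides sha256

# Hypothetical helper mimicking the flow described above.
# `download_to` is any function that writes the file to the given path;
# it stands in for the real download step.
function cached_fetch(download_to::Function, cachedir::AbstractString,
                      filename::AbstractString, expected_sha256::AbstractString)
    path = joinpath(cachedir, filename)
    if !isfile(path)              # step 1: look for a cached copy
        mkpath(cachedir)
        download_to(path)         # step 2: fetch on a cache miss
    end
    actual = bytes2hex(open(sha256, path))   # step 3: verify the checksum
    actual == expected_sha256 || error("checksum mismatch for $path")
    return path                   # step 4: later calls reuse the cached file
end

# Usage with a fake download that writes a few bytes into a temporary cache.
payload = "toy transport matrix"
checksum = bytes2hex(sha256(payload))
cachedir = mktempdir()
ncalls = Ref(0)                   # counts how often the "download" runs
fake_download(path) = (ncalls[] += 1; write(path, payload))

p1 = cached_fetch(fake_download, cachedir, "toy.jld2", checksum)  # cache miss
p2 = cached_fetch(fake_download, cachedir, "toy.jld2", checksum)  # cache hit
```

For simplicity the sketch re-hashes the file on every call. The second call finds the file on disk and never invokes fake_download.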

The toy / pedagogical circulations are different: they are built in memory from a few constants and the helpers in CirculationGeneration.jl, so they add no install-time cost and need no network access.
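As an illustration of what "built in memory from a few constants" can look like, here is a minimal two-box transport matrix assembled by hand. The volumes and the exchange flux are made-up round numbers, and the code neither uses nor reproduces the helpers in CirculationGeneration.jl. It only assumes the convention that the transport matrix T acts as a flux-divergence operator, so that a tracer x obeys ∂x/∂t + T x = sources.

```julia
using SparseArrays  # standard library

v = [3.0e16, 1.3e18]   # hypothetical box volumes (m³): surface, deep
F = 1.0e8              # hypothetical exchange flux between the two boxes (m³/s)

# Row i gives the net rate at which tracer leaves box i,
# per unit concentration (1/s).
T = sparse([ F/v[1]  -F/v[1];
            -F/v[2]   F/v[2]])

T * ones(2)   # zero vector: transport leaves a uniform tracer unchanged
T' * v        # zero up to rounding: transport conserves total tracer mass
```

The two checks at the end are the properties one would expect of any such matrix: rows sum to zero, and volume-weighted columns sum to zero.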

Ocean circulations

Built from scratch (bundled in AIBECS)

These pedagogical circulations are constructed each time Module.load() is called, using OceanGrids plus the T_advection / T_diffusion helpers in src/CirculationGeneration.jl. Nothing is downloaded or cached.

| Module | Layout | Grid size | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| TwoBoxModel | surface + deep | 1×1×2 | Sarmiento & Gruber (2006) | bundled | 0 |
| Archer_etal_2000 | 3-box (HL surface, LL surface, deep) on a 6-cell grid | 2×1×3 | Archer et al. (2000) | bundled | 0 |
| Primeau_2x2x2 | shoebox (5 wet boxes, 3 dry) | 2×2×2 | Primeau, Intro2TransportOperators | bundled | 0 |
| Haine_and_Hall_2025 | 9-box (3 latitudes × 3 depths) | 3×1×3 | Haine & Hall (2002); Haine et al. (2025) | bundled | 0 |

Downloaded (data-based circulations)

Five families of transport matrices and grids are available. All files are JLD2 (or .tar.gz for OCIM2_48L) generated from the upstream MATLAB distributions by briochemc/OceanCirculations.

| Module | Variant | Grid size | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| OCIM0 | (single) | 180×90×24 | DeVries & Primeau (2011); Primeau et al. (2013) | link | 11 |
| OCIM1 | CTL | 180×91×24 | DeVries (2014) | link | 28 |
| OCIM2 | CTL_He (default) | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | CTL_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiHIGH_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 28 |
| OCIM2 | KiLOW_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiLOW_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_KiHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 28 |
| OCIM2 | KvHIGH_KiLOW_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_KiLOW_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2_48L | base | 180×91×48 | Holzer, DeVries & de Lavergne (2021) | link | 554 |
| OCCA | (single) | 180×80×10 | Forget (2010) | link | 5 |

Loading any OCIM matrix requires using JLD2 (which activates the AIBECSJLD2Ext extension); loading OCIM2_48L additionally requires using MAT, NCDatasets.
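Putting these requirements together, a first load of the default OCIM2 circulation might look like the sketch below. It assumes that OCIM2.load() returns a grid and a transport matrix, as in the AIBECS tutorials; check the docstring of the version you have installed. The first call triggers the download described earlier, so it needs network access.

```julia
using AIBECS
using JLD2   # activates the AIBECSJLD2Ext extension

# Optional: accept the licence and citation prompt non-interactively (e.g. on CI).
ENV["DATADEPS_ALWAYS_ACCEPT"] = true

# Assumed return values: the grid and the transport matrix of the default variant.
grd, T = OCIM2.load()
```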

Other datasets

This group covers aeolian deposition, topography, river discharge, groundwater discharge, and the AWESOME-OCIM toolbox. The files are hosted by their original maintainers (except Chien, which lives on Zenodo).

| Module | Variant | What it is | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| AeolianSources | Chien (default) | 2°×2° seasonal aerosol deposition (fires, biofuels, dust, sea salt, biogenics, volcanoes, fossil fuels) | Chien et al. (2016) | link | 1 |
| AeolianSources | Kok | annual dust deposition by source region (DustCOMM) | Kok et al. (2021) | link | 6 |
| ETOPO | bedrock | 1-arc-min global relief, bedrock surface (compressed) | Amante & Eakins (2009) | link | 402 |
| ETOPO | ice | 1-arc-min global relief, ice surface (compressed) | Amante & Eakins (2009) | link | 395 |
| GroundWaters | (single) | coastal fresh-groundwater discharge shapefile | Luijendijk et al. (2020); PANGAEA dataset | link | 32 |
| AO | master | AWESOME-OCIM MATLAB toolbox (source archive) | John et al. (2020) | link | 123 |

Loading these typically requires the matching extension dependencies: AeolianSources needs NCDatasets; ETOPO needs Distances and NCDatasets; GroundWaters needs Shapefile and DataFrames.

Cache layout

DataDeps stores files under ~/.julia/datadeps/<DataDepName>/ by default. You can override the location by setting ENV["DATADEPS_LOAD_PATH"] before loading AIBECS. The DataDep names this package registers are:

  • AIBECS-OCIM0.1, AIBECS-OCIM1_CTL, AIBECS-OCIM2_<variant> (one per OCIM2 variant), AIBECS-OCIM2_48L, AIBECS-OCCA

  • AIBECS-Chien_etal_2016, AIBECS-Kok_etal_2021

  • ETOPO_bedrock, ETOPO_ice

  • groundwater_discharge

  • AWESOME-OCIM
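To see which of these data dependencies are already on disk, one can simply check for their directories. The sketch below uses only Base Julia and assumes the default cache root; cached_datadeps is a hypothetical helper, not part of AIBECS or DataDeps, and the OCIM2 names are built from the variant list in the circulation table following the AIBECS-OCIM2_<variant> pattern.

```julia
ocim2_variants = ["CTL_He", "CTL_noHe", "KiHIGH_He", "KiHIGH_noHe",
                  "KiLOW_He", "KiLOW_noHe", "KvHIGH_He", "KvHIGH_KiHIGH_noHe",
                  "KvHIGH_KiLOW_He", "KvHIGH_KiLOW_noHe", "KvHIGH_noHe"]

# All DataDep names listed above, with the OCIM2 pattern expanded.
datadep_names = vcat(
    ["AIBECS-OCIM0.1", "AIBECS-OCIM1_CTL"],
    ["AIBECS-OCIM2_$v" for v in ocim2_variants],
    ["AIBECS-OCIM2_48L", "AIBECS-OCCA",
     "AIBECS-Chien_etal_2016", "AIBECS-Kok_etal_2021",
     "ETOPO_bedrock", "ETOPO_ice", "groundwater_discharge", "AWESOME-OCIM"],
)

# Default cache root; this does not account for ENV["DATADEPS_LOAD_PATH"].
default_root = joinpath(homedir(), ".julia", "datadeps")

# Hypothetical helper: the subset of `names` that has a directory under `root`.
cached_datadeps(names; root = default_root) =
    [name for name in names if isdir(joinpath(root, name))]

cached_datadeps(datadep_names)
```

If you have overridden the cache location, pass that directory as root instead.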

To force a re-download (e.g. after a checksum mismatch or to test a URL change), delete the corresponding directory and call Module.load() again.
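Deleting a cache directory can be done from Julia itself. The following is a minimal sketch in Base Julia; clear_datadep is a hypothetical helper (not an AIBECS or DataDeps function), and its default root assumes the standard cache location. The usage lines run against a temporary directory so that nothing real is deleted.

```julia
# Hypothetical helper: remove one DataDep's cache directory so that the next
# Module.load() call has to download it again. Returns true if it existed.
function clear_datadep(name::AbstractString;
                       root::AbstractString = joinpath(homedir(), ".julia", "datadeps"))
    dir = joinpath(root, name)
    existed = isdir(dir)
    rm(dir; recursive = true, force = true)  # force=true: no error if already absent
    return existed
end

# Demonstration on a throwaway cache root.
root = mktempdir()
mkpath(joinpath(root, "AIBECS-OCCA"))
first_try = clear_datadep("AIBECS-OCCA"; root = root)   # directory was present
second_try = clear_datadep("AIBECS-OCCA"; root = root)  # nothing left to remove
```

Against the real cache, the call would be clear_datadep("AIBECS-OCIM2_CTL_He") or similar, using one of the DataDep names listed above.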