Datasets

AIBECS.jl ships a handful of small pedagogical circulations inside the package (built from scratch in pure Julia, no download needed) and downloads larger data products on demand: ocean circulation matrices, dust deposition fields, topography, and a few others. This page catalogs everything the package can build or fetch.

Why downloads are deferred to first use

The downloaded datasets are large (the OCIM2 transport matrices are ~29 MB each, OCIM2_48L is ~554 MB, ETOPO is ~400 MB compressed), and bundling them would bloat every install for users who only need a subset. AIBECS instead uses DataDeps.jl to declare each dataset as a data dependency: a named record with a URL, a checksum, and a citation.

The first time you call e.g. OCIM2.load(), DataDeps:

  1. Checks the local DataDeps cache (~/.julia/datadeps/AIBECS-OCIM2_CTL_He/ by default) for a copy.

  2. If absent, prompts you to accept the licence and citation (auto-accepted on CI when ENV["DATADEPS_ALWAYS_ACCEPT"] = true), then downloads the file from the source listed below.

  3. Verifies the recorded checksum, refusing to use a corrupted file.

  4. Caches the file so subsequent loads are instant and offline-friendly.
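The four steps above can be sketched with the Julia standard library alone. This is a minimal illustration of the check-cache, fetch, verify, cache sequence, not DataDeps' actual implementation: `cached_fetch` and its arguments are hypothetical names, the licence prompt from step 2 is left out, and the "download" is whatever function the caller passes in.

```julia
using SHA  # standard library, provides sha256

# Hypothetical helper (not part of DataDeps or AIBECS): return the path of a
# cached file, fetching and verifying it first if it is not cached yet.
function cached_fetch(fetch!, cache_dir, filename, expected_sha256)
    path = joinpath(cache_dir, filename)
    isfile(path) && return path              # step 1: cache hit, nothing to do
    mkpath(cache_dir)
    tmp = path * ".part"
    fetch!(tmp)                              # step 2: caller-supplied download
    actual = bytes2hex(open(sha256, tmp))    # step 3: hash the fetched bytes
    if actual != expected_sha256
        rm(tmp; force = true)                # never keep a corrupted file
        error("checksum mismatch for $filename: got $actual")
    end
    mv(tmp, path)                            # step 4: promote into the cache
    return path
end

# Usage with a stand-in "download" that just writes a string to disk.
payload = "toy transport matrix"
cache = mktempdir()
path = cached_fetch(cache, "toy.bin", bytes2hex(sha256(payload))) do tmp
    write(tmp, payload)
end
```

Writing to a temporary `.part` file and renaming only after the hash matches means an interrupted or corrupted download is never mistaken for a valid cached copy.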

This pattern (described in White et al., 2019) is the same mechanism used by MLDatasets.jl, WordNet.jl, and other data-heavy Julia packages. The benefit for AIBECS users: the package stays small and pure-code, the data lives at a stable URL with a citation, and any change to the upstream file is caught by the checksum mismatch instead of silently changing model output.

The toy / pedagogical circulations are different: they are built in memory from a few constants and the helpers in CirculationGeneration.jl, so they add no install-time cost and need no network access.

Ocean circulations

Built from scratch (bundled in AIBECS)

These pedagogical circulations are constructed each time Module.load() is called, using OceanGrids plus the T_advection / T_diffusion helpers in src/CirculationGeneration.jl. Nothing is downloaded or cached.
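As a hedged example, loading one of the bundled circulations might look like the following. It assumes that `load()` returns a grid and a transport matrix as a `(grd, T)` pair, which this page does not state explicitly, so check the module docstring before relying on it.

```julia
using AIBECS

# Assumption: load() returns the grid and the transport matrix as a pair.
# Each call rebuilds the circulation in memory; nothing is read from disk.
grd, T = Primeau_2x2x2.load()

# The transport matrix is square, with one row and one column per wet box.
println("wet boxes:       ", size(T, 1))
println("nonzero entries: ", count(!iszero, T))
```

Because nothing is downloaded, this runs offline and makes a convenient smoke test for a fresh install.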

| Module | Layout | Grid size | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| TwoBoxModel | surface + deep | 1×1×2 | Sarmiento & Gruber (2006) | bundled | 0 |
| Archer_etal_2000 | 3-box (HL surface, LL surface, deep) on a 6-cell grid | 2×1×3 | Archer et al. (2000) | bundled | 0 |
| Primeau_2x2x2 | shoebox (5 wet boxes, 3 dry) | 2×2×2 | Primeau, Intro2TransportOperators | bundled | 0 |
| Haine_and_Hall_2025 | 9-box (3 latitudes × 3 depths) | 3×1×3 | Haine & Hall (2002); Haine et al. (2025) | bundled | 0 |

Downloaded (data-based circulations)

Five families of transport matrices and grids are available. All files are JLD2 (or .tar.gz for OCIM2_48L) generated from the upstream MATLAB distributions by briochemc/OceanCirculations.

| Module | Variant | Grid size | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| OCIM0 | (single) | 180×90×24 | DeVries & Primeau (2011); Primeau et al. (2013) | link | 11 |
| OCIM1 | CTL | 180×91×24 | DeVries (2014) | link | 28 |
| OCIM2 | CTL_He (default) | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | CTL_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiHIGH_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 28 |
| OCIM2 | KiLOW_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KiLOW_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_KiHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 28 |
| OCIM2 | KvHIGH_KiLOW_He | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_KiLOW_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2 | KvHIGH_noHe | 180×91×24 | DeVries & Holzer (2019) | link | 29 |
| OCIM2_48L | base | 180×91×48 | Holzer, DeVries & de Lavergne (2021) | link | 554 |
| OCCA | (single) | 180×80×10 | Forget (2010) | link | 5 |

Loading any OCIM matrix requires using JLD2 (which activates the AIBECSJLD2Ext extension); loading OCIM2_48L additionally requires using MAT, NCDatasets.
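A first download of an OCIM2 matrix might then look like the sketch below. The `(grd, T)` return pair and the `version` keyword are assumptions not confirmed on this page, so treat both as illustrative and check the `OCIM2.load` docstring for the exact signature. Auto-accepting the prompt is only appropriate for non-interactive runs.

```julia
using AIBECS
using JLD2   # activates the AIBECSJLD2Ext extension needed to read OCIM files

# Skip the licence/citation prompt; remove this line for interactive use.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# Assumption: load() returns (grid, transport matrix) and takes a `version`
# keyword naming one of the variants in the table. The first call downloads
# the file (about 28 MB for this variant); later calls read it from the cache.
grd, T = OCIM2.load(version = "KiHIGH_noHe")

println("wet boxes: ", size(T, 1))
```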

Other datasets

The remaining datasets cover aeolian deposition, topography, river discharge, groundwater discharge, and the AWESOME-OCIM toolbox. They are hosted by their original maintainers, except the Chien dataset, which lives on Zenodo.

| Module | Variant | What it is | Citation | Source | Size (MB) |
|---|---|---|---|---|---|
| AeolianSources | Chien (default) | 2°×2° seasonal aerosol deposition (fires, biofuels, dust, sea salt, biogenics, volcanoes, fossil fuels) | Chien et al. (2016) | link | 1 |
| AeolianSources | Kok | annual dust deposition by source region (DustCOMM) | Kok et al. (2021) | link | 6 |
| ETOPO | bedrock | 1-arc-min global relief, bedrock surface (compressed) | Amante & Eakins (2009) | link | 402 |
| ETOPO | ice | 1-arc-min global relief, ice-surface (compressed) | Amante & Eakins (2009) | link | 395 |
| GroundWaters | (single) | coastal fresh-groundwater discharge shapefile | Luijendijk et al. (2020); PANGAEA dataset | link | 32 |
| AO | master | AWESOME-OCIM MATLAB toolbox (source archive) | John et al. (2020) | link | 123 |

Loading these typically requires the matching extension dependencies to be loaded alongside AIBECS: AeolianSources needs NCDatasets; ETOPO needs Distances and NCDatasets; GroundWaters needs Shapefile and DataFrames.

Cache layout

DataDeps stores files under ~/.julia/datadeps/<DataDepName>/ by default. You can override the location by setting ENV["DATADEPS_LOAD_PATH"] before loading AIBECS. The DataDep names this package registers are:
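A minimal sketch of overriding the cache location, for example on a cluster with a small home quota or on CI. The directory name is hypothetical; only the two environment variable names come from this page.

```julia
# Run this before `using AIBECS` so that the setting is in place when
# datasets are first resolved.
data_dir = joinpath(homedir(), "aibecs-datadeps")   # hypothetical location
mkpath(data_dir)                                    # make sure it exists
ENV["DATADEPS_LOAD_PATH"] = data_dir

# On CI, also skip the interactive licence/citation prompt.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# using AIBECS   # load the package only after the variables are set
```

The same variables can be exported in the shell or in a CI job definition, which keeps them out of the analysis script.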

  • AIBECS-OCIM0.1, AIBECS-OCIM1_CTL, AIBECS-OCIM2_<variant> (one per OCIM2 variant), AIBECS-OCIM2_48L, AIBECS-OCCA

  • AIBECS-Chien_etal_2016, AIBECS-Kok_etal_2021

  • ETOPO_bedrock, ETOPO_ice

  • groundwater_discharge

  • AWESOME-OCIM

To force a re-download (e.g. after a checksum mismatch or to test a URL change), delete the corresponding directory and call Module.load() again.
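Deleting a cached dataset can be done from Julia itself. The helper below is a hypothetical convenience, not an AIBECS or DataDeps function, and it assumes the default cache root; pass `root` explicitly if you have overridden the location.

```julia
# Hypothetical helper: remove one cached DataDep directory so that the next
# Module.load() call downloads a fresh copy. Returns true if something was
# deleted and false if the directory was not there.
function clear_datadep(name; root = joinpath(homedir(), ".julia", "datadeps"))
    dir = joinpath(root, name)
    isdir(dir) || return false
    rm(dir; recursive = true)
    return true
end

# Drop the default OCIM2 matrix; the next OCIM2.load() fetches it again.
clear_datadep("AIBECS-OCIM2_CTL_He")
```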

AWESOME OCIM toolbox

The AWESOME OCIM (AO) is a MATLAB toolbox by John, Liang, Weber, DeVries, Primeau, Moore, Holzer, and Mahowald (2020) that bundles the OCIM1 transport matrix alongside auxiliary GEOTRACES, WOA, and Weber–John datasets. AIBECS does not expose AO contents directly but can fetch and unpack the toolbox so you can browse the files yourself:

```julia
using AIBECS
AO_path = AO.download_and_unpack()
# AO_path/OCIM1/ : MATLAB transport matrices and grid info
# AO_path/data/  : bundled observational fields (GEOTRACES, WOA, Weber–John)
# AO_path/util/  : MATLAB helper scripts
```

See the AO docstring for the citation. The toolbox is distributed as a GitHub archive zip, so downloads depend on GitHub's availability; mirror it to your own storage if you plan heavy use.