Benchmarks

Here we benchmark model performance on two architectures, CPU and GPU. The benchmarks use 2^10, 2^12, 2^14, and 2^15 individuals. The 2-Dimensional and 3-Dimensional model setups are also run at several grid resolutions.
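The tables on this page report min, median, mean, and max times over a number of samples. As a rough illustration of how such summary statistics can be collected, here is a minimal sketch that uses only Julia's standard library. `toy_step!` and `summarize` are hypothetical names introduced for this example; `toy_step!` is a stand-in workload and is not part of the PlanktonIndividuals.jl API, and this is not necessarily how the numbers on this page were produced.

```julia
using Statistics  # standard library: median, mean

# Hypothetical stand-in for one model time step (NOT the PlanktonIndividuals API).
# It shifts N particle positions by a fixed displacement, so its cost grows with N.
function toy_step!(x::Vector{Float64}, dx::Float64)
    @inbounds for i in eachindex(x)
        x[i] += dx
    end
    return x
end

# Time `samples` calls of f() and summarize them like the table columns.
function summarize(f; samples::Int = 10)
    f()  # warm-up call so that compilation time is not measured
    times = [@elapsed(f()) for _ in 1:samples]
    return (min = minimum(times), median = median(times),
            mean = mean(times), max = maximum(times), samples = samples)
end

# Sweep over the same numbers of individuals as the benchmarks on this page.
for N in (2^10, 2^12, 2^14, 2^15)
    x = zeros(N)
    s = summarize(() -> toy_step!(x, 0.1))
    println("N = ", N, "  median = ", round(s.median * 1e6, digits = 2), " μs")
end
```

A dedicated benchmarking package would additionally report memory and allocation counts, which this sketch leaves out.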

0-Dimensional model

This is a benchmark of a simple 0-Dimensional model setup without advection of Eulerian tracers. Note that the advection of individuals takes the same amount of time whether or not a velocity field is provided.

PlanktonIndividuals v0.7.5
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 56 × Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
  GPU: Quadro GV100 (sm_70, 32.000 GiB available)
  CUDA runtime 12.9, artifact installation
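  CUDA driver 565.57.1 for 12.7

| Arch | N | min | median | mean | max | memory | allocs | samples |
|------|---|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 2.536 ms | 2.629 ms | 2.696 ms | 3.204 ms | 463.71 KiB | 3673 | 10 |
| CPU | 4096 | 8.091 ms | 8.201 ms | 8.252 ms | 8.829 ms | 632.18 KiB | 3673 | 10 |
| CPU | 16384 | 30.433 ms | 30.558 ms | 30.745 ms | 31.809 ms | 1.28 MiB | 3595 | 10 |
| CPU | 32768 | 59.959 ms | 60.364 ms | 60.354 ms | 60.980 ms | 2.15 MiB | 3595 | 10 |
| GPU | 1024 | 13.006 ms | 13.194 ms | 13.322 ms | 14.415 ms | 2.68 MiB | 77257 | 10 |
| GPU | 4096 | 13.152 ms | 13.334 ms | 13.386 ms | 13.991 ms | 2.68 MiB | 77257 | 10 |
| GPU | 16384 | 13.562 ms | 13.755 ms | 13.800 ms | 14.595 ms | 2.68 MiB | 77260 | 10 |
| GPU | 32768 | 14.646 ms | 14.879 ms | 14.948 ms | 15.450 ms | 2.68 MiB | 77263 | 10 |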
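
One way to read the 0-Dimensional table above is to compare the CPU and GPU median times at each N. The following sketch copies the median values from that table and computes their ratio; the variable names are chosen for this example only.

```julia
# Median times (ms) copied from the 0-Dimensional table above.
N          = [1024, 4096, 16384, 32768]
cpu_median = [2.629, 8.201, 30.558, 60.364]
gpu_median = [13.194, 13.334, 13.755, 14.879]

# A ratio greater than 1 means the GPU run is faster than the CPU run.
speedup = cpu_median ./ gpu_median

for (n, s) in zip(N, speedup)
    println("N = ", n, ": CPU/GPU median ratio = ", round(s, digits = 2))
end

# Smallest benchmarked N at which the GPU median beats the CPU median.
crossover = N[findfirst(>(1), speedup)]
println("GPU is faster from N = ", crossover, " onward (among the benchmarked sizes)")
```

With these numbers the GPU time stays nearly flat while the CPU time grows with N, so the GPU overtakes the CPU somewhere between N = 4096 and N = 16384 on this machine.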

2-Dimensional model

This is the benchmark of a 2-Dimensional model setup with (Ns, 1, Ns) grid cells. Here Ns = [32, 64, 128].

PlanktonIndividuals v0.7.5
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 56 × Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
  GPU: Quadro GV100 (sm_70, 32.000 GiB available)
  CUDA runtime 12.9, artifact installation
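  CUDA driver 565.57.1 for 12.7

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 5.659 ms | 5.788 ms | 5.874 ms | 6.463 ms | 1.92 MiB | 4317 | 10 |
| CPU | 1024 | 64 | 13.497 ms | 13.608 ms | 13.770 ms | 15.069 ms | 5.80 MiB | 4878 | 10 |
| CPU | 1024 | 128 | 45.442 ms | 54.734 ms | 53.155 ms | 61.680 ms | 20.65 MiB | 6158 | 10 |
| CPU | 4096 | 32 | 11.307 ms | 11.427 ms | 11.519 ms | 12.184 ms | 2.08 MiB | 4238 | 10 |
| CPU | 4096 | 64 | 19.120 ms | 19.479 ms | 19.667 ms | 20.766 ms | 5.96 MiB | 4878 | 10 |
| CPU | 4096 | 128 | 51.716 ms | 56.439 ms | 57.364 ms | 65.370 ms | 20.82 MiB | 6158 | 10 |
| CPU | 16384 | 32 | 33.569 ms | 33.907 ms | 34.243 ms | 35.965 ms | 2.74 MiB | 4239 | 10 |
| CPU | 16384 | 64 | 41.597 ms | 42.110 ms | 42.844 ms | 45.371 ms | 6.62 MiB | 4879 | 10 |
| CPU | 16384 | 128 | 75.032 ms | 87.652 ms | 83.260 ms | 89.017 ms | 21.47 MiB | 6159 | 10 |
| CPU | 32768 | 32 | 63.176 ms | 63.657 ms | 63.717 ms | 64.465 ms | 3.62 MiB | 4239 | 10 |
| CPU | 32768 | 64 | 71.786 ms | 72.317 ms | 73.374 ms | 76.535 ms | 7.50 MiB | 4879 | 10 |
| CPU | 32768 | 128 | 106.093 ms | 116.530 ms | 113.994 ms | 120.500 ms | 22.35 MiB | 6159 | 10 |
| GPU | 1024 | 32 | 12.915 ms | 13.093 ms | 13.219 ms | 13.851 ms | 2.87 MiB | 83564 | 10 |
| GPU | 1024 | 64 | 13.688 ms | 14.272 ms | 14.355 ms | 15.455 ms | 3.15 MiB | 93423 | 10 |
| GPU | 1024 | 128 | 15.540 ms | 16.113 ms | 16.036 ms | 16.361 ms | 3.92 MiB | 117894 | 10 |
| GPU | 4096 | 32 | 12.888 ms | 13.249 ms | 13.541 ms | 15.147 ms | 2.87 MiB | 83564 | 10 |
| GPU | 4096 | 64 | 13.768 ms | 13.912 ms | 14.277 ms | 15.145 ms | 3.15 MiB | 93423 | 10 |
| GPU | 4096 | 128 | 15.740 ms | 16.697 ms | 16.592 ms | 17.422 ms | 3.92 MiB | 117894 | 10 |
| GPU | 16384 | 32 | 13.514 ms | 13.832 ms | 14.139 ms | 16.544 ms | 2.87 MiB | 83565 | 10 |
| GPU | 16384 | 64 | 13.956 ms | 14.564 ms | 14.831 ms | 17.842 ms | 3.15 MiB | 93425 | 10 |
| GPU | 16384 | 128 | 15.721 ms | 15.844 ms | 15.941 ms | 16.880 ms | 3.92 MiB | 117896 | 10 |
| GPU | 32768 | 32 | 13.689 ms | 13.823 ms | 13.926 ms | 15.010 ms | 2.87 MiB | 83568 | 10 |
| GPU | 32768 | 64 | 14.460 ms | 15.067 ms | 15.092 ms | 15.700 ms | 3.15 MiB | 93428 | 10 |
| GPU | 32768 | 128 | 16.284 ms | 17.469 ms | 17.356 ms | 18.070 ms | 3.92 MiB | 117899 | 10 |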

3-Dimensional model

This is the benchmark of a 3-Dimensional model setup with (Ns, Ns, Ns) grid cells. Here Ns = [32, 64, 128].
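
To make the grid sizes concrete, the sketch below counts the grid cells for the 2-Dimensional shape (Ns, 1, Ns) and the 3-Dimensional shape (Ns, Ns, Ns) described in the text. The helper names `grid_2d` and `grid_3d` are introduced for this example only and are not package functions.

```julia
# Grid shapes as described in the text.
grid_2d(Ns) = (Ns, 1, Ns)    # 2-Dimensional setup
grid_3d(Ns) = (Ns, Ns, Ns)   # 3-Dimensional setup

for Ns in (32, 64, 128)
    println("Ns = ", Ns,
            ": 2-D cells = ", prod(grid_2d(Ns)),
            ", 3-D cells = ", prod(grid_3d(Ns)))
end
```

Doubling Ns multiplies the cell count by 4 in the 2-Dimensional setup and by 8 in the 3-Dimensional setup, which is one plausible reason the 3-Dimensional timings grow much faster with Ns.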

PlanktonIndividuals v0.7.5
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 56 × Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
  GPU: Quadro GV100 (sm_70, 32 GiB available)
  CUDA runtime 12.9, artifact installation
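  CUDA driver 565.57.1 for 12.7

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 51.587 ms | 52.013 ms | 52.390 ms | 54.706 ms | 1.06 MiB | 4305 | 10 |
| CPU | 1024 | 64 | 407.569 ms | 414.229 ms | 413.787 ms | 419.221 ms | 5.55 MiB | 8017 | 10 |
| CPU | 1024 | 128 | 3.293 s | 3.301 s | 3.301 s | 3.310 s | 40.95 MiB | 21585 | 2 |
| CPU | 4096 | 32 | 56.841 ms | 58.601 ms | 58.606 ms | 60.820 ms | 1.22 MiB | 4305 | 10 |
| CPU | 4096 | 64 | 422.655 ms | 425.630 ms | 425.657 ms | 428.760 ms | 5.72 MiB | 8017 | 10 |
| CPU | 4096 | 128 | 3.318 s | 3.337 s | 3.337 s | 3.357 s | 41.11 MiB | 21585 | 2 |
| CPU | 16384 | 32 | 81.034 ms | 82.007 ms | 82.366 ms | 84.177 ms | 1.88 MiB | 4306 | 10 |
| CPU | 16384 | 64 | 454.705 ms | 457.783 ms | 457.526 ms | 459.801 ms | 6.37 MiB | 8018 | 10 |
| CPU | 16384 | 128 | 3.336 s | 3.374 s | 3.374 s | 3.413 s | 41.77 MiB | 21586 | 2 |
| CPU | 32768 | 32 | 112.225 ms | 113.137 ms | 113.290 ms | 114.879 ms | 2.76 MiB | 4306 | 10 |
| CPU | 32768 | 64 | 495.213 ms | 497.812 ms | 498.189 ms | 501.221 ms | 7.25 MiB | 8018 | 10 |
| CPU | 32768 | 128 | 3.472 s | 3.482 s | 3.482 s | 3.493 s | 42.65 MiB | 21586 | 2 |
| GPU | 1024 | 32 | 12.288 ms | 12.387 ms | 12.577 ms | 13.428 ms | 4.08 MiB | 115165 | 10 |
| GPU | 1024 | 64 | 20.906 ms | 21.227 ms | 21.697 ms | 24.127 ms | 13.06 MiB | 353213 | 10 |
| GPU | 1024 | 128 | 90.657 ms | 110.816 ms | 113.920 ms | 168.241 ms | 83.72 MiB | 2211802 | 10 |
| GPU | 4096 | 32 | 12.225 ms | 12.322 ms | 12.434 ms | 13.159 ms | 4.08 MiB | 115165 | 10 |
| GPU | 4096 | 64 | 20.767 ms | 21.059 ms | 21.390 ms | 24.283 ms | 13.06 MiB | 353213 | 10 |
| GPU | 4096 | 128 | 90.616 ms | 110.673 ms | 114.507 ms | 170.346 ms | 83.72 MiB | 2211802 | 10 |
| GPU | 16384 | 32 | 12.400 ms | 12.531 ms | 12.670 ms | 13.325 ms | 4.08 MiB | 115167 | 10 |
| GPU | 16384 | 64 | 21.132 ms | 21.983 ms | 22.558 ms | 25.184 ms | 13.06 MiB | 353215 | 10 |
| GPU | 16384 | 128 | 90.849 ms | 110.157 ms | 114.308 ms | 169.462 ms | 83.72 MiB | 2211804 | 10 |
| GPU | 32768 | 32 | 13.077 ms | 14.088 ms | 13.939 ms | 14.860 ms | 4.08 MiB | 115170 | 10 |
| GPU | 32768 | 64 | 21.697 ms | 22.645 ms | 22.986 ms | 25.235 ms | 13.06 MiB | 353217 | 10 |
| GPU | 32768 | 128 | 92.342 ms | 110.662 ms | 112.529 ms | 169.928 ms | 83.72 MiB | 2211807 | 10 |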
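
The 3-Dimensional table above mixes time units (ms and s), so values need to be converted to a common unit before they can be compared. The sketch below does this for one pair of entries (N = 1024, Ns = 128); `to_seconds` and `UNIT_SECONDS` are hypothetical helpers written for this example.

```julia
# Conversion factors from the unit suffixes used in the tables to seconds.
const UNIT_SECONDS = Dict("s" => 1.0, "ms" => 1e-3, "μs" => 1e-6, "ns" => 1e-9)

# Parse a table entry such as "110.816 ms" into seconds.
function to_seconds(str::AbstractString)
    value, unit = split(strip(str))
    return parse(Float64, value) * UNIT_SECONDS[String(unit)]
end

# Median times for N = 1024, Ns = 128, copied from the 3-Dimensional table above.
cpu = to_seconds("3.301 s")
gpu = to_seconds("110.816 ms")

println("CPU/GPU median ratio: ", round(cpu / gpu, digits = 1))
```

For this configuration the CPU median is roughly 30 times the GPU median, a much larger gap than in the 0-Dimensional case.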

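PlanktonIndividuals.jl can now also run on Apple M-series GPUs. Below is a similar benchmark on an Apple CPU and GPU. The three tables follow the same order as above (0-, 2-, and 3-Dimensional setups); here the number of individuals goes up to 2^16 and Ns goes up to 256.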

PlanktonIndividuals v0.7.5
Julia Version 1.11.7

macOS 26.0.0, Darwin 25.0.0

Toolchain:
- Julia: 1.11.7
- LLVM: 16.0.6

Julia packages: 
- Metal.jl: 1.8.0
- GPUArrays: 11.2.3
- GPUCompiler: 1.6.1
- KernelAbstractions: 0.9.38
- ObjectiveC: 3.4.2
- LLVM: 9.4.2
- LLVMDowngrader_jll: 0.6.0+1

1 device:
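- Apple M4 Pro 20 GPU cores (64 GiB Unified Memory)

| Arch | N | min | median | mean | max | memory | allocs | samples |
|------|---|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 718.333 μs | 771.458 μs | 798.633 μs | 1.006 ms | 467.94 KiB | 3679 | 10 |
| CPU | 4096 | 2.447 ms | 2.518 ms | 2.565 ms | 2.944 ms | 636.72 KiB | 3679 | 10 |
| CPU | 16384 | 9.089 ms | 9.216 ms | 9.226 ms | 9.497 ms | 1.28 MiB | 3680 | 10 |
| CPU | 65536 | 37.136 ms | 37.409 ms | 37.412 ms | 37.770 ms | 3.92 MiB | 3601 | 10 |
| GPU | 1024 | 70.685 ms | 72.641 ms | 73.513 ms | 79.681 ms | 3.47 MiB | 115344 | 10 |
| GPU | 4096 | 77.897 ms | 81.464 ms | 81.831 ms | 89.131 ms | 3.48 MiB | 115532 | 10 |
| GPU | 16384 | 80.004 ms | 81.610 ms | 81.961 ms | 88.787 ms | 3.47 MiB | 115508 | 10 |
| GPU | 65536 | 81.589 ms | 81.993 ms | 82.922 ms | 91.138 ms | 3.48 MiB | 115611 | 10 |

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 1.650 ms | 1.727 ms | 1.746 ms | 2.009 ms | 2.20 MiB | 4323 | 10 |
| CPU | 1024 | 64 | 4.302 ms | 4.335 ms | 4.428 ms | 4.902 ms | 7.53 MiB | 4963 | 10 |
| CPU | 1024 | 128 | 14.636 ms | 14.975 ms | 15.051 ms | 15.846 ms | 21.81 MiB | 6164 | 10 |
| CPU | 1024 | 256 | 58.705 ms | 59.051 ms | 59.212 ms | 61.558 ms | 81.09 MiB | 8725 | 10 |
| CPU | 4096 | 32 | 3.334 ms | 3.425 ms | 3.462 ms | 3.846 ms | 2.36 MiB | 4323 | 10 |
| CPU | 4096 | 64 | 6.199 ms | 6.277 ms | 6.349 ms | 6.809 ms | 7.70 MiB | 4963 | 10 |
| CPU | 4096 | 128 | 16.575 ms | 16.851 ms | 16.908 ms | 17.764 ms | 21.98 MiB | 6164 | 10 |
| CPU | 4096 | 256 | 60.498 ms | 61.014 ms | 61.461 ms | 65.441 ms | 81.26 MiB | 8725 | 10 |
| CPU | 16384 | 32 | 9.935 ms | 10.054 ms | 10.088 ms | 10.442 ms | 3.02 MiB | 4245 | 10 |
| CPU | 16384 | 64 | 15.174 ms | 15.359 ms | 15.371 ms | 15.721 ms | 8.36 MiB | 4885 | 10 |
| CPU | 16384 | 128 | 26.593 ms | 26.860 ms | 26.911 ms | 27.408 ms | 22.64 MiB | 6165 | 10 |
| CPU | 16384 | 256 | 71.047 ms | 71.621 ms | 71.659 ms | 72.294 ms | 81.92 MiB | 8726 | 10 |
| CPU | 65536 | 32 | 38.325 ms | 38.712 ms | 38.760 ms | 39.282 ms | 5.66 MiB | 4245 | 10 |
| CPU | 65536 | 64 | 49.727 ms | 50.438 ms | 50.447 ms | 50.888 ms | 10.99 MiB | 4885 | 10 |
| CPU | 65536 | 128 | 64.143 ms | 64.990 ms | 64.915 ms | 65.964 ms | 25.28 MiB | 6165 | 10 |
| CPU | 65536 | 256 | 112.613 ms | 113.154 ms | 113.323 ms | 114.930 ms | 84.55 MiB | 8726 | 10 |
| GPU | 1024 | 32 | 52.458 ms | 53.845 ms | 54.525 ms | 61.694 ms | 3.58 MiB | 118477 | 10 |
| GPU | 1024 | 64 | 54.444 ms | 55.915 ms | 57.301 ms | 72.506 ms | 3.93 MiB | 130398 | 10 |
| GPU | 1024 | 128 | 58.973 ms | 60.532 ms | 62.796 ms | 86.138 ms | 4.85 MiB | 159336 | 10 |
| GPU | 1024 | 256 | 75.546 ms | 78.564 ms | 83.052 ms | 111.422 ms | 7.59 MiB | 241115 | 10 |
| GPU | 4096 | 32 | 51.781 ms | 53.934 ms | 55.887 ms | 76.180 ms | 3.58 MiB | 118592 | 10 |
| GPU | 4096 | 64 | 55.067 ms | 56.103 ms | 56.670 ms | 63.328 ms | 3.93 MiB | 130565 | 10 |
| GPU | 4096 | 128 | 54.806 ms | 60.063 ms | 60.309 ms | 72.531 ms | 4.85 MiB | 159534 | 10 |
| GPU | 4096 | 256 | 78.388 ms | 79.156 ms | 82.094 ms | 101.848 ms | 7.59 MiB | 241281 | 10 |
| GPU | 16384 | 32 | 50.982 ms | 53.887 ms | 54.549 ms | 61.776 ms | 3.58 MiB | 118608 | 10 |
| GPU | 16384 | 64 | 55.210 ms | 56.055 ms | 58.040 ms | 77.365 ms | 3.93 MiB | 130575 | 10 |
| GPU | 16384 | 128 | 59.182 ms | 60.448 ms | 62.112 ms | 78.982 ms | 4.85 MiB | 159542 | 10 |
| GPU | 16384 | 256 | 76.740 ms | 78.906 ms | 81.357 ms | 105.150 ms | 7.59 MiB | 241292 | 10 |
| GPU | 65536 | 32 | 52.495 ms | 53.323 ms | 54.469 ms | 63.653 ms | 3.58 MiB | 118715 | 10 |
| GPU | 65536 | 64 | 54.705 ms | 56.284 ms | 57.580 ms | 70.857 ms | 3.93 MiB | 130660 | 10 |
| GPU | 65536 | 128 | 54.231 ms | 56.561 ms | 57.933 ms | 70.151 ms | 4.85 MiB | 159663 | 10 |
| GPU | 65536 | 256 | 76.922 ms | 79.406 ms | 82.436 ms | 110.916 ms | 7.60 MiB | 241376 | 10 |

| Arch | N | Ns | min | median | mean | max | memory | allocs | samples |
|------|---|----|-----|--------|------|-----|--------|--------|---------|
| CPU | 1024 | 32 | 13.701 ms | 14.026 ms | 14.027 ms | 14.352 ms | 1.07 MiB | 4311 | 10 |
| CPU | 1024 | 64 | 104.133 ms | 104.601 ms | 104.670 ms | 105.534 ms | 5.55 MiB | 8023 | 10 |
| CPU | 1024 | 128 | 828.092 ms | 829.160 ms | 828.977 ms | 829.878 ms | 43.23 MiB | 21591 | 7 |
| CPU | 1024 | 256 | 6.707 s | 6.707 s | 6.707 s | 6.707 s | 322.40 MiB | 73303 | 1 |
| CPU | 4096 | 32 | 15.903 ms | 16.124 ms | 16.141 ms | 16.600 ms | 1.24 MiB | 4311 | 10 |
| CPU | 4096 | 64 | 106.679 ms | 107.173 ms | 107.316 ms | 108.028 ms | 5.72 MiB | 8023 | 10 |
| CPU | 4096 | 128 | 820.950 ms | 821.295 ms | 821.341 ms | 821.975 ms | 41.12 MiB | 21591 | 7 |
| CPU | 4096 | 256 | 6.801 s | 6.801 s | 6.801 s | 6.801 s | 322.56 MiB | 73303 | 1 |
| CPU | 16384 | 32 | 25.392 ms | 25.647 ms | 25.625 ms | 25.994 ms | 1.90 MiB | 4312 | 10 |
| CPU | 16384 | 64 | 117.064 ms | 117.777 ms | 117.660 ms | 118.296 ms | 6.38 MiB | 8024 | 10 |
| CPU | 16384 | 128 | 833.843 ms | 835.038 ms | 834.786 ms | 835.743 ms | 41.77 MiB | 21592 | 6 |
| CPU | 16384 | 256 | 6.712 s | 6.712 s | 6.712 s | 6.712 s | 323.22 MiB | 73304 | 1 |
| CPU | 65536 | 32 | 65.352 ms | 66.282 ms | 66.218 ms | 66.924 ms | 4.54 MiB | 4312 | 10 |
| CPU | 65536 | 64 | 165.360 ms | 166.149 ms | 166.157 ms | 167.143 ms | 9.02 MiB | 8024 | 10 |
| CPU | 65536 | 128 | 888.348 ms | 889.139 ms | 889.124 ms | 890.255 ms | 44.41 MiB | 21592 | 6 |
| CPU | 65536 | 256 | 6.776 s | 6.776 s | 6.776 s | 6.776 s | 325.86 MiB | 73304 | 1 |
| GPU | 1024 | 32 | 21.830 ms | 22.871 ms | 23.228 ms | 28.077 ms | 4.56 MiB | 140410 | 10 |
| GPU | 1024 | 64 | 29.849 ms | 30.973 ms | 32.603 ms | 48.207 ms | 13.65 MiB | 383611 | 10 |
| GPU | 1024 | 128 | 119.130 ms | 124.133 ms | 126.141 ms | 150.394 ms | 87.53 MiB | 2247365 | 10 |
| GPU | 1024 | 256 | 932.425 ms | 962.754 ms | 966.198 ms | 1.023 s | 658.84 MiB | 17009358 | 6 |
| GPU | 4096 | 32 | 21.630 ms | 22.592 ms | 22.975 ms | 26.425 ms | 4.57 MiB | 140587 | 10 |
| GPU | 4096 | 64 | 29.367 ms | 29.591 ms | 32.406 ms | 54.415 ms | 13.66 MiB | 383807 | 10 |
| GPU | 4096 | 128 | 118.579 ms | 125.494 ms | 127.510 ms | 154.465 ms | 86.75 MiB | 2247534 | 10 |
| GPU | 4096 | 256 | 915.231 ms | 950.850 ms | 949.480 ms | 969.102 ms | 658.84 MiB | 17009532 | 6 |
| GPU | 16384 | 32 | 21.454 ms | 22.460 ms | 22.777 ms | 26.804 ms | 4.57 MiB | 140601 | 10 |
| GPU | 16384 | 64 | 27.957 ms | 29.587 ms | 31.845 ms | 53.485 ms | 13.65 MiB | 383729 | 10 |
| GPU | 16384 | 128 | 119.955 ms | 124.230 ms | 127.157 ms | 160.412 ms | 85.97 MiB | 2247538 | 10 |
| GPU | 16384 | 256 | 904.653 ms | 963.670 ms | 948.708 ms | 966.893 ms | 658.84 MiB | 17009526 | 6 |
| GPU | 65536 | 32 | 20.600 ms | 23.022 ms | 23.021 ms | 26.879 ms | 4.57 MiB | 140708 | 10 |
| GPU | 65536 | 64 | 26.643 ms | 29.079 ms | 30.251 ms | 44.419 ms | 13.66 MiB | 383811 | 10 |
| GPU | 65536 | 128 | 120.069 ms | 121.475 ms | 126.132 ms | 163.317 ms | 87.53 MiB | 2247622 | 10 |
| GPU | 65536 | 256 | 894.671 ms | 943.257 ms | 938.207 ms | 962.530 ms | 658.84 MiB | 17009612 | 6 |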