Benchmarks

Here we benchmark model performance on two architectures, CPU and GPU. The numbers of individuals used in the benchmarks are 2^10, 2^15, 2^17, and 2^20 (that is, 1024, 32768, 131072, and 1048576). We also use different grid resolutions in the 2-Dimensional and 3-Dimensional model setups.

0-Dimensional model

This is a benchmark of a simple 0-Dimensional model setup without advection of Eulerian tracers. However, the advection of individuals still takes the same amount of time whether or not a velocity field is provided.

PlanktonIndividuals v0.6.1
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  GPU: Tesla P100-PCIE-12GB
  CUDA runtime 11.8, artifact installation
  CUDA driver 11.2
  NVIDIA driver 460.84.0
Arch  N        min         median      mean        max         memory      allocs
CPU   1024     2.945 ms    3.016 ms    3.167 ms    4.328 ms    478.67 KiB  2992
CPU   32768    69.741 ms   69.812 ms   71.594 ms   80.231 ms   477.72 KiB  2931
CPU   131072   276.553 ms  276.966 ms  280.569 ms  300.907 ms  477.72 KiB  2931
CPU   1048576  2.582 s     2.590 s     2.590 s     2.598 s     477.72 KiB  2931
GPU   1024     7.085 ms    7.158 ms    7.364 ms    9.323 ms    1.92 MiB    21327
GPU   32768    7.435 ms    7.520 ms    7.925 ms    10.173 ms   1.92 MiB    21327
GPU   131072   7.053 ms    9.161 ms    9.851 ms    19.812 ms   1.92 MiB    21294
GPU   1048576  8.005 ms    46.217 ms   47.484 ms   122.516 ms  1.92 MiB    21294
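
One way to read the 0-Dimensional results is as a GPU speedup: the CPU median divided by the GPU median for each N. The Python sketch below only does that arithmetic on the median values reported above. The variable names are illustrative and are not part of PlanktonIndividuals.

```python
# Median times in milliseconds, copied from the 0-Dimensional results,
# keyed by the number of individuals N (2.590 s is written as 2590.0 ms).
cpu_median_ms = {1024: 3.016, 32768: 69.812, 131072: 276.966, 1048576: 2590.0}
gpu_median_ms = {1024: 7.158, 32768: 7.520, 131072: 9.161, 1048576: 46.217}

# A speedup above 1 means the GPU run was faster than the CPU run for that N.
speedup = {n: cpu_median_ms[n] / gpu_median_ms[n] for n in cpu_median_ms}

for n, s in speedup.items():
    print(f"N = {n:>7}: GPU speedup = {s:.1f}x")
```

With these numbers the CPU is faster at N = 1024 (a ratio of about 0.4), while the GPU is roughly 56 times faster at N = 1048576.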

2-Dimensional model

This is the benchmark of a 2-Dimensional model setup with (Ns, 1, Ns) grid cells. Here Ns = [32, 64, 128].

PlanktonIndividuals v0.6.1
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  GPU: Tesla P100-PCIE-12GB
  CUDA runtime 11.8, artifact installation
  CUDA driver 11.2
  NVIDIA driver 460.84.0
Arch  N        Ns   min         median      mean        max         memory      allocs
CPU   1024     32   8.096 ms    8.132 ms    8.211 ms    8.688 ms    2.70 MiB    3109
CPU   1024     64   19.889 ms   19.940 ms   20.064 ms   20.952 ms   8.68 MiB    3052
CPU   1024     128  68.735 ms   69.030 ms   69.672 ms   75.046 ms   31.72 MiB   3052
CPU   32768    32   74.115 ms   74.154 ms   76.313 ms   85.288 ms   2.70 MiB    3048
CPU   32768    64   89.999 ms   90.163 ms   92.340 ms   101.475 ms  8.68 MiB    3052
CPU   32768    128  162.286 ms  162.618 ms  168.129 ms  190.011 ms  31.72 MiB   3052
CPU   131072   32   282.810 ms  282.913 ms  286.631 ms  307.620 ms  2.70 MiB    3048
CPU   131072   64   328.584 ms  328.962 ms  332.448 ms  357.787 ms  8.68 MiB    3052
CPU   131072   128  447.271 ms  453.263 ms  470.108 ms  509.040 ms  31.72 MiB   3052
CPU   1048576  32   2.476 s     2.476 s     2.501 s     2.552 s     2.70 MiB    3048
CPU   1048576  64   2.910 s     2.911 s     2.911 s     2.911 s     8.68 MiB    3052
CPU   1048576  128  2.905 s     2.909 s     2.909 s     2.914 s     31.72 MiB   3052
GPU   1024     32   6.902 ms    6.920 ms    7.101 ms    8.719 ms    1.98 MiB    21513
GPU   1024     64   7.417 ms    7.622 ms    7.755 ms    8.430 ms    2.07 MiB    21632
GPU   1024     128  7.734 ms    8.071 ms    8.141 ms    8.854 ms    2.45 MiB    21713
GPU   32768    32   7.011 ms    7.092 ms    7.392 ms    10.142 ms   1.98 MiB    21513
GPU   32768    64   6.769 ms    6.837 ms    7.152 ms    10.035 ms   2.07 MiB    21632
GPU   32768    128  7.027 ms    8.381 ms    8.561 ms    11.845 ms   2.45 MiB    21713
GPU   131072   32   6.580 ms    8.054 ms    8.560 ms    15.323 ms   1.98 MiB    21541
GPU   131072   64   7.491 ms    9.106 ms    9.664 ms    16.128 ms   2.07 MiB    21599
GPU   131072   128  7.918 ms    12.640 ms   12.791 ms   23.534 ms   2.45 MiB    21680
GPU   1048576  32   9.781 ms    35.539 ms   36.437 ms   59.171 ms   1.98 MiB    21528
GPU   1048576  64   10.682 ms   37.958 ms   39.055 ms   65.476 ms   2.08 MiB    21647
GPU   1048576  128  7.994 ms    50.094 ms   50.772 ms   126.537 ms  2.45 MiB    21680
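
To see how sensitive the 2-Dimensional runs are to grid resolution, one can compare the median at Ns = 128 with the median at Ns = 32 for each architecture and N. The Python sketch below does this with the medians reported above. The names `median_ms`, `cells_ratio`, and `slowdown` are illustrative only.

```python
# Median times in milliseconds from the 2-Dimensional results, keyed by
# (architecture, N) and stored as (median at Ns = 32, median at Ns = 128).
median_ms = {
    ("CPU", 1024): (8.132, 69.030),
    ("CPU", 32768): (74.154, 162.618),
    ("CPU", 131072): (282.913, 453.263),
    ("CPU", 1048576): (2476.0, 2909.0),
    ("GPU", 1024): (6.920, 8.071),
    ("GPU", 32768): (7.092, 8.381),
    ("GPU", 131072): (8.054, 12.640),
    ("GPU", 1048576): (35.539, 50.094),
}

# The grid is (Ns, 1, Ns), so going from Ns = 32 to Ns = 128 multiplies
# the number of grid cells by 16.
cells_ratio = (128 * 1 * 128) / (32 * 1 * 32)

# Slowdown factor when the grid is refined from Ns = 32 to Ns = 128.
slowdown = {key: t128 / t32 for key, (t32, t128) in median_ms.items()}

print(f"grid cells increase by a factor of {cells_ratio:.0f}")
for (arch, n), r in slowdown.items():
    print(f"{arch}  N = {n:>7}: median grows by {r:.2f}x")
```

For 16 times more grid cells, the CPU median grows by about 8.5 times at N = 1024 but by less than 1.2 times at N = 1048576, which suggests that the cost of the individuals dominates at large N. On the GPU the growth stays below 1.6 times for every N in the table.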

3-Dimensional model

This is the benchmark of a 3-Dimensional model setup with (Ns, Ns, Ns) grid cells. Here Ns = [32, 64].

PlanktonIndividuals v0.6.1
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  GPU: Tesla P100-PCIE-12GB
  CUDA runtime 11.8, artifact installation
  CUDA driver 11.2
  NVIDIA driver 460.84.0
Arch  N        Ns   min         median      mean        max         memory      allocs
CPU   1024     32   50.081 ms   50.249 ms   50.421 ms   51.994 ms   1.38 MiB    2820
CPU   1024     64   410.840 ms  459.105 ms  451.043 ms  459.516 ms  8.43 MiB    2821
CPU   32768    32   124.176 ms  124.312 ms  126.438 ms  138.224 ms  1.38 MiB    2820
CPU   32768    64   498.713 ms  534.237 ms  534.148 ms  554.501 ms  8.43 MiB    2821
CPU   131072   32   351.282 ms  351.674 ms  355.733 ms  387.071 ms  1.38 MiB    2820
CPU   131072   64   790.994 ms  808.337 ms  816.691 ms  848.149 ms  8.43 MiB    2821
CPU   1048576  32   3.019 s     3.072 s     3.072 s     3.125 s     1.38 MiB    2820
CPU   1048576  64   3.258 s     3.258 s     3.258 s     3.258 s     8.43 MiB    2821
GPU   1024     32   6.229 ms    6.286 ms    6.466 ms    7.329 ms    2.94 MiB    21053
GPU   1024     64   9.194 ms    11.891 ms   11.689 ms   12.604 ms   9.99 MiB    21077
GPU   32768    32   6.570 ms    6.638 ms    6.966 ms    8.974 ms    2.94 MiB    21053
GPU   32768    64   9.143 ms    12.882 ms   12.712 ms   15.781 ms   9.99 MiB    21077
GPU   131072   32   6.481 ms    9.150 ms    9.469 ms    16.907 ms   2.94 MiB    21081
GPU   131072   64   9.212 ms    16.623 ms   16.438 ms   25.557 ms   9.99 MiB    21105
GPU   1048576  32   7.257 ms    39.894 ms   40.268 ms   96.189 ms   2.94 MiB    21020
GPU   1048576  64   9.586 ms    54.934 ms   53.741 ms   118.675 ms  9.99 MiB    21105
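
Another way to summarize the 3-Dimensional results is the cost per individual, that is, the median time divided by N. The Python sketch below computes it in microseconds for the Ns = 32 rows, using the medians reported above. The names `median_ms` and `cost_us` are illustrative only.

```python
# Median times in milliseconds from the 3-Dimensional results for Ns = 32,
# keyed by architecture and then by the number of individuals N.
median_ms = {
    "CPU": {1024: 50.249, 32768: 124.312, 131072: 351.674, 1048576: 3072.0},
    "GPU": {1024: 6.286, 32768: 6.638, 131072: 9.150, 1048576: 39.894},
}

# Cost per individual in microseconds: (median in ms * 1000) / N.
cost_us = {
    arch: {n: t_ms * 1000.0 / n for n, t_ms in times.items()}
    for arch, times in median_ms.items()
}

for arch, costs in cost_us.items():
    for n, c in costs.items():
        print(f"{arch}  N = {n:>7}: {c:.3f} microseconds per individual")
```

On the CPU the per-individual cost falls from about 49 microseconds at N = 1024 to about 3 microseconds at N = 1048576. On the GPU it falls from about 6 microseconds to about 0.04 microseconds, which is consistent with a fixed overhead being spread over more individuals.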