updating the presentation

c465b5fa · orestis.malaspin · ddd64566 · c465b5fa
Verified Commit c465b5fa authored 4 years ago by orestis.malaspin
--- a/presentations/pasc/pres.md
+++ b/presentations/pasc/pres.md
@@ -6,18 +6,22 @@
 ## Palabos: a massively parallel high performance fluid flow solver
+. . .
+## **Do not hesitate to interrupt me at any time**
 # What is Futhark
- Statically typed, data-parallel, purely functional, array language ...
+- Statically typed, data-parallel, purely functional, array language,
- with limited functionalities (no I/O for example) ...
+- with limited functionalities (no I/O for example),
- that compiles to sequencial, multi-core, OpenCL, and Cuda backends...
+- that compiles to sequential, multi-core, OpenCL, and CUDA backends,
- very efficiently ...
+- very efficiently,
 - without the pain of actually writing sequential, multi-core, or GPU code.
 . . .
- Developed in Copenhagen by Troels Henriksen ...
+- Developed in Copenhagen by T. Henriksen,
- Very friendly and eager to help newcomers ...
+- Very friendly and eager to help newcomers,
 - Still a very experimental project.
 # Why use Futhark?
@@ -73,6 +77,10 @@ int main()
    * Difficult to add features.
    * Difficult to optimize.
+. . .
+**Futhark handles that for you!**
 # How to use Futhark?
 - Not intended to replace existing general-purpose languages.
@@ -82,19 +90,37 @@ int main()
    - Conventional `C` code,
    - Several others (`C#`, `Haskell`, `F#`, and `Rust` for example).
- Futhark produces `C` code so it's accessible from any language through a FFI.
+- Futhark produces `C` code: FFI for most languages.
-# An example: the dot product
+# Basic syntax
-::: dotprod
+## Differences with classical HPC languages
-## `dotprod.fut`
+* Functional language $\Rightarrow$ functions always return something (unlike
+  C/C++ for example).
+* Function arguments cannot be modified in place.
+## `let` ... `in`
+```ocaml
+-- The addition of 3 doubles:
+-- Signature f64 -> f64 -> f64 -> f64
+let add (a: f64)  (b: f64) (c: f64) : f64 =
+    let d = a + b
+    in c + d
 ```
+This should be read: **let** `d` be equal to `a + b`, **in** `c + d`.
+# An example: the dot product
+## `dotprod.fut`
+```ocaml
 entry dotprod (xs: []i32) (ys: []i32): i32 =
  reduce (+) 0 (map (\(x, y) -> x * y) (zip xs ys))
 ```
-:::
 Intrinsics: SOACs (second-order array combinators)
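As a minimal sketch of what these combinators do, the snippet below applies three common SOACs (`map`, `reduce`, `scan`) to a small array. The entry point name `soac_demo` is hypothetical and only used for illustration; it is not part of the presentation's code.

```futhark
-- Hypothetical entry point, for illustration only.
entry soac_demo [n] (xs: [n]i32): ([n]i32, i32, [n]i32) =
  let doubled = map (\x -> 2 * x) xs  -- element-wise transformation
  let total   = reduce (+) 0 xs       -- sum of all elements
  let prefix  = scan (+) 0 xs         -- inclusive prefix sums
  in (doubled, total, prefix)
```

For the input `[1, 2, 3, 4]` this returns `[2, 4, 6, 8]`, `10`, and `[1, 3, 6, 10]`.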
@@ -149,14 +175,19 @@ int main() {
 # The lattice Boltzmann method
-* Cellular automaton-like algorithm for fluid flow simulation.
+## The algorithm
-::: Simulation
+* Cellular automaton-like algorithm for fluid flow simulation.
+* Cartesian grid with $q$ variables per grid point, $f_i(\bm{x}, t)$.
+* Each time step is made of:
+    1. Collision (local operations only).
+    2. Propagation (non-local operations).
+* Notoriously straightforward to parallelize.
 ## Simulation
 0. Initialization (no Futhark here).
-1. Collision.
+1. Collision (on every grid point at the same time).
    - Compute $\rho(f_i)$,
    - Compute $\bm{j}(f_i)$,
    - Compute $f_i^\mathrm{eq}(\rho,\bm{j})$,
@@ -166,81 +197,94 @@ int main() {
 Repeat steps 1-2 a certain number of times
 3. Get the data and process it.
-:::
-# Computation of macroscopic moments (1/2)
+# The LBM pseudo-code
+```ocaml
+let time_step [nx][ny][nz][q] (f: [nx][ny][nz][q]f32)
+    : [nx][ny][nz][q]f32 =
+    let rho  = compute_rho f
+    let j    = compute_j f
+    let feq  = compute_feq rho j
+    let fout = collide f feq omega
+    in stream fout
+```
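The "repeat steps 1-2" part of the simulation could be expressed with a Futhark `loop`. The sketch below is only an assumption about how such a driver might look: `run` is a hypothetical entry point, and `time_step` is replaced by an identity stand-in so that the snippet is self-contained.

```futhark
-- Stand-in for the real collide-and-stream step (hypothetical: identity).
let time_step [nx][ny][nz][q]
    (f: [nx][ny][nz][q]f32): [nx][ny][nz][q]f32 = f

-- Hypothetical driver: apply `time_step` `steps` times, starting from f0.
entry run [nx][ny][nz][q]
    (steps: i32) (f0: [nx][ny][nz][q]f32): [nx][ny][nz][q]f32 =
  loop f = f0 for _i < steps do
    time_step f
```

The loop carries the populations `f` from one iteration to the next, so no in-place modification of the argument is needed.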
+# Data structures and intrinsics
+## Only arrays
-::: Simulation
+```ocaml
+f: [nx][ny][nz][q]f32   -- 4d array
+feq: [nx][ny][nz][q]f32 -- 4d array
+rho: [nx][ny][nz]f32    -- 3d array
+j: [nx][ny][nz][d]f32   -- 4d array
+```
+## Functions
+```ocaml
+let map3d 'a 'b [nx][ny][nz]
+    (foo: a -> b) (xs: [nx][ny][nz]a): [nx][ny][nz]b =
+    map (\xs1 ->
+        map (\xs2 ->
+            map (\xs3 -> foo xs3) xs2
+        ) xs1
+    ) xs
+```
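The collision slide relies on a two-array variant named `map2_3d`, whose definition is not shown in this diff. A possible sketch, assuming it mirrors `map3d` but takes a binary function and two arrays of identical shape passed separately (the signature is an assumption, not the author's code):

```futhark
-- Hypothetical helper: apply a binary function on every grid point of
-- two 3d arrays that have the same shape.
let map2_3d 'a 'b 'c [nx][ny][nz]
    (foo: a -> b -> c)
    (xs: [nx][ny][nz]a) (ys: [nx][ny][nz]b): [nx][ny][nz]c =
  map2 (\xs1 ys1 ->
    map2 (\xs2 ys2 ->
      map2 foo xs2 ys2
    ) xs1 ys1
  ) xs ys
```

With this signature a call would read `map2_3d (\a b -> ...) xs ys`, with the two arrays given as separate arguments.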
+# Computation of macroscopic moments (1/2)
 ## LBM equations
+On **each** grid point:
 \begin{equation}
 \rho=\sum_{i=0}^{q-1}f_i, \forall\ \bm{x}.
 \end{equation}
-:::
-::: Futhark
 ## Futhark code
 ```ocaml
-map (\fx ->
+let compute_rho [nx][ny][nz][q] 
-    map (\fxy ->
+    (f: [nx][ny][nz][q]f32): [nx][ny][nz]f32 =
-        map (\fxyz ->
+map3d (\fxyz ->
    reduce (+) 0 fxyz
-        ) fxy
-    ) fx
 ) f
 ```
-:::
 # Computation of macroscopic moments (2/2)
-::: Simulation
 ## LBM equation
 \begin{equation}
 \rho\bm{u}=\sum_{i=0}^{q-1}f_i \bm{c}_i, \forall\ \bm{x}.
 \end{equation}
-:::
-::: Futhark
 ## Futhark code
 ```ocaml
-map (\fx ->
+let compute_j [nx][ny][nz][q] 
-    map (\fxy ->
+    (f: [nx][ny][nz][q]f32): [nx][ny][nz][3]f32 =
-        map (\fxyz ->
+map3d (\fxyz ->
    map(\ci ->
        dotprod ci fxyz
    ) (transpose c) -- intrinsic
-        ) fxy
-    ) fx
 ) f
 ```
-:::
 # Computation of the equilibrium distribution
-::: Simulation
 ## LBM equation
 \begin{equation}
 f_i^\mathrm{eq}=w_i\rho\left(1+\frac{\bm{c}_i\cdot \bm{u}}{c_s^2}+\frac{1}{2c_s^4}(\bm{c}_i\cdot \bm{u})^2-\frac{1}{2c_s^2}\bm{u}^2\right),\ \forall \bm{x},i
 \end{equation}
-:::
-::: Futhark
 ## Futhark code
 ```ocaml
-map2(\rho_x j_x -> 
+map_3d(\rho_xyz j_xyz ->
-  map2(\rho_xy j_xy ->
-    map2(\rho_xyz j_xyz ->
    let u = map(\j_xyzi -> j_xyzi / rho_xyz ) j_xyz
    let u_sqr = dotprod u u
@@ -249,56 +293,35 @@ map2(\rho_x j_x ->
        in rho_xyz * wi *
            (1 + 3 * c_u + 4.5 * c_u * c_u - 1.5 * u_sqr)
    ) w c
-    ) rho_xy j_xy 
+) (zip rho j)
-  ) rho_x j_x 
-) rho j
 ```
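The numeric coefficients in the code above are consistent with the equilibrium formula if one assumes the usual lattice speed of sound $c_s^2=1/3$ (this value is not stated in the slides, so it is an assumption here):
\begin{equation}
\frac{1}{c_s^2}=3,\quad \frac{1}{2c_s^4}=\frac{9}{2}=4.5,\quad \frac{1}{2c_s^2}=\frac{3}{2}=1.5,
\end{equation}
so that
\begin{equation}
f_i^\mathrm{eq}=w_i\rho\left(1+3\,\bm{c}_i\cdot \bm{u}+4.5\,(\bm{c}_i\cdot \bm{u})^2-1.5\,\bm{u}^2\right),
\end{equation}
where `c_u` plays the role of $\bm{c}_i\cdot\bm{u}$ and `u_sqr` the role of $\bm{u}^2$.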
-:::
 # Collision
-::: Simulation
 ## LBM equation
 \begin{equation}
 f^\mathrm{out}_i=f_i\left(1-\omega\right)+\omega f_i^\mathrm{eq}.
 \end{equation}
-:::
-::: Futhark
 ## Futhark code
 ```ocaml
-map2(\f_x feq_x ->
+map2_3d(\f_xyz feq_xyz ->
-  map2(\f_xy feq_xy ->
-      map2(\f_xyz feq_xyz ->
  map2(\f_i feq_i->
    f_i * (1.0 - omega) + feq_i * omega
  ) f_xyz feq_xyz
-      ) f_xy feq_xy
+) (zip f feq)
-  ) f_x feq_x
-) f feq
 ```
-:::
 # Streaming
-::: Simulation
 ## LBM equation
 \begin{equation}
 f_i(\bm{x}+\bm{c}_i,t+1)=f^\mathrm{out}_i(\bm{x},t).
 \end{equation}
-:::
-::: Futhark
 ## Futhark code
 ```ocaml
@@ -311,26 +334,25 @@ tabulate_4d nx ny nz q (\x y z ipop ->
    in f[next_x, next_y, next_z, ipop]
 )
 ```
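To illustrate the idea of periodic streaming in isolation, here is a one-dimensional sketch of the equation above for a single velocity `c`. It is not the code from the presentation (part of which is elided in this diff); the name `stream_1d` is hypothetical, and the `i64` index type assumes a recent compiler where array sizes are 64-bit.

```futhark
-- Hypothetical 1d streaming with periodic boundaries, for a velocity c
-- in {-1, 0, 1}: the new value at x is pulled from x - c, so that
-- f(x + c, t + 1) = fout(x, t).
let stream_1d [nx] (c: i64) (fout: [nx]f32): [nx]f32 =
  tabulate nx (\x -> fout[(x - c + nx) % nx])
```

Adding `nx` before taking the remainder keeps the index non-negative when `x - c` would fall below zero.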
-:::
 # Summary
 * A simple yet complete fluid flow simulator.
-* Lines of readable and easy to debug Futhark code: 110.
+* Lines of readable and "easy" to debug Futhark code: 110.
-* Single precision, periodic, D3Q27 only arrays: 250 MLPUS.
+* Single precision, periodic, only arrays: 250 MLUPS.
 *Not bad: but we can do better.*
 # How can we go faster?
 * Arrays are aggressively parallelized: each dimension is flattened.
-* For small dimensions it is usually not worth.
+* For small dimensions it is usually not worth it.
 * Replace length-3 or length-27 arrays with tuples for better use of the GPU
   architecture, or use `INCREMENTAL_FLATTENING`.
 * `[](a, b, c, ..) -> ([]a, []b, []c, ...)`{.ocaml} automatically by the compiler.
 * Result: with a code of 150 lines, we go to 1.5 GLUPS on GPU, 11 MLUPS on a 
   single core, 400 MLUPS on a multi-core machine.
-* All results are the same with CUDA and OpenCL backends.
+* Within 10-20\% of state-of-the-art optimized GPU codes.
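A minimal sketch of the array-versus-tuple choice mentioned above, using the squared norm of a momentum vector as an example. Both function names are hypothetical and only illustrate the two representations; they are not taken from the presentation's code.

```futhark
-- Momentum stored as a length-3 array: the inner map/reduce is itself
-- a (very small) parallel operation.
let norm_sqr_array (j: [3]f32): f32 =
  reduce (+) 0 (map (\ji -> ji * ji) j)

-- Momentum stored as a tuple of three scalars: plain scalar arithmetic,
-- with no inner array to parallelize.
let norm_sqr_tuple ((jx, jy, jz): (f32, f32, f32)): f32 =
  jx * jx + jy * jy + jz * jz
```

With the tuple version, an array of momenta has type `[](f32, f32, f32)`, which is the array-of-tuples form covered by the transformation in the bullet above.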
 # Conclusion
@@ -345,7 +367,7 @@ tabulate_4d nx ny nz q (\x y z ipop ->
 # Current and Future Futhark planned developments
-## From Troels Henriksen himself
+## Currently worked on by the core team
 * Multi-GPU: only on a single motherboard (very experimental).
@@ -358,9 +380,10 @@ tabulate_4d nx ny nz q (\x y z ipop ->
    * Distributed CPU/GPU (MPI) backend.
    * A cool rendering tool directly from the GPU (OpenGL).
 # Acknowledgments
+## In alphabetical order
 * V. Berset,
 * B. Coudray,
 * M. El Kharroubi,
@@ -368,5 +391,8 @@ tabulate_4d nx ny nz q (\x y z ipop ->
 # Questions?
+## Futhark webpage: <https://futhark-lang.org/>
 ## Thank you for your attention