Verified commit c465b5fa, authored by orestis.malaspin

updating the presentation

parent ddd64566
@@ -6,18 +6,22 @@
## Palabos: a massively parallel high performance fluid flow solver ## Palabos: a massively parallel high performance fluid flow solver
. . .
## **Do not hesitate to interrupt me at any time**
# What is Futhark # What is Futhark
- Statically typed, data-parallel, purely functional, array language ... - Statically typed, data-parallel, purely functional, array language,
- with limited functionalities (no I/O for example) ... - with limited functionalities (no I/O for example),
- that compiles to sequential, multi-core, OpenCL, and CUDA backends... - that compiles to sequential, multi-core, OpenCL, and CUDA backends,
- very efficiently ... - very efficiently,
- without the pain of actually writing sequential, multi-core, or GPU code. - without the pain of actually writing sequential, multi-core, or GPU code.
. . . . . .
- Developed in Copenhagen by Troels Henriksen ... - Developed in Copenhagen by T. Henriksen,
- Very friendly and eager to help newcomers ... - Very friendly and eager to help newcomers,
- Still a very experimental project. - Still a very experimental project.
# Why use Futhark? # Why use Futhark?
@@ -73,6 +77,10 @@ int main()
* Difficult to add features. * Difficult to add features.
* Difficult to optimize. * Difficult to optimize.
. . .
**Futhark handles that for you!**
# How to use Futhark? # How to use Futhark?
- Not intended to replace existing general-purpose languages. - Not intended to replace existing general-purpose languages.
@@ -82,19 +90,37 @@ int main()
- Conventional `C` code, - Conventional `C` code,
- Several others (`C#`, `Haskell`, `F#`, and `Rust` for example). - Several others (`C#`, `Haskell`, `F#`, and `Rust` for example).
- Futhark produces `C` code so it's accessible from any language through a FFI. - Futhark produces `C` code: FFI for most languages.
# An example: the dot product # Basic syntax
::: dotprod ## Differences with classical HPC languages
## `dotprod.fut` * Functional language $\Rightarrow$ functions always return something (unlike
C/C++ for example).
* Function arguments cannot be modified in place.
## `let` ... `in`
```ocaml
-- The addition of 3 doubles:
-- Signature f64 -> f64 -> f64 -> f64
let add (a: f64) (b: f64) (c: f64) : f64 =
let d = a + b
in c + d
``` ```
This should be read: **let** `d` be equal to `a + b`, **in** `c + d`.
# An example: the dot product
## `dotprod.fut`
```ocaml
entry dotprod (xs: []i32) (ys: []i32): i32 = entry dotprod (xs: []i32) (ys: []i32): i32 =
reduce (+) 0 (map (\(x, y) -> x * y) (zip xs ys)) reduce (+) 0 (map (\(x, y) -> x * y) (zip xs ys))
``` ```
:::
Intrinsics: SOACs (second-order array combinators) Intrinsics: SOACs (second-order array combinators)
@@ -149,14 +175,19 @@ int main() {
# The lattice Boltzmann method # The lattice Boltzmann method
* Cellular automaton-like algorithm for fluid flow simulation. ## The algorithm
::: Simulation * Cellular automaton-like algorithm for fluid flow simulation.
* Cartesian grid with $q$ variables per grid point, $f_i(\bm{x}, t)$.
* Each time step is made of:
1. Collision (local operations only).
2. Propagation (non-local operations).
* Notoriously straightforward to parallelize.
## Simulation ## Simulation
0. Initialization (no Futhark here). 0. Initialization (no Futhark here).
1. Collision. 1. Collision (on every grid point at the same time).
- Compute $\rho(f_i)$, - Compute $\rho(f_i)$,
- Compute $\bm{j}(f_i)$, - Compute $\bm{j}(f_i)$,
- Compute $f_i^\mathrm{eq}(\rho,\bm{j})$, - Compute $f_i^\mathrm{eq}(\rho,\bm{j})$,
@@ -166,81 +197,94 @@ int main() {
Repeat 1-2 a certain number of times Repeat 1-2 a certain number of times
3. Get the data and process it. 3. Get the data and process it.
:::
# Computation of macroscopic moments (1/2) # The LBM pseudo-code
```ocaml
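let time_step [nx][ny][nz][q] (f: [nx][ny][nz][q]f32)
    : [nx][ny][nz][q]f32 =
  let rho = compute_rho f         -- density on every grid point
  let j = compute_j f             -- momentum on every grid point
  let feq = compute_feq rho j     -- equilibrium from rho and j
  let fout = collide f feq omega  -- relaxation with rate omega
  in stream fout                  -- propagation to the neighbours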
```
# Data structures and intrinsics
## Only arrays
::: Simulation ```ocaml
f: [nx][ny][nz][q]f32 -- 4d array
feq: [nx][ny][nz][q]f32 -- 4d array
rho: [nx][ny][nz]f32 -- 3d array
j: [nx][ny][nz][d]f32 -- 4d array
```
## Functions
```ocaml
let map3d 'a [nx][ny][nz]'b
(foo: a -> b) (xs: [nx][ny][nz]a) : [nx][ny][nz]b =
map (\xs1 ->
map (\xs2 ->
map (\xs3 -> foo xs3) xs2
) xs1
) xs
```
# Computation of macroscopic moments (1/2)
## LBM equations ## LBM equations
On **each** grid point:
\begin{equation} \begin{equation}
\rho=\sum_{i=0}^{q-1}f_i, \forall\ \bm{x}. \rho=\sum_{i=0}^{q-1}f_i, \forall\ \bm{x}.
\end{equation} \end{equation}
:::
::: Futhark
## Futhark code ## Futhark code
```ocaml ```ocaml
map (\fx -> let compute_rho [nx][ny][nz][q]
map (\fxy -> (f: [nx][ny][nz][q]f32) : [nx][ny][nz]f32 =
map (\fxyz -> map3d (\fxyz ->
reduce (+) 0 fxyz reduce (+) 0 fxyz
) fxy
) fx
) f ) f
``` ```
:::
# Computation of macroscopic moments (2/2) # Computation of macroscopic moments (2/2)
::: Simulation
## LBM equation ## LBM equation
\begin{equation} \begin{equation}
\rho\bm{u}=\sum_{i=0}^{q-1}f_i \bm{c}_i, \forall\ \bm{x}. \rho\bm{u}=\sum_{i=0}^{q-1}f_i \bm{c}_i, \forall\ \bm{x}.
\end{equation} \end{equation}
:::
::: Futhark
## Futhark code ## Futhark code
```ocaml ```ocaml
map (\fx -> let compute_j [nx][ny][nz][q]
map (\fxy -> (f: [nx][ny][nz][q]f32) : [nx][ny][nz][3]f32 =
map (\fxyz -> map3d (\fxyz ->
map(\ci -> map(\ci ->
dotprod ci fxyz dotprod ci fxyz
) (transpose c) -- intrinsic ) (transpose c) -- intrinsic
) fxy
) fx
) f ) f
``` ```
:::
# Computation of the equilibrium distribution # Computation of the equilibrium distribution
::: Simulation
## LBM equation ## LBM equation
\begin{equation} \begin{equation}
f_i^\mathrm{eq}=w_i\rho\left(1+\frac{\bm{c}_i\cdot \bm{u}}{c_s^2}+\frac{1}{2c_s^4}(\bm{c}_i\cdot \bm{u})^2-\frac{1}{2c_s^2}\bm{u}^2\right),\ \forall \bm{x},i f_i^\mathrm{eq}=w_i\rho\left(1+\frac{\bm{c}_i\cdot \bm{u}}{c_s^2}+\frac{1}{2c_s^4}(\bm{c}_i\cdot \bm{u})^2-\frac{1}{2c_s^2}\bm{u}^2\right),\ \forall \bm{x},i
\end{equation} \end{equation}
:::
::: Futhark
## Futhark code ## Futhark code
```ocaml ```ocaml
map2(\rho_x j_x -> map_3d(\rho_xyz j_xyz ->
map2(\rho_xy j_xy ->
map2(\rho_xyz j_xyz ->
let u = map(\j_xyzi -> j_xyzi / rho_xyz ) j_xyz let u = map(\j_xyzi -> j_xyzi / rho_xyz ) j_xyz
let u_sqr = dotprod u u let u_sqr = dotprod u u
@@ -249,56 +293,35 @@ map2(\rho_x j_x ->
in rho_xyz * wi * in rho_xyz * wi *
(1 + 3 * c_u + 4.5 * c_u * c_u - 1.5 * u_sqr) (1 + 3 * c_u + 4.5 * c_u * c_u - 1.5 * u_sqr)
) w c ) w c
) rho_xy j_xy ) (zip rho j)
) rho_x j_x
) rho j
``` ```
:::
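The numerical constants in the code above can be read off the equilibrium equation. A short check, assuming the usual lattice speed of sound $c_s^2=1/3$ (this value is not stated on the slide) and that `c_u` and `u_sqr` stand for $\bm{c}_i\cdot\bm{u}$ and $\bm{u}^2$:

\begin{equation}
\frac{1}{c_s^2}=3,\qquad \frac{1}{2c_s^4}=\frac{9}{2}=4.5,\qquad \frac{1}{2c_s^2}=\frac{3}{2}=1.5,
\end{equation}

so that

\begin{equation}
f_i^\mathrm{eq}=w_i\rho\left(1+3\,\bm{c}_i\cdot\bm{u}+4.5\,(\bm{c}_i\cdot\bm{u})^2-1.5\,\bm{u}^2\right).
\end{equation}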
# Collision # Collision
::: Simulation
## LBM equation ## LBM equation
\begin{equation} \begin{equation}
f^\mathrm{out}_i=f_i\left(1-\omega\right)+\omega f_i^\mathrm{eq}. f^\mathrm{out}_i=f_i\left(1-\omega\right)+\omega f_i^\mathrm{eq}.
\end{equation} \end{equation}
:::
::: Futhark
## Futhark code ## Futhark code
```ocaml ```ocaml
map2(\f_x feq_x -> map2_3d(\f_xyz feq_xyz ->
map2(\f_xy feq_xy ->
map2(\f_xyz feq_xyz ->
map2(\f_i feq_i-> map2(\f_i feq_i->
f_i * (1.0 - omega) + feq_i * omega f_i * (1.0 - omega) + feq_i * omega
) f_xyz feq_xyz ) f_xyz feq_xyz
) f_xy feq_xy ) (zip f feq)
) f_x feq_x
) f feq
``` ```
:::
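A quick sanity check on the collision step: it should leave the density unchanged on every grid point. This sketch assumes the standard property $\sum_i f_i^\mathrm{eq}=\rho$, which is not shown on the slides:

\begin{equation}
\sum_{i=0}^{q-1}f^\mathrm{out}_i=(1-\omega)\sum_{i=0}^{q-1}f_i+\omega\sum_{i=0}^{q-1}f_i^\mathrm{eq}=(1-\omega)\rho+\omega\rho=\rho.
\end{equation}

The same argument applies to the momentum $\rho\bm{u}$ under the analogous assumption $\sum_i f_i^\mathrm{eq}\bm{c}_i=\rho\bm{u}$.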
# Streaming # Streaming
::: Simulation
## LBM equation ## LBM equation
\begin{equation} \begin{equation}
f_i(\bm{x}+\bm{c}_i,t+1)=f^\mathrm{out}_i(\bm{x},t). f_i(\bm{x}+\bm{c}_i,t+1)=f^\mathrm{out}_i(\bm{x},t).
\end{equation} \end{equation}
:::
::: Futhark
## Futhark code ## Futhark code
```ocaml ```ocaml
@@ -311,26 +334,25 @@ tabulate_4d nx ny nz q (\x y z ipop ->
in f[next_x, next_y, next_z, ipop] in f[next_x, next_y, next_z, ipop]
) )
``` ```
:::
# Summary # Summary
* A simple yet complete fluid flow simulator. * A simple yet complete fluid flow simulator.
* Lines of readable and easy to debug Futhark code: 110. * Lines of readable and "easy" to debug Futhark code: 110.
* Single precision, periodic, D3Q27 only arrays: 250 MLUPS. * Single precision, periodic, only arrays: 250 MLUPS.
*Not bad: but we can do better.* *Not bad: but we can do better.*
# How can we go faster? # How can we go faster?
* Arrays are aggressively parallelized: each dimension is flattened. * Arrays are aggressively parallelized: each dimension is flattened.
* For small dimensions it is usually not worth. * For small dimensions it is usually not worth it.
* Replace length 3, or length 27 arrays by tuples: better use of GPU * Replace length 3, or length 27 arrays by tuples: better use of GPU
architecture or use `INCREMENTAL_FLATTENING`. architecture or use `INCREMENTAL_FLATTENING`.
* `[](a, b, c, ..) -> ([]a, []b, []c, ...)`{.ocaml} automatically by the compiler. * `[](a, b, c, ..) -> ([]a, []b, []c, ...)`{.ocaml} automatically by the compiler.
* Result: with a code of 150 lines, we go to 1.5 GLUPS on GPU, 11 MLUPS on a * Result: with a code of 150 lines, we go to 1.5 GLUPS on GPU, 11 MLUPS on a
single core, 400 MLUPS on a multi-core machine. single core, 400 MLUPS on a multi-core machine.
* All results are the same with CUDA and OpenCL backends. * Within 10-20\% of state of the art optimized GPU codes.
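For reference, the throughput figures are in lattice updates per second. A sketch of the usual definition, where $N_t$ (number of time steps) and $T$ (wall-clock time in seconds) are symbols introduced here, not taken from the slides:

\begin{equation}
\mathrm{MLUPS}=\frac{n_x\, n_y\, n_z\, N_t}{10^6\, T},\qquad 1\ \mathrm{GLUPS}=10^3\ \mathrm{MLUPS}.
\end{equation}

With the numbers quoted above, the GPU run is about $1500/250=6$ times faster than the array-only version, and about $1500/11\approx 136$ times faster than a single core.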
# Conclusion # Conclusion
@@ -345,7 +367,7 @@ tabulate_4d nx ny nz q (\x y z ipop ->
# Current and planned Futhark developments # Current and planned Futhark developments
## From Troels Henriksen himself ## Currently worked on by the core team
* Multi-GPU: only on a single motherboard (very experimental). * Multi-GPU: only on a single motherboard (very experimental).
@@ -358,9 +380,10 @@ tabulate_4d nx ny nz q (\x y z ipop ->
* Distributed CPU/GPU (MPI) backend. * Distributed CPU/GPU (MPI) backend.
* A cool rendering tool directly from the GPU (OpenGL). * A cool rendering tool directly from the GPU (OpenGL).
# Acknowledgments # Acknowledgments
## In alphabetical order
* V. Berset, * V. Berset,
* B. Coudray. * B. Coudray.
* M. El Kharroubi, * M. El Kharroubi,
@@ -368,5 +391,8 @@ tabulate_4d nx ny nz q (\x y z ipop ->
# Questions? # Questions?
## Futhark webpage: <https://futhark-lang.org/>
## Thank you for your attention ## Thank you for your attention