Commit c465b5fa (Verified), authored 4 years ago by orestis.malaspin in project palathark

updating the presentation

Parent: ddd64566

Showing 1 changed file: presentations/pasc/pres.md, with 115 additions and 89 deletions
@@ -6,18 +6,22 @@
 ## Palabos: a massively parallel high performance fluid flow solver

+. . .
+
+## **Do not hesitate to interrupt me at any time**
+
 # What is Futhark

-- Statically typed, data-parallel, purely functional, array language
-- with limited functionalities (no I/O for example)
-- that compiles to sequential, multi-core, OpenCL, and Cuda backends
-- very efficiently
+- Statically typed, data-parallel, purely functional, array language,
+- with limited functionalities (no I/O for example),
+- that compiles to sequential, multi-core, OpenCL, and Cuda backends,
+- very efficiently,
 - without the pain of actually writing sequential, multi-core, or GPU code.

 . . .

-- Developed in Copenhagen by Troels Henriksen
-- Very friendly and eager to help newcomers
+- Developed in Copenhagen by T. Henriksen,
+- Very friendly and eager to help newcomers,
 - Still a very experimental project.

 # Why use Futhark?
@@ -73,6 +77,10 @@ int main()
 * Difficult to add features.
 * Difficult to optimize.
+
+. . .
+
+**Futhark handles that for you!**

 # How to use Futhark?

 - Not intended to replace existing general-purpose languages.
@@ -82,19 +90,37 @@ int main()
     - Conventional `C` code,
     - Several others (`C#`, `Haskell`, `F#`, and `Rust` for example).
-- Futhark produces `C` code so it's accessible from any language through a FFI.
+- Futhark produces `C` code: FFI for most languages.

-# An example: the dot product
-
-::: dotprod
-
-## `dotprod.fut`
+# Basic syntax
+
+## Differences with classical HPC languages
+
+* Functional language $\Rightarrow$ functions always return something (unlike
+C/C++ for example).
+* Function arguments cannot be modified in place.
+
+## `let` ... `in`
+
+```ocaml
+-- The addition of 3 doubles:
+-- Signature f64 -> f64 -> f64 -> f64
+let add (a: f64) (b: f64) (c: f64): f64 =
+    let d = a + b
+    in c + d
+```
+
+This should be read: **let** `d` be equal to `a + b`, **in** `c + d`.
+
+# An example: the dot product
+
+## `dotprod.fut`

 ```ocaml
 entry dotprod (xs: []i32) (ys: []i32): i32 =
     reduce (+) 0 (map (\(x, y) -> x * y) (zip xs ys))
 ```

-:::
-
 Intrinsics: SOAC (Second-Order Array Combinators)
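
As a hedged aside on the dot product shown in this hunk: the same computation can be written with `map2`, which avoids the explicit `zip`. The sketch below assumes a recent Futhark compiler, where (as far as I know) top-level definitions use `def` and the two input arrays must share a named size. The name `dotprod_f32` and the size parameter `[n]` are illustrative and do not appear in the presentation.

```futhark
-- Sketch only: `dotprod_f32` and the size parameter [n] are illustrative
-- names, not taken from the slides.
-- Naming the size [n] states that both inputs have the same length.
entry dotprod_f32 [n] (xs: [n]f32) (ys: [n]f32): f32 =
  -- map2 multiplies the arrays element-wise; reduce sums the products.
  reduce (+) 0 (map2 (*) xs ys)
```

Both `map2` and `reduce` are SOACs in the sense of the slide: each takes a function as an argument and applies it across an array.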
@@ -149,14 +175,19 @@ int main() {
 # The lattice Boltzmann method

-* Cellular automaton-like algorithm for fluid flow simulation.
-
-::: Simulation
+## The algorithm
+
+* Cellular automaton-like algorithm for fluid flow simulation.
+* Cartesian grid with $q$ variables per grid point, $f_i(\bm{x}, t)$.
+* Each time step is made of:
+    1. Collision (local operations only).
+    2. Propagation (non-local operations).
+* Notoriously straightforward to parallelize.

 ## Simulation

 0. Initialization (no Futhark here).
-1. Collision.
+1. Collision (on every grid point at the same time).
     - Compute $\rho(f_i)$,
     - Compute $\bm{j}(f_i)$,
     - Compute $f_i^\mathrm{eq}(\rho,\bm{j})$,
@@ -166,81 +197,94 @@ int main() {
 Repeat 1-2 a certain number of times

 3. Get the data and process it.

-:::
-
-# Computation of macroscopic moments (1/2)
-
-::: Simulation
+# The LBM pseudo-code
+
+```ocaml
+let time_step [nx][ny][nz][q] (f: [nx][ny][nz][q]f32) -> [nx][ny][nz][q]f32 =
+    let rho  = compute_rho f
+    let j    = compute_j f
+    let feq  = compute_feq rho j
+    let fout = collide f feq omega
+    in stream fout
+```
+
+# Data structures and intrinsics
+
+## Only arrays
+
+```ocaml
+f:   [nx][ny][nz][q]f32 -- 4d array
+feq: [nx][ny][nz][q]f32 -- 4d array
+rho: [nx][ny][nz]f32    -- 3d array
+j:   [nx][ny][nz][d]f32 -- 4d array
+```
+
+## Functions
+
+```ocaml
+let map3d 'a [nx][ny][nz] 'b (foo: a -> b) (xs: [nx][ny][nz]a) -> [nx][ny][nz]b =
+    map (\xs1 ->
+        map (\xs2 ->
+            map (\xs3 -> foo xs3) xs2
+        ) xs1
+    ) xs
+```
+
+# Computation of macroscopic moments (1/2)

 ## LBM equations

+On **each** grid point:
+
 \begin{equation}
 \rho=\sum_{i=0}^{q-1}f_i,\forall\ \bm{x}.
 \end{equation}

-:::
-
-::: Futhark
-
 ## Futhark code

 ```ocaml
-map (\fx ->
-  map (\fxy ->
-    map (\fxyz ->
-      reduce (+) 0 fxyz
-    ) fxy
-  ) fx
-) f
+let compute_rho [nx][ny][nz][q]
+    (f: [nx][ny][nz][q]f32) -> [nx][ny][nz]f32 =
+  map3d (\fxyz ->
+    reduce (+) 0 fxyz
+  ) f
 ```

-:::
-
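
The `map3d` helper and the `compute_rho` function from the slides can be sketched in a form that should type-check with a recent Futhark compiler. The slides write the return type after `->`; as far as I know, Futhark's own syntax puts it after `:`, and recent versions prefer `def` for top-level definitions. The parameter name `g` and the lambda variable names are illustrative assumptions, not the author's code.

```futhark
-- Sketch only: one possible definition of the `map3d` helper used in the
-- slides; the parameter name `g` is illustrative.
-- Applies `g` to every element of a 3D array.
def map3d 'a 'b [nx][ny][nz] (g: a -> b) (xs: [nx][ny][nz]a): [nx][ny][nz]b =
  map (\plane -> map (\row -> map g row) plane) xs

-- Density: at each grid point, sum the q populations f_0 .. f_{q-1}.
def compute_rho [nx][ny][nz][q] (f: [nx][ny][nz][q]f32): [nx][ny][nz]f32 =
  map3d (\fxyz -> reduce (+) 0 fxyz) f
```

Here `map3d` is instantiated with `a = [q]f32` and `b = f32`, so the innermost dimension is reduced away while the three spatial dimensions are preserved.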
 # Computation of macroscopic moments (2/2)

-::: Simulation
-
 ## LBM equation

 \begin{equation}
 \rho\bm{u}=\sum_{i=0}^{q-1}f_i\bm{c}_i,\forall\ \bm{x}.
 \end{equation}

-:::
-
-::: Futhark
-
 ## Futhark code

 ```ocaml
-map (\fx ->
-  map (\fxy ->
-    map (\fxyz ->
-      map (\ci ->
-        dotprod ci fxyz
-      ) (transpose c) -- intrinsic
-    ) fxy
-  ) fx
-) f
+let compute_j [nx][ny][nz][q]
+    (f: [nx][ny][nz][q]f32) -> [nx][ny][nz][3]f32 =
+  map3d (\fxyz ->
+    map (\ci ->
+      dotprod ci fxyz
+    ) (transpose c) -- intrinsic
+  ) f
 ```

-:::
-
 # Computation of the equilibrium distribution

-::: Simulation
-
 ## LBM equation

 \begin{equation}
 f_i^\mathrm{eq}=w_i\rho\left(1+\frac{\bm{c}_i\cdot\bm{u}}{c_s^2}+\frac{1}{2c_s^4}(\bm{c}_i\cdot\bm{u})^2-\frac{1}{2c_s^2}\bm{u}^2\right),\ \forall \bm{x},i
 \end{equation}

-:::
-
-::: Futhark
-
 ## Futhark code

 ```ocaml
-map2 (\rho_x j_x ->
-  map2 (\rho_xy j_xy ->
-    map2 (\rho_xyz j_xyz ->
+map_3d (\rho_xyz j_xyz ->
       let u = map (\j_xyzi -> j_xyzi / rho_xyz) j_xyz
       let u_sqr = dotprod u u
@@ -249,56 +293,35 @@ map2(\rho_x j_x ->
         in rho_xyz * wi *
           (1 + 3 * c_u + 4.5 * c_u * c_u - 1.5 * u_sqr)
       ) w c
-    ) rho_xy j_xy
-  ) rho_x j_x
-) rho j
+) (zip rho j)
 ```

-:::
-
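
A short worked check of the numerical factors in the code above. The factors 3, 4.5 and 1.5 agree with the equilibrium formula if the lattice speed of sound satisfies $c_s^2 = 1/3$, the usual value on standard lattices. This value is an assumption here, since the slides do not state it. It is also assumed that `c_u` denotes $\bm{c}_i\cdot\bm{u}$ and `u_sqr` denotes $\bm{u}^2$; the line defining `c_u` is elided from the diff.

\begin{equation}
c_s^2=\frac{1}{3}\ \Rightarrow\ \frac{1}{c_s^2}=3,\quad \frac{1}{2c_s^4}=\frac{9}{2}=4.5,\quad \frac{1}{2c_s^2}=\frac{3}{2}=1.5.
\end{equation}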
 # Collision

-::: Simulation
-
 ## LBM equation

 \begin{equation}
 f^\mathrm{out}_i=f_i\left(1-\omega\right)+\omega f_i^\mathrm{eq}.
 \end{equation}

-:::
-
-::: Futhark
-
 ## Futhark code

 ```ocaml
-map2 (\f_x feq_x ->
-  map2 (\f_xy feq_xy ->
-    map2 (\f_xyz feq_xyz ->
+map2_3d (\f_xyz feq_xyz ->
       map2 (\f_i feq_i ->
         f_i * (1.0 - omega) + feq_i * omega
       ) f_xyz feq_xyz
-    ) f_xy feq_xy
-  ) f_x feq_x
-) f feq
+) (zip f feq)
 ```

-:::
-
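
The collision step can be sketched as a self-contained function. The diff does not show a definition of `map2_3d`, so the helper below is an assumption about what it might look like. Unlike the slide, it takes the two arrays as separate arguments rather than a zipped array. Passing `omega` as an explicit parameter and the parameter name `g` are also illustrative choices.

```futhark
-- Sketch only: a hypothetical `map2_3d`; the slides use this name but the
-- diff does not show its definition.
-- Applies a binary function `g` element-wise to two 3D arrays of equal shape.
def map2_3d 'a 'b 'c [nx][ny][nz]
    (g: a -> b -> c)
    (xs: [nx][ny][nz]a)
    (ys: [nx][ny][nz]b): [nx][ny][nz]c =
  map2 (map2 (map2 g)) xs ys

-- Relaxation f_out = f (1 - omega) + omega feq, applied to each of the
-- q populations at every grid point.
def collide [nx][ny][nz][q]
    (omega: f32)
    (f: [nx][ny][nz][q]f32)
    (feq: [nx][ny][nz][q]f32): [nx][ny][nz][q]f32 =
  map2_3d (\f_xyz feq_xyz ->
             map2 (\f_i feq_i -> f_i * (1.0 - omega) + feq_i * omega)
                  f_xyz feq_xyz)
          f feq
```

Each grid point is updated independently of its neighbours, which is why this step is described in the slides as local.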
 # Streaming

-::: Simulation
-
 ## LBM equation

 \begin{equation}
 f_i(\bm{x}+\bm{c}_i,t+1)=f^\mathrm{out}_i(\bm{x},t).
 \end{equation}

-:::
-
-::: Futhark
-
 ## Futhark code

 ```ocaml
@@ -311,26 +334,25 @@ tabulate_4d nx ny nz q (\x y z ipop ->
     in f[next_x, next_y, next_z, ipop]
 )
 ```

-:::
-
 # Summary

 * A simple yet complete fluid flow simulator.
-* Lines of readable and easy to debug Futhark code: 110.
-* Single precision, periodic, D3Q27 only arrays: 250 MLUPS.
+* Lines of readable and "easy" to debug Futhark code: 110.
+* Single precision, periodic, only arrays: 250 MLUPS.

 *Not bad: but we can do better.*

 # How can we go faster?

 * Arrays are aggressively parallelized: each dimension is flattened.
-* For small dimensions it is usually not worth.
+* For small dimensions it is usually not worth it.
 * Replace length 3, or length 27 arrays by tuples: better use of GPU
   architecture or use `INCREMENTAL_FLATTENING`.
 * `[](a, b, c, ...) -> ([]a, []b, []c, ...)`{.ocaml} automatically by the compiler.
 * Result: with a code of 150 lines, we go to 1.5 GLUPS on GPU, 11 MLUPS on a
   single core, 400 MLUPS on a multi-core machine.
-* All results are the same with CUDA and OpenCL backends.
+* Within 10-20\% of state of the art optimized GPU codes.

 # Conclusion
@@ -345,7 +367,7 @@ tabulate_4d nx ny nz q (\x y z ipop ->

 # Current and Future Futhark planned developments

-## From Troels Henriksen himself
+## Currently worked on by the core team

 * Multi-GPU: only on a single motherboard (very experimental).
@@ -358,9 +380,10 @@ tabulate_4d nx ny nz q (\x y z ipop ->
 * Distributed CPU/GPU (MPI) backend.
 * A cool rendering tool directly from the GPU (OpenGL).

 # Acknowledgments

+## In alphabetical order
+
 * V. Berset,
 * B. Coudray,
 * M. El Kharroubi,
@@ -368,5 +391,8 @@ tabulate_4d nx ny nz q (\x y z ipop ->

 # Questions?

+## Futhark webpage: <https://futhark-lang.org/>
+
 ## Thank you for your attention