Verified Commit b236ed46 authored by baptiste.coudray's avatar baptiste.coudray

First version finished

parent 925e28ef
Showing 84 additions and 68 deletions
.DS_Store: file added
---
author:
- Baptiste Coudray
title: "MPI-Futhark: Distributed High-Performance Computing For People"
smallTitle: "Distributed High-Performance Computing For People"
institute: University of Applied Sciences and Arts Western Switzerland
name: Baptiste
......@@ -11,8 +11,8 @@ orientation: Software and Complex Systems
projectMonth: August
year: 2021
sensei: "Dr. Orestis Malaspinas"
frontLogoLegend: "MPI x Futhark with examples" # The cover image legend
frontLogoSourceURL: "Realized by Baptiste Coudray" # The cover image source
workTitle: "Bachelor thesis"
workFor: "..."
bibliography: my.bib
......
src/figs/dispatch_1d.png: image replaced (8.05 KiB → 7.97 KiB)
src/figs/dispatch_2d.png: file added (20.1 KiB)
src/figs/envelope_1d.png: file added (20.1 KiB)
src/figs/envelope_2d.png: file added (31.2 KiB)
src/figs/front-logo.png: image replaced (2.47 KiB → 31.6 KiB)
src/figs/lbm_result_and_speedup_gpu.png: file added (68.2 KiB)
src/figs/ring.png: image replaced (34.9 KiB → 34.5 KiB)
......@@ -25,12 +25,12 @@
</info>
<locale>
<terms>
<term name="no date">[no date]</term>
<term name="in">in</term>
<term name="online">online</term>
<term name="accessed">accessed&#160;on</term>
<term name="retrieved">retrieved</term>
<term name="from">from</term>
</terms>
</locale>
<macro name="author">
......
......@@ -7,7 +7,6 @@
urldate = {2021-07-22},
date = {2021-03-17},
langid = {french},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/GLQPSHE4/index.html:text/html},
}
......@@ -25,24 +24,22 @@
@inreference{noauthor_amdahls_2021,
title = {Amdahl's law},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://en.wikipedia.org/w/index.php?title=Amdahl%27s_law},
booktitle = {Wikipedia},
urldate = {2021-07-22},
date = {2021-07-18},
langid = {english},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/4KJVD4JN/index.html:text/html},
}
@inreference{noauthor_gustafsons_2021,
title = {Gustafson's law},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://en.wikipedia.org/w/index.php?title=Gustafson%27s_law},
booktitle = {Wikipedia},
urldate = {2021-07-22},
date = {2021-06-30},
langid = {english},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/FGKVKJQD/index.html:text/html},
}
......@@ -89,60 +86,55 @@
@inreference{noauthor_jeu_2021,
title = {Jeu de la vie},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://fr.wikipedia.org/w/index.php?title=Jeu_de_la_vie},
booktitle = {Wikipédia},
urldate = {2021-07-22},
date = {2021-05-23},
langid = {french},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/HDKB5PPW/index.html:text/html},
}
@inreference{noauthor_automate_2021,
title = {Automate cellulaire},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://fr.wikipedia.org/w/index.php?title=Automate_cellulaire},
booktitle = {Wikipédia},
urldate = {2021-07-22},
date = {2021-05-18},
langid = {french},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/L5L9W28B/index.html:text/html},
}
@inreference{noauthor_programmation_2021,
title = {Programmation fonctionnelle},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://fr.wikipedia.org/w/index.php?title=Programmation_fonctionnelle},
booktitle = {Wikipédia},
urldate = {2021-07-22},
date = {2021-05-26},
langid = {french},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/Z4UFD79Y/index.html:text/html},
}
@inreference{noauthor_maximum_2021,
title = {Maximum subarray problem},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://en.wikipedia.org/w/index.php?title=Maximum_subarray_problem},
booktitle = {Wikipedia},
urldate = {2021-07-22},
date = {2021-06-24},
langid = {english},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/LL8NK2KY/index.html:text/html},
}
@inreference{noauthor_distributed_2021,
title = {Distributed computing},
rights = {Creative Commons Attribution-{ShareAlike} License},
url = {https://en.wikipedia.org/w/index.php?title=Distributed_computing},
booktitle = {Wikipedia},
urldate = {2021-07-23},
date = {2021-07-14},
langid = {english},
file = {Snapshot:/Users/baptistecdr/Zotero/storage/ZF8EB2I9/index.html:text/html},
}
......@@ -154,4 +146,4 @@
date = {2010},
langid = {english},
file = {Chapter 1 Covering Multithreading Basics (Multithreaded Programming Guide):/Users/baptistecdr/Zotero/storage/ASQQ8TRR/index.html:text/html},
}
......@@ -333,7 +333,7 @@ $endif$
\huge{$title$}\\
\vspace{.5cm}
\IfFileExists{figs/front-logo.png}{
\includegraphics[scale=0.6]{figs/front-logo.png}\\
}{
\vspace{8cm}
}
......@@ -363,7 +363,7 @@ $endif$
% Illustration URL page
\vspace*{\fill}
\IfFileExists{figs/front-logo.png}{
Legend and source of the cover picture: $frontLogoLegend$ $frontLogoSourceURL$
}{
This page was intentionally left blank.
}
......
......@@ -10,8 +10,9 @@ I would like to thank the people who helped me during this project:
# Abstract {-}
Today, most computers are equipped with (+^GPU). They provide more and more computing cores and have become fundamental embedded high-performance computing tools. In this context, the number of applications taking advantage of these tools seems low at first glance. The problem is that the development tools are heterogeneous, complex, and strongly dependent on the (+GPU) running the code. Futhark is an experimental, functional, and architecture-agnostic language; that is why it seems relevant to study it. It can generate code for standard sequential execution (on a single-core processor), for (+GPU) (with the (+CUDA) and (+OpenCL) backends), and for several cores of the same processor (shared memory). To make it a tool usable on all high-performance platforms, it lacks support for distributed computing with (+MPI). We create a library that distributes a cellular automaton over several compute nodes via MPI. The update of the cellular automaton is computed via the Futhark language using one of the four available backends (sequential, multicore, OpenCL, and CUDA). To validate our library, we implement a cellular automaton in one dimension ((+SCA)), in two dimensions (the Game of Life), and in three dimensions ((+LBM)). Finally, with the performance tests performed, we obtain an ideal speedup in one and two dimensions with the sequential and multicore backends. With the GPU backend, we obtain an ideal speedup only when the number of tasks equals the number of GPUs.
\begin{figure} \vspace{.1cm} \begin{center} \includegraphics[scale=0.4]{figs/front-logo.png}
\end{center} \end{figure} \begin{tabular}{ p{3cm} p{1cm} p{1cm} p{6cm} } \multicolumn{1}{l}{Candidate:}& & &
\multicolumn{1}{l}{Referent teacher:}\\ \multicolumn{1}{l}{\textbf{Baptiste Coudray}} & & &
\multicolumn{1}{l}{\textbf{Dr. Orestis Malaspinas}} \\ \multicolumn{1}{l}{Field of study: Information Technologies Engineering} & & &
......
......@@ -8,11 +8,11 @@
#### Reference of the URLs {-}
\begin{tabular}{ p{3cm} p{9cm} }
\multicolumn{1}{l}{URL01} & \multicolumn{1}{l}{\url{https://commons.wikimedia.org/wiki/File:AmdahlsLaw.svg}} \\
\multicolumn{1}{l}{URL02} & \multicolumn{1}{l}{\url{https://commons.wikimedia.org/wiki/File:Gustafson.png}} \\
\multicolumn{1}{l}{URL03} & \multicolumn{1}{l}{\url{https://commons.wikimedia.org/wiki/File:Elder_futhark.png}} \\
\multicolumn{1}{l}{URL04} & \multicolumn{1}{l}{\url{https://futhark-lang.org/images/mss.svg}} \\
\multicolumn{1}{l}{URL05} & \multicolumn{1}{l}{\url{https://commons.wikimedia.org/wiki/File:Gol-blinker1.png}} \\
\multicolumn{1}{l}{URL06} & \multicolumn{1}{l}{\url{https://commons.wikimedia.org/wiki/File:Gol-blinker2.png}} \\
\end{tabular}
......
......@@ -2,26 +2,32 @@
# Introduction {-}
Today, most computers are equipped with (+^GPU). They provide more and more computing cores and have become fundamental embedded high-performance computing tools. In this context, the number of applications taking advantage of these tools seems low at first glance. The problem is that the development tools are heterogeneous, complex, and strongly dependent on the (+GPU) running the code. Futhark is an experimental, functional, and architecture agnostic language; that is why it seems relevant to study it. It allows generating code allowing a standard sequential execution (on a single-core processor), on (+GPU) (with (+CUDA) and (+OpenCL) backends), on several cores of the same processor (shared memory). To make it a tool that could be used on all high-performance platforms, it lacks support for distributed computing. This work aims to develop a library that can port any Futhark code to an (+MPI) library with as little effort as possible.
To achieve that, we introduce the meaning of distributed high-performance computing, then present (+MPI) and Futhark. We decide to implement a library that can parallelize a cellular automaton in one, two, or three dimensions. By adding Futhark on top of (+MPI), the programmer has the possibility to compile his code in:
* parallelized-sequential mode,
* parallelized-multicore mode,
* parallelized-OpenCL mode,
* parallelized-CUDA mode.
Finally, we use this library by implementing a cellular automaton in each dimension:
* a (+SCA) in one dimension,
* the Game of Life in two dimensions,
* the (+LBM) in three dimensions.
We perform a benchmark to ensure that each cellular automaton scales correctly in the four modes.
The leading resources we used to carry out this project were the Futhark and (+MPI) user guides. We also exchanged with Futhark's creator, Troels Henriksen.
## Working method {-}
During this project, we use Git and host the source code on the GitLab platform of HEPIA:
* Source code of the library with usage examples
* https://gitedu.hesge.ch/baptiste.coudray/projet-de-bachelor
* Source code of this report
* https://gitedu.hesge.ch/baptiste.coudray/projet-de-semestre/-/tree/report
\pagebreak
......@@ -14,7 +14,7 @@ In parallel computing, two important laws give the theoretical speedup that can
\pagebreak
\cimg{figs/amdahls-law.png}{scale=0.6}{Amdahl's law}{Source: Taken from https://commons.wikimedia.org/, ref. URL01}
Amdahl's law states that the program's overall speed is limited by the code that cannot be parallelized. Indeed, there will almost always be a sequential part in a code that cannot be parallelized. There is, therefore, a relationship between the ratio of parallelizable code and the overall execution speed of the program [@noauthor_amdahls_2021].
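This bound can be stated formally. With $p$ the parallelizable fraction of the code and $N$ the number of processors, Amdahl's law gives the speedup:

$$S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}.$$

With $p = 0.9$, for instance, the speedup can never exceed $\frac{1}{0.1} = 10$, no matter how many processors are added.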
......@@ -27,7 +27,7 @@ In the graph above, we notice that if:
\pagebreak
\cimg{figs/gustafson-law.png}{scale=0.75}{Gustafson–Barsis's law}{Source: Taken from https://commons.wikimedia.org/, ref. URL02}
Gustafson's law says that the larger the amount of data to be processed, the more advantageous it is to use many processors. Thus, the acceleration is linear, as can be seen on the graph [@noauthor_gustafsons_2021].
On the graph, we notice, for example, that with a code that is 90% parallelized, we have a speedup of at least x100 with 120 processors, whereas Amdahl's law estimated a maximum speedup of x10 with 512 processors. Gustafson's law is therefore much more optimistic in terms of performance gain.
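Unlike Amdahl's law, Gustafson's law keeps the execution time fixed and lets the problem size grow with the number of processors; with the same notation it reads:

$$S(N) = (1 - p) + p\,N.$$

With $p = 0.9$ and $N = 120$, this gives $S = 0.1 + 0.9 \times 120 = 108.1$, consistent with the speedup of at least x100 read off the graph.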
......
# Message Passing Interface
In order to realize parallel programming, the (+MPI) standard was created in 1993-1994 to standardize message passing between several computers, or within a computer with several processors/cores [@noauthor_message_2021]. (+MPI) is, therefore, a communication protocol and not a programming language. Currently, the latest version of (+MPI) is 4.0, which was approved in 2021. There are several implementations of the standard:
* MPICH, which currently supports MPI 3.1,
* Open MPI, which currently supports MPI 3.1.
We use Open MPI throughout this project on the cluster of the (+HES-GE).
\pagebreak
## Example
To understand the basics of (+MPI), let us look at an example mimicking a *token ring* network [@kendall_mpi_2018]. In this type of network, a process may emit a message, to the console for example, only if it has the token in its possession. Moreover, once it has emitted its message, the process must transmit the token to its neighbor.
\cimg{figs/ring.png}{scale=0.4}{Imitation of a network in \textit{token ring}}{Source: Created by Baptiste Coudray}
......@@ -67,8 +67,8 @@ mpicc ring.c -o ring
mpirun -n 5 ./ring
```
To compile a (+MPI) program, you have to go through the `mpicc` program, which is a wrapper
around (+GCC). Indeed, `mpicc` automatically adds the correct compilation parameters to the (+GCC) program.
Next, our compiled program must be run through `mpirun` to distribute our program to compute nodes. Finally, the `-n` parameter is used to specify the number of processes to run.
```
......
# Introduction to the language Futhark
\cimg{figs/futhark.png}{scale=0.60}{Futhark}{Source: Taken from https://commons.wikimedia.org/, ref. URL03}
Futhark is a purely functional programming language for producing parallelizable code on (+CPU) or (+GPU). It was designed by Troels Henriksen, Cosmin Oancea and Martin Elsman at the University of Copenhagen.
The main goal of Futhark is to write generic code that can compile into either:
* (+OpenCL),
* (+CUDA),
* multi-threaded (+POSIX) C,
* sequential C,
* sequential Python.
Although a Futhark code can compile into an executable, this feature is reserved for testing purposes because there is no (+IO). Thus, the main interest is to write particular functions that you would like to speed up thanks to parallel programming, and to compile them in library mode for use in a C program.
\pagebreak
To see the performance of Futhark, here is an example from the Futhark website that compares the resolution time of the (+MSS) problem. The (+MSS) problem is the task of finding a contiguous subarray with the largest sum within a given one-dimensional array A[1...n] of numbers [@noauthor_maximum_2021].
\cimg{figs/mss_bench.png}{scale=0.35}{MSS runtime (lower is better)}{Source: Taken from https://futhark-lang.org/performance.html, ref. URL04}
This graph shows the performance of a maximum segment sum implementation in Futhark and Thrust (a C++ library developed by NVIDIA for (+GPU) programming). The sequential runtime is for Futhark code compiled to sequential (+CPU) code, and the Futhark runtime is for code compiled to (+CUDA) [@henriksen_gotta_2021]. As we can see, the Futhark version is much faster than the sequential and Thrust versions, which justifies using this language in this project.
\pagebreak
......@@ -41,13 +41,13 @@ echo 12 | ./fact
To compile the Futhark code, we have to specify a backend; this one allows us to compile our code in:
* (+OpenCL) (opencl, pyopencl),
* (+CUDA) (cuda),
* multi-thread (+POSIX) C (multicore),
* sequential C (c),
* sequential Python (python).
Here we compile in (+OpenCL) to run the program on the graphics card, and we run the program with the number 12 as the parameter.
```
479001600i32
......@@ -69,7 +69,7 @@ Functions that can be used in C code must be defined with the `entry` keyword. T
futhark opencl --lib fact.fut
```
Then you have to compile the Futhark code in library mode and specify the backend. Here, the factorial program is compiled in (+OpenCL). Finally, it generates a `fact.h` and `fact.c` file, which can be included in a C program.
```c
#include <stdio.h>
......
# Cellular Automaton
A cellular automaton consists of a regular grid of cells, each in one of a finite number of states. The grid can be in any finite number of dimensions. For each cell, a set of cells called its neighborhood is defined relative to the specified cell. An initial state (time $t = 0$) is selected by assigning a state for each cell. A new generation is created (advancing $t$ by 1), according to some fixed rule (generally, a mathematical function) that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood. Typically, the rule for updating the state of cells is the same for each cell and does not change over time [@noauthor_automate_2021].
......@@ -44,6 +44,8 @@ In a one-dimensional Cartesian topology, we notice that the rows can communicate
In a two-dimensional Cartesian topology, we notice that rows can communicate directly with their left, right, top, and bottom neighbors. When a row needs to communicate with its diagonal neighbor, we use the default communicator (`MPI_COMM_WORLD`) to communicate directly with each other without going through a neighbor.
\pagebreak
### Three dimensions
\cimg{figs/communication_3d.png}{scale=0.60}{Example of Cartesian virtual topology in three dimensions}{Source: Created by Baptiste Coudray}
......@@ -56,35 +58,50 @@ The cellular automaton is shared as equally possible among the available tasks t
#### One dimension
\cimg{figs/dispatch_1d.png}{scale=0.60}{Example of sharing a cellular automaton in one dimension}{Source: Created by Baptiste Coudray}
In this example, a cellular automaton of dimension one and of size eight is split between three processes. As the division of the cellular automaton is not an integer, rank two has only two cells, unlike the others, which have three.
#### Two dimensions
\cimg{figs/dispatch_2d.png}{scale=0.60}{Example of sharing a cellular automaton in two dimensions}{Source: Created by Baptiste Coudray}
In this example, the cellular automaton is in two dimensions and of size $6 \times 6$. With four tasks available, it can be separated into four sub-matrices of $3 \times 3$.
#### Three dimensions
In three dimensions, the cellular automaton partitioning representation is challenging to make understandable. Thus, based on the two-dimensional partitioning, each task divides the third dimension.
For example, with a cellular automaton of size $4 \times{} 4 \times{} 4$ distributed among eight processes, each process has a chunk of size $2 \times{} 2 \times{} 2$.
### Envelope
The envelope of a chunk represents the missing neighbors of the cells at the extremities of the chunk. These missing cells are needed to compute the next iteration of the chunk of the cellular automaton that the process holds.
\pagebreak
#### One dimension
\cimg{figs/envelope_1d.png}{scale=0.60}{Example of the envelope of a chunk in one dimension}{Source: Created by Baptiste Coudray}
In one dimension, the Moore neighborhood of a cell includes the west-neighbor and the east-neighbor. We notice that the envelope of $R_{n}$ includes the last cell of $R_{(n-1)\:\%\:N}$ and the first cell of $R_{(n+1)\:\%\:N}$. For example, the envelope of R1 is made of the west-neighbor (three) and the east-neighbor (seven). Thus, the ranks exchange data via MPI using the Cartesian virtual topology.
\pagebreak
#### Two dimensions
\cimg{figs/envelope_2d.png}{scale=0.60}{Example of the envelope of a chunk in two dimensions}{Source: Created by Baptiste Coudray}
Using the two-dimensional cellular automaton described above, the chunk envelope of R0 requires eight communications. This example uses a Cartesian topology of size $2 \times 2$ ($m \times n$); the neighbors are recovered as follows:
1. `North West Neighbors` are sent by $R_{((y - 1)\:\%\:m,\:(x - 1)\:\%\:n)}$,
2. `North Neighbors` are sent by $R_{((y - 1)\:\%\:m,\:x)}$,
3. `North East Neighbors` are sent by $R_{((y - 1)\:\%\:m,\:(x + 1)\:\%\:n)}$,
4. `East Neighbors` are sent by $R_{(y,\:(x + 1)\:\%\:n)}$,
5. `South East Neighbors` are sent by $R_{((y + 1)\:\%\:m,\:(x + 1)\:\%\:n)}$,
6. `South Neighbors` are sent by $R_{((y + 1)\:\%\:m,\:x)}$,
7. `South West Neighbors` are sent by $R_{((y + 1)\:\%\:m,\:(x - 1)\:\%\:n)}$,
8. `West Neighbors` are sent by $R_{(y,\:(x - 1)\:\%\:n)}$.
#### Three dimensions
With a three-dimensional cellular automaton, the envelope of a chunk requires 26 MPI communications.
\pagebreak