diff --git a/src/figs/dispatch_1d.png b/src/figs/dispatch_1d.png
index f22af69b233fca6fbff4f59e31b697aa92a882fe..f10bf323182236257cd18b9117eae4eb7fbd7b4b 100644
Binary files a/src/figs/dispatch_1d.png and b/src/figs/dispatch_1d.png differ
diff --git a/src/figs/distributed_systems.graffle b/src/figs/distributed_systems.graffle
new file mode 100644
index 0000000000000000000000000000000000000000..70d2c82d9276da13979389a9391ee15617388796
Binary files /dev/null and b/src/figs/distributed_systems.graffle differ
diff --git a/src/figs/distributed_systems.png b/src/figs/distributed_systems.png
new file mode 100644
index 0000000000000000000000000000000000000000..475b7299b4ea56e4a85359b1e8e05080638d3cc2
Binary files /dev/null and b/src/figs/distributed_systems.png differ
diff --git a/src/figs/ring.png b/src/figs/ring.png
index f9f303abf01daa6721190170eafc922f0cea3a5b..379f3ddb29c887b3d04c0f687ed80e1f99b03122 100644
Binary files a/src/figs/ring.png and b/src/figs/ring.png differ
diff --git a/src/text/00-preface.md b/src/text/00-preface.md
index 587bde3f44c845262cc05029a17854da53679a27..def6c0d1ba5793701fe1d0cfa463f7599ecfe804 100644
--- a/src/text/00-preface.md
+++ b/src/text/00-preface.md
@@ -10,12 +10,18 @@ I would like to thank the people who helped me during this project:
 
 # Abstract {-}
 
-Today, most computers are equipped with (+^GPU). They provide more and more computing cores and have become fundamental embedded high-performance computing tools. In this context, the number of applications taking advantage of these tools seems low at first glance. The problem is that the development tools are heterogeneous, complex, and strongly dependent on the (+GPU) running the code. Futhark is an experimental, functional, and architecture agnostic language; that is why it seems relevant to study it.  It allows generating code allowing a standard sequential execution (on a single-core processor), on (+GPU) (with (+CUDA) and (+OpenCL) backends), on several cores of the same processor (shared memory). To make it a tool that could be used on all high-performance platforms, it lacks support for distributed computing with (+MPI). We create a library which perform the distribution of a cellular automaton on multiple compute nodes through MPI. The update of the cellular automaton is computed via the Futhark language using one of the four available backends (sequential, multicore, OpenCL, and CUDA). In order to validate our library, we implement a cellular automaton in one dimension ((+SCA)), in two dimensions (Game of Life) and three dimensions ((+LBM)). Finally, with the performance tests performed, we obtain an ideal speedup in one and two dimensions with the sequential and multicore backend. With the GPU backend, we obtain an ideal speedup only when the number of tasks equals the number of GPUs.
+Today, most computers are equipped with (+^GPU). They provide more and more computing cores and have become fundamental embedded high-performance computing tools. In this context, the number of applications taking advantage of these tools seems low at first glance. The problem is that the development tools are heterogeneous, complex, and strongly dependent on the (+GPU) running the code. Futhark is an experimental, functional, and architecture-agnostic language; that is why it seems relevant to study it. It can generate code for standard sequential execution (on a single-core processor), for (+GPU) (with the (+CUDA) and (+OpenCL) backends), and for several cores of the same processor (shared memory). To make it a tool that can be used on all high-performance platforms, it only lacks support for distributed computing with (+MPI). We create a library that distributes a cellular automaton over multiple compute nodes through (+MPI). The update of the cellular automaton is computed via the Futhark language using one of the four available backends (sequential, multicore, (+OpenCL), and (+CUDA)). In order to test our library, we implement a cellular automaton in one dimension ((+SCA)), in two dimensions (Game of Life), and in three dimensions ((+LBM)). Finally, with the performance tests performed, we obtain an ideal speedup in one and two dimensions with the sequential and multicore backends, but with (+LBM), we obtain a maximum of x42 with 128 tasks. When using the (+GPU) backends, we obtain an ideal speedup for the three cellular automata. Parallel computing shows better performance than sequential or concurrent computing; for example, with the Game of Life, it is up to 15 times faster.
 
-\begin{figure} \vspace{.1cm} \begin{center} \includegraphics[scale=0.4]{figs/front-logo.png}
-\end{center} \end{figure} \begin{tabular}{ p{3cm} p{1cm} p{1cm} p{6cm} } \multicolumn{1}{l}{Candidate:}& & &
-\multicolumn{1}{l}{Referent teacher:}\\ \multicolumn{1}{l}{\textbf{Baptiste Coudray}} & & &
-\multicolumn{1}{l}{\textbf{Dr. Orestis Malaspinas}} \\ \multicolumn{1}{l}{Field of study: Information Technologies Engineering} & & &
-\multicolumn{1}{l}{} \\ \end{tabular}
+\begin{figure} \vspace{.1cm} \begin{center} \includegraphics[scale=0.22]{figs/front-logo.png}
+\end{center} \end{figure} 
+
+\begin{table}[ht!]
+\begin{tabular}{ll}
+Candidate:                                           & Referent teacher:               \\
+\textbf{Baptiste COUDRAY}                            & \textbf{Dr. Orestis MALASPINAS} \\
+Field of study: Information Technologies Engineering &                                 \\
+\end{tabular}
+\end{table}
 
 \pagebreak
diff --git a/src/text/03-programmation-parallele.md b/src/text/03-programmation-parallele.md
index bf47a26487eee44de90a7d0c8fa5a89eb83bc48f..9653b04716bb61a7852593b410c95915d1cef97a 100644
--- a/src/text/03-programmation-parallele.md
+++ b/src/text/03-programmation-parallele.md
@@ -1,13 +1,17 @@
 # Distributed High-Performance Computing
 
-Distributed systems are groups of networked computers that share a common goal. Distributed systems are used to increase computing power and solve a complex problem faster than with a single machine. Thus, when the problem is distributed, it is solved more quickly than sequential, concurrent, or parallel computing [@noauthor_distributed_2021].
+\cimgl{figs/distributed_systems.png}{width=\linewidth}{A distributed high-performance computing system}{Source: Created by Baptiste Coudray}{fig:dd-sys}
 
-Sequential computation consists of executing a processing step by step, where each operation is triggered only when the previous operation is completed, even when the two operations are independent.
+As we can see in \ref{fig:dd-sys}, distributed systems are groups of networked computers that share a common goal. They are used to increase computing power and to solve a complex problem faster than a single machine could [@noauthor_distributed_2021]. To do so, the problem's data are divided among the computers, which communicate with each other via message passing. Each computer executes the same program (a distributed program) but on different data. The algorithm is applied using one of these three computing methods:
 
-Oracle's multithreading programming guide [@oracle_chapter_2010], define *concurrency* as a state where each task is executed independently with time-slicing. A performance gain is noticeable when tasks are most independent of others because they do not have to wait for the progress of another task.
+1. sequential computing,
+2. concurrent computing,
+3. parallel computing.
 
-Finally, parallel computing exploits the computing power of a graphics card thanks to the thousands of cores it has. This allows a gain in performance compared to the previously described calculation methods because the operations are performed simultaneously on the different cores.
+With sequential computing, the algorithm is executed step by step: each operation is triggered only when the previous one has completed, even when the two operations are independent.\newline
+With concurrent computing, the problem's data are split again into smaller parts that are shared among the threads available on the processor. Each thread applies the algorithm independently, with time-slicing, to its own set of data. A performance gain is noticeable when the tasks are mostly independent of one another, because they do not have to wait for the progress of another task (thread) [@oracle_chapter_2010].\newline
+With parallel computing, the data are also split and the algorithm is applied simultaneously on the many processing units available. Generally, we use the (+GPU) because it contains thousands of cores, while a (+CPU) contains only about a hundred. Thus, the primary goal of parallel computing is to increase the available computing power for faster application processing and problem-solving.
 
-So, _Distributed High-Performance Computing_ means distributing a program on multiple networked computers and programming the software in sequential, concurrent or parallel computing.
+So, _Distributed High-Performance Computing_ means distributing a program on multiple networked computers and executing the algorithm using sequential, concurrent, or parallel computing.
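+
+To make the idea of data division and message passing concrete, the following is a minimal, hypothetical sketch in C with (+MPI); it is not part of our library. Rank 0 scatters a one-dimensional array across all ranks, every rank runs the same update on its own chunk, and the partial results are gathered back. The names `update_chunk` and `GLOBAL_SIZE` are illustrative only, and the sketch assumes the array size is divisible by the number of ranks.
+
+```c
+#include <mpi.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#define GLOBAL_SIZE 1024 /* total number of cells (illustrative) */
+
+/* Hypothetical local update: each rank applies the same rule to its own chunk. */
+static void update_chunk(int *chunk, int count) {
+    for (int i = 0; i < count; ++i) {
+        chunk[i] += 1;
+    }
+}
+
+int main(int argc, char **argv) {
+    MPI_Init(&argc, &argv);
+
+    int rank, size;
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+
+    /* The problem's data are divided among the compute nodes. */
+    int local_count = GLOBAL_SIZE / size;
+    int *chunk = malloc(local_count * sizeof(int));
+    int *global = NULL;
+    if (rank == 0) {
+        global = malloc(GLOBAL_SIZE * sizeof(int));
+        for (int i = 0; i < GLOBAL_SIZE; ++i) {
+            global[i] = i;
+        }
+    }
+    MPI_Scatter(global, local_count, MPI_INT, chunk, local_count, MPI_INT, 0, MPI_COMM_WORLD);
+
+    /* Every rank executes the same program, but on different data. */
+    update_chunk(chunk, local_count);
+
+    /* Message passing brings the partial results back to rank 0. */
+    MPI_Gather(chunk, local_count, MPI_INT, global, local_count, MPI_INT, 0, MPI_COMM_WORLD);
+
+    if (rank == 0) {
+        printf("first cell after one update: %d\n", global[0]);
+        free(global);
+    }
+    free(chunk);
+    MPI_Finalize();
+    return 0;
+}
+```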
 
 \pagebreak
diff --git a/src/text/09-lattice-boltzmann.md b/src/text/09-lattice-boltzmann.md
index 12e8f1d70e54457d17f52411f4bf70c71d4ddde0..603844abc3c69537d334e8c649f4167cebc34722 100644
--- a/src/text/09-lattice-boltzmann.md
+++ b/src/text/09-lattice-boltzmann.md
@@ -117,7 +117,7 @@ Table: Results for the distributed-(+CUDA) version of (+LBM)
 
 This table contains the results obtained by using the backend `cuda` of Futhark.
 
-\cimgl{figs/lbm_result_and_speedup_gpu.png}{width=\linewidth}{Benchmarks of the LBM in distributed-OpenCL/CUDA}{Source: Realized by Baptiste Coudray}{fig:bench-gpu-gol}
+\cimgl{figs/lbm_result_and_speedup_gpu.png}{width=\linewidth}{Benchmarks of the LBM in distributed-OpenCL/CUDA}{Source: Realized by Baptiste Coudray}{fig:bench-gpu-lbm}
 
 Like the other benchmarks (\ref{fig:bench-gpu-sca}, \ref{fig:bench-gpu-gol}), there is very little difference between the (+OpenCL) and (+CUDA) versions (computation time and speedup). We get a more than ideal speedup with 2, 4, and 8 tasks/(+^GPU) (x2.1, x5.2, and x10.2, respectively). Finally, we notice that parallel computation is up to 3 times faster than sequential/concurrent computation when executing with a single task/graphical card.
 
diff --git a/src/text/10-conclusion.md b/src/text/10-conclusion.md
index b2a1013dc4e3e066bb69a410b36fe09e5729d717..d137e4106bbccbea4c8acf85bdb81363a1113e35 100644
--- a/src/text/10-conclusion.md
+++ b/src/text/10-conclusion.md
@@ -1,7 +1,9 @@
 # Conclusion
 
 In this project, we created a library allowing to distribute a one, two or three dimensional cellular automaton on several computation nodes via (+MPI). Thanks to the different Futhark backends, the update of the cellular automaton can be done in sequential, concurrent or parallel computation. Thus, we compared these different modes by implementing a cellular automaton in one dimension ((+SCA)), in two dimensions (Game of Life) and in three dimensions ((+LBM)). Benchmarks for each backend were performed to verify the scalability of the library. We obtained ideal speedups with the cellular automata in one and two dimensions and with the use of the sequential and multicore Futhark backend. With these two backends and a three-dimensional cellular automaton, we had a maximum speedup of x41 with 128 tasks. Concerning the (+OpenCL) and (+CUDA) backends, they show no difference in performance between them and for the three cellular automata, the speedup is ideal. Parallel computing has consistently shown better performance compared to sequential or simultaneous computing. For example, with the Game of Life, we are up to 15 times faster.
-During this work, I learn the importance to make unit tests to valid my implementation. Indeed, I was able to narrowing down multiple bugs that I made and make sure that my library was still functioning when I was adding cellular automaton in two and three dimension.
-Finally, the library can be improved to obtain an ideal speedup in three dimensions with the CPU backends. Moreover, the support of the Von Neumann neighborhood to manage other cellular automata.
+
+During this work, I learned the importance of writing unit tests to validate my implementation. Indeed, they allowed me to narrow down multiple bugs I had introduced and to make sure that my library kept working when I added the cellular automata in two and three dimensions.
+
+The library could be improved to obtain an ideal speedup in three dimensions with the CPU backends. Moreover, support for the von Neumann neighborhood could be added to handle other cellular automata.
 
 \pagebreak