We see that the learning rate of 0.05 is the best because it is the one that converges the fastest to a low error rate.
### Analyse the dependency of the final error rate on the number of epochs
We tried many values for the number of epochs and saw that, with the learning rate of 0.05, the test-set error stops decreasing after roughly 90 epochs; the plots for 150 and for 1000 epochs confirm that the error stays flat beyond that point.

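As a rough illustration of how this plateau can be observed, here is a minimal sketch that trains a perceptron on synthetic data and records the test error after every epoch. The data, dimensions, and epoch count are hypothetical stand-ins for the exercise's actual setup, so on this easy toy data the error flattens after only a few epochs rather than ~90.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two linearly separable Gaussian blobs.
n, d = 500, 20
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.hstack([-np.ones(n), np.ones(n)])
perm = rng.permutation(2 * n)
train, test = perm[:n], perm[n:]

w = np.zeros(d)
lr = 0.05
for epoch in range(150):
    for i in train:
        # Classic perceptron rule: only misclassified points update w.
        if y[i] * (X[i] @ w) <= 0:
            w += lr * y[i] * X[i]
    test_err = np.mean(np.sign(X[test] @ w) != y[test])
    print(f"epoch {epoch + 1:3d}  test error {test_err:.3f}")
```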
We also tried a learning rate of 10 (to see the effect of a large learning rate) and saw that training stops improving almost instantly: after only one epoch the error rate is already stuck at its final value, but that value was found almost immediately.
We can conclude that a small learning rate needs a higher number of epochs to reach a good result, while a large learning rate needs fewer. The goal is to use the largest learning rate that still reaches the minimum error rate in the fewest epochs: if the learning rate is too large, the weights oscillate around the minimum and never reach it, as the toy sketch below illustrates.
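To make the oscillation argument concrete, here is a toy sketch (plain gradient descent on a 1-D quadratic rather than the exercise's perceptron, so the numbers are purely illustrative) comparing a small, a large-but-stable, and a too-large learning rate:

```python
# Gradient descent on f(w) = w**2 / 2, whose gradient is w, so each
# step computes w <- w * (1 - lr); convergence needs |1 - lr| < 1.
for lr in (0.05, 1.9, 2.5):
    w = 1.0
    for _ in range(20):
        w -= lr * w  # gradient step
    print(f"lr={lr}: w after 20 steps = {w:+.3e}")
```

With lr = 0.05 the iterate shrinks slowly but steadily, with lr = 1.9 it flips sign at every step yet still shrinks, and with lr = 2.5 it oscillates with growing amplitude and never reaches the minimum.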
### Plot a histogram of the weights finally obtained from learning. A strong peak at zero remains. Why?
We see that if we plot a histogram of the weights finally obtained, we have:
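A likely explanation (assuming, as in MNIST-style data, that the weights are initialised to zero and that some input pixels, such as the image border, are always zero) is that a weight attached to an input that is always zero never receives an update, so it stays exactly at its initial value of zero. Here is a minimal sketch of how such a histogram could be produced, with a hypothetical weight vector standing in for the learned one:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned weights: most entries were
# updated during training, but the weights of always-zero inputs
# (e.g. border pixels) kept their zero initialisation.
w = rng.normal(0.0, 0.1, 784)
w[rng.choice(784, 200, replace=False)] = 0.0  # never-updated weights

plt.hist(w, bins=100)
plt.xlabel("weight value")
plt.ylabel("count")
plt.title("Histogram of final weights")
plt.show()
```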