The book written by Andrew Webb is certainly the most comprehensive book on machine learning. I have not been able to find any machine learning topic that is not treated in it.
In my opinion, this book is aimed more at a scientific audience, for the simple reason that the presentation gives more importance to equations than to application examples. It does not explain how to program machine learning algorithms, but rather which algorithms exist and what their mathematical background is. Every technique is first presented in text, and only then is the mathematical development shown. It is therefore convenient both for readers who prefer textual descriptions and for those who prefer equations.
The book is very well structured. Every chapter starts with a textual introduction to the issue at hand and then describes several techniques to solve it. Specific application examples are given at the end. A large part is then devoted to a summary, discussion, recommendations (not always), notes and references, and finally exercises. The topics are covered in an order that is non-standard for people used to practical data mining books. After an introduction, density estimation techniques are explained, followed by linear and non-linear discriminant analysis. The book goes on with decision trees, performance and feature selection, and finishes with clustering and some additional topics. Although this book is written from a statistical point of view, it is certainly one of the most comprehensive resources for machine learning and data mining.