CatBoost is a “relatively” new package developed by Yandex researchers. It is quite popular right now, especially in Kaggle competitions, where it often outperforms other gradient tree boosting libraries.

Among other ingredients, one of the very cool features of CatBoost is that it handles categorical variables out of the box (hence the name of the algorithm).

When using an implementation, it is important to really understand how it works under the hood. That is the goal of this article: we are going to take an in-depth look at the technique called **Ordered Target Statistics.** It is presumed that you have…
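As a preview of the idea, here is a minimal pure-Python sketch of ordered target statistics, hedged as an illustration rather than CatBoost's actual implementation: each example is encoded using only the targets of examples that precede it in a random permutation. The function name, the smoothing parameter `a`, and the prior `p` follow the notation of the CatBoost paper but are otherwise illustrative.

```python
import random

def ordered_target_statistics(categories, targets, a=1.0, seed=0):
    """Encode one categorical feature with ordered target statistics.

    Each example is encoded using only the targets of examples that
    precede it in a random permutation (an "artificial time"), which
    prevents target leakage. `a` is a smoothing parameter and the
    prior `p` is the global target mean.
    """
    n = len(categories)
    p = sum(targets) / n                    # prior: global target mean
    order = list(range(n))
    random.Random(seed).shuffle(order)      # random permutation

    sums, counts = {}, {}
    encoded = [0.0] * n
    for i in order:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * p) / (k + a)  # uses only "past" examples
        sums[c] = s + targets[i]            # only now reveal this target
        counts[c] = k + 1
    return encoded
```

Note that the real library averages statistics over several random permutations rather than relying on a single one; a single permutation gives high-variance encodings for the earliest examples.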

This article is intended for students trying to break into data science as well as professionals in need of a refresher on boosting and gradient boosting. There is already quite a lot of material on this topic on the web, but not much of it includes a graphical visualization of the learning process.

So we are going to start with the Boosting principle and end with Gradient Boosting, and along the way understand the transition from the former to the latter. We will start with graphical intuitions and then explain the algorithms in pseudo-code.

Gradient Tree Boosting is one of the best…
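To make the principle concrete before reading the article, here is a hedged, toy NumPy sketch of gradient boosting for regression with squared loss (the article itself uses pseudo-code; the function names and the use of decision stumps as weak learners are illustrative choices, not the article's):

```python
import numpy as np

def fit_stump(x, r):
    """Find the threshold split on x that best fits residuals r (squared loss)."""
    best = (np.inf, None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting with stumps: each round fits the negative gradient
    of the squared loss (i.e. the residuals) and adds a shrunken correction."""
    pred = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred          # negative gradient of 1/2 (y - F)^2
        t, lv, rv = fit_stump(x, residuals)
        pred += lr * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    return pred, stumps
```

The learning rate `lr` (shrinkage) trades off the contribution of each weak learner against the number of rounds needed; small values generally give better generalization at the cost of more rounds.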

In my last article, I wrote a detailed explanation of the Gaussian Mixture Model (GMM) and the way it is trained using the Expectation-Maximization (EM) algorithm. This time, I wanted to show that a mixture model is not necessarily a mixture of Gaussian densities. It can be a mixture of any distribution. In this example, we are going to use a mixture of multinomial distributions.

Also, the idea is, for once, not to focus solely on the mathematical and computer science aspects of a data science project but on the business side too. Therefore we are going to use a…

This article is an extension of “Gaussian Mixture Models and Expectation-Maximization (A full explanation)”. If you haven’t read it, this article might not be very useful.

The goal here is to derive the closed-form expressions necessary for the update of the parameters during the Maximization step of the EM algorithm applied to GMMs. This material was written as a separate article in order not to overload the main one.

OK, so recall that during the M-Step, we want to maximize the following lower bound with respect to Θ:

The lower bound is defined to be a concave function easy…
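For reference, the standard form of this lower bound for a GMM (with responsibilities γ computed during the E-step) and the closed-form updates it yields are the well-known results below; the notation is the usual one and may differ slightly from the article's:

```latex
\mathcal{L}(\Theta) = \sum_{i=1}^{N}\sum_{k=1}^{K} \gamma_{ik}\,
  \bigl[\log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\bigr]

% Setting the derivatives to zero (with a Lagrange multiplier
% enforcing \sum_k \pi_k = 1) gives, with N_k = \sum_{i} \gamma_{ik}:

\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k}\sum_{i=1}^{N} \gamma_{ik}\, x_i, \qquad
\Sigma_k = \frac{1}{N_k}\sum_{i=1}^{N} \gamma_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}
```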

In the previous article, we described the Bayesian framework for linear regression and how we can use latent variables to reduce model complexity.

In this post, we will explain how latent variables can also be used to frame a classification problem, namely with the Gaussian Mixture Model (or GMM for short), which allows us to perform soft probabilistic clustering.

This model is classically trained by an optimization procedure called Expectation-Maximization (or EM for short), which we will review thoroughly. At the end of this article, we will also see why we do not use traditional optimization methods.
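The alternation the EM procedure performs can be sketched in a few lines of NumPy. The following is a minimal, illustrative 1-D version (the function name and the quantile-based initialization are choices made for this sketch, not part of the article): the E-step computes soft assignments (responsibilities) of each point to each component, and the M-step re-estimates the weights, means, and variances from those responsibilities in closed form.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """A minimal EM loop for a 1-D Gaussian mixture with k components."""
    pi = np.full(k, 1.0 / k)                          # mixing weights
    mu = np.quantile(x, (np.arange(k) + 1) / (k + 1))  # spread-out initial means
    var = np.full(k, x.var())                          # initial variances

    for _ in range(n_iter):
        # E-step: responsibilities, shape (n, k)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: closed-form parameter updates
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, resp
```

The returned `resp` matrix is exactly the "soft probabilistic clustering": each row sums to 1 and gives the probability that the point belongs to each component.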

…

Over the past year, I have taken more and more interest in **Bayesian statistics** and **probabilistic modeling**. Along this journey, I have encountered **latent probabilistic models**.

We will start by explaining the concept of a **latent variable**. To properly understand its benefits, you need to be familiar with the **Bayesian framework for linear regression**, and that is what this first article is about.

As a quick note, I am no expert in Bayesian statistics. But I wanted to share the current state of my knowledge because I know it can help some people (**myself…**

I am a machine learning engineer at Armis and I love to learn and share my passion for data science — https://www.linkedin.com/in/adrien-biarnes-81975717