# Multinomial Mixture Model for Supermarket Shoppers Segmentation

## A complete tutorial

In my last article, I wrote a detailed explanation of the Gaussian Mixture Model (GMM) and the way it is trained using the Expectation-Maximization (EM) algorithm. This time, I wanted to show that a mixture model is not necessarily a mixture of Gaussian densities: its components can come from any family of distributions. In this example, we are going to use a mixture of multinomial distributions.
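To make the idea concrete, here is a minimal NumPy sketch of the E-step for a mixture of multinomials. It is not the author's implementation, and the counts, weights and category probabilities are hypothetical. Given the mixture weights and each component's category probabilities, it computes every shopper's responsibilities (soft segment memberships) from their vector of purchase counts. The multinomial coefficient is the same for every component, so it cancels after normalization and is omitted.

```python
import numpy as np

def multinomial_responsibilities(counts, weights, probs):
    """E-step of a multinomial mixture.

    counts:  (N, D) non-negative integer counts per observation
    weights: (K,) mixture weights summing to one
    probs:   (K, D) strictly positive category probabilities, rows summing to one
    Returns an (N, K) array whose rows sum to one.
    """
    # log p(x_n | z_n = k) up to the multinomial coefficient, which does not
    # depend on k and therefore cancels after normalization.
    log_lik = counts @ np.log(probs).T          # (N, K)
    log_joint = log_lik + np.log(weights)       # add the log mixture weights
    # Normalize in log space (log-sum-exp) for numerical stability.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)

# Hypothetical example: 3 shoppers, 3 product categories, 2 segments.
counts = np.array([[5, 0, 1],
                   [0, 4, 2],
                   [3, 3, 0]])
weights = np.array([0.5, 0.5])
probs = np.array([[0.7, 0.1, 0.2],
                  [0.1, 0.6, 0.3]])
resp = multinomial_responsibilities(counts, weights, probs)
print(resp.round(3))
```

The third shopper buys evenly from the first two categories and ends up with a split membership rather than a hard label, which is the point of a soft segmentation.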

Also, for once, the idea is to focus not only on the mathematical and computer science aspects of a data science project but on the business side too. We are therefore going to use a real-world data set with a concrete application in the marketing domain. It will hopefully give the reader a better sense of why we do the things we do :-). …

# EM of GMM appendix (M-Step full derivations)

The goal here is to derive the closed-form expressions necessary for the update of the parameters during the Maximization step of the EM algorithm applied to GMMs. This material was written as a separate article in order not to overload the main one.

OK, so recall that during the M-Step, we want to maximize the following lower bound with respect to Θ:
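In standard notation (an assumption: the symbols may differ from the original derivation), with $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ the mixture weights, means and covariances, and $\gamma_{ik}$ the responsibility of component $k$ for point $x_i$ computed during the E-step, this bound can be written as

$$
\mathcal{L}(\Theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right] - \sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{ik} \log \gamma_{ik}
$$

The last double sum is the entropy of the responsibilities. Since the $\gamma_{ik}$ are held fixed during the M-Step, it does not depend on $\Theta$, and only the first double sum has to be maximized.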

The lower bound is designed to be easy to optimize: it is concave in the mixture weights and in the means, and its stationary points are available in closed form. So we are going to perform a direct optimization, that is, find the parameters at which the partial derivatives vanish. Also, as we already said in the main article, we have to satisfy two constraints. The first is that the mixture weights must sum to one, and the second is that each covariance matrix must be positive semidefinite (in practice positive definite, so that the Gaussian density is well defined). …
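Setting these partial derivatives to zero (with a Lagrange multiplier for the sum-to-one constraint on the weights) yields the standard closed-form GMM updates. The NumPy sketch below implements them, assuming responsibilities `resp` of shape (N, K) from the E-step; the function name `m_step` and the `reg` ridge term are illustrative choices, not part of the original derivation. Note that the covariance update is a responsibility-weighted average of outer products, so it is positive semidefinite by construction; the small ridge only pushes it to strictly positive definite.

```python
import numpy as np

def m_step(X, resp, reg=1e-6):
    """Closed-form M-step of a GMM.

    X:    (N, D) data matrix
    resp: (N, K) responsibilities from the E-step (rows sum to one)
    reg:  small ridge added to each covariance diagonal (an assumption, not
          part of the derivation) to keep it strictly positive definite
    """
    N, D = X.shape
    K = resp.shape[1]
    Nk = resp.sum(axis=0)                    # effective number of points per component
    weights = Nk / N                         # pi_k = N_k / N, sums to one
    means = (resp.T @ X) / Nk[:, None]       # mu_k = sum_i gamma_ik x_i / N_k
    covs = np.empty((K, D, D))
    for k in range(K):
        diff = X - means[k]                  # (N, D) data centered on mu_k
        # Sigma_k = sum_i gamma_ik (x_i - mu_k)(x_i - mu_k)^T / N_k
        covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]
        covs[k] += reg * np.eye(D)
    return weights, means, covs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
resp = rng.dirichlet(np.ones(3), size=200)    # random soft assignments, for illustration only
weights, means, covs = m_step(X, resp)
print(weights.sum())                           # one, up to floating-point rounding
print(np.linalg.eigvalsh(covs).min() > 0)      # every covariance is positive definite
```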

# Gaussian Mixture Models and Expectation-Maximization (A full explanation)

In the previous article, we described the Bayesian framework for linear regression and how we can use latent variables to reduce model complexity.

In this post, we will explain how latent variables can also be used to frame an unsupervised classification problem with the Gaussian Mixture Model (GMM for short), which allows us to perform soft probabilistic clustering.
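To illustrate what "soft" means, the sketch below computes, for 1-D data and a two-component GMM with hypothetical fixed parameters, the posterior probability that each point belongs to each component. It uses only NumPy. A point halfway between two identical components is split evenly instead of being forced into one cluster.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal, vectorized with broadcasting."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def responsibilities(x, weights, means, stds):
    """Posterior p(z = k | x): one row per point, one column per component."""
    # log pi_k + log N(x_i | mu_k, sigma_k^2), shape (N, K)
    log_joint = np.log(weights) + gaussian_logpdf(x[:, None], means, stds)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)

x = np.array([-2.0, 0.0, 2.0])
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 2.0])
stds = np.array([1.0, 1.0])
print(responsibilities(x, weights, means, stds).round(3))
```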

This model is classically trained with an optimization procedure called Expectation-Maximization (EM for short), which we will review thoroughly. At the end of this article, we will also see why we do not use traditional optimization methods.
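As a preview, here is a compact sketch of EM for a two-component 1-D GMM, using NumPy only. The synthetic data, the initialization and the number of iterations are arbitrary choices for illustration. Each iteration alternates an E-step (responsibilities) and the closed-form M-step, and the recorded log-likelihood never decreases, which is the key guarantee of EM.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data drawn from two Gaussians (illustrative only).
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

# Arbitrary initialization of weights, means and standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def log_joint(x, w, mu, sd):
    """log pi_k + log N(x_i | mu_k, sd_k^2), shape (N, K)."""
    z = (x[:, None] - mu) / sd
    return np.log(w) - np.log(sd) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2

history = []
for _ in range(50):
    # E-step: responsibilities via log-sum-exp normalization.
    lj = log_joint(x, w, mu, sd)
    log_norm = np.logaddexp.reduce(lj, axis=1, keepdims=True)
    history.append(log_norm.sum())           # log-likelihood at the current parameters
    resp = np.exp(lj - log_norm)
    # M-step: closed-form updates of weights, means and standard deviations.
    nk = resp.sum(axis=0)
    w = nk / x.size
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(mu, 2), np.round(w, 2))
# Monotone non-decreasing log-likelihood (up to floating-point rounding).
print(all(b >= a - 1e-8 for a, b in zip(history, history[1:])))
```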

This article contains some mathematical notation and a few derivations. We are not trying to scare anybody. …

# Latent Variables Probabilistic Modeling

Over the past year, I have taken a growing interest in Bayesian statistics and probabilistic modeling. Along the way, I came across latent variable models.

We will start by explaining the concept of a latent variable. To properly understand its benefits, you first need to be familiar with the Bayesian framework for linear regression, which is what this first article is about.
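As a reminder of what that framework looks like, here is a minimal sketch of the conjugate case. It assumes a zero-mean isotropic Gaussian prior with precision `alpha` on the weights and Gaussian observation noise with known precision `beta`; both values, the synthetic data and the function name `posterior` are hypothetical. Under these assumptions the posterior over the weights is Gaussian, with covariance $S_N = (\alpha I + \beta \Phi^\top \Phi)^{-1}$ and mean $m_N = \beta S_N \Phi^\top t$, where $\Phi$ is the design matrix and $t$ the targets.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Gaussian posterior N(m_N, S_N) over the weights of a linear model.

    Phi:   (N, M) design matrix (here: a bias column and the raw input)
    t:     (N,) targets
    alpha: precision of the zero-mean isotropic Gaussian prior on the weights
    beta:  precision of the Gaussian observation noise (assumed known)
    """
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi   # posterior precision
    S_N = np.linalg.inv(S_N_inv)                       # posterior covariance
    m_N = beta * S_N @ Phi.T @ t                       # posterior mean
    return m_N, S_N

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 50)
t = 0.5 + 2.0 * x + rng.normal(0, 0.2, 50)    # true intercept 0.5, slope 2.0
Phi = np.column_stack([np.ones_like(x), x])
m_N, S_N = posterior(Phi, t, alpha=1.0, beta=1 / 0.2**2)
print(m_N)                    # posterior mean of [intercept, slope]
print(np.sqrt(np.diag(S_N)))  # posterior standard deviations of the weights
```

Unlike ordinary least squares, this returns a full distribution over the weights, so the uncertainty on each coefficient comes for free.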

As a quick note, I am no expert in Bayesian statistics. But I wanted to share the current state of my knowledge because I know it can help some people (myself included). … 