Building a multi-stage recommendation system (part 2.1)
Heavy ranking model strategy and design — Multi-gate Mixture-of-Experts
In the first post of this series, we learned why a multi-stage recommendation strategy matters, especially for companies with large item catalogs. We covered what candidate generation is and implemented one of its best-known representatives: the two-tower model. In this post, we move on to the next step in the pipeline: the ranker. Its job, given the few hundred candidates issued by the generator(s), is to order them (rank them) so that the most relevant items are displayed at the top.
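To make the ranker's role in the pipeline concrete, here is a minimal sketch of that step. Everything in it is a hypothetical placeholder: `score_fn` stands in for whatever trained ranking model produces a relevance score, and the candidate list is whatever the generator(s) emitted.

```python
# Hypothetical ranking step: the generator hands us a few hundred
# candidate item ids, the ranker scores each one for this user,
# and the highest-scoring items are displayed first.
def rank(user_features, candidates, score_fn, k=10):
    scored = [(item, score_fn(user_features, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]
```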
We’ll start with a brief introduction to what ranking entails as a machine learning task. We will then describe what we mean by a heavy ranking strategy. To that end, we’ll present an advanced architecture introduced by YouTube at RecSys ’19 that combines multitask learning with a mixture of experts. Finally, we’ll see how JD.com raised the bar even higher in their CIKM ’20 paper by introducing transformer blocks into YouTube’s architecture to additionally exploit different sequences of user actions. In the next post, we’ll implement a Multi-gate Mixture-of-Experts (MMoE) architecture and apply it to the H&M Kaggle competition dataset.
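To give a flavor of the architecture before we dive in, here is a minimal MMoE sketch in PyTorch. The layer sizes and the two task heads (click and purchase) are illustrative assumptions, not the configuration from the YouTube paper nor the exact implementation we’ll build in the next post: the core idea is simply that several shared experts feed every task, and each task owns a softmax gate that decides how to mix them.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Minimal Multi-gate Mixture-of-Experts sketch.
    Sizes and task heads are hypothetical, for illustration only."""

    def __init__(self, input_dim=128, expert_dim=64, num_experts=4, num_tasks=2):
        super().__init__()
        # Shared experts: each a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
             for _ in range(num_experts)]
        )
        # One softmax gate per task: it learns how to weight the experts.
        self.gates = nn.ModuleList(
            [nn.Linear(input_dim, num_experts) for _ in range(num_tasks)]
        )
        # One task-specific tower per task (e.g. click, purchase).
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(num_tasks)]
        )

    def forward(self, x):
        # (batch, num_experts, expert_dim)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            # (batch, num_experts, 1): per-task mixing weights.
            weights = torch.softmax(gate(x), dim=-1).unsqueeze(-1)
            mixed = (expert_out * weights).sum(dim=1)  # (batch, expert_dim)
            outputs.append(torch.sigmoid(tower(mixed)))
        return outputs  # one prediction per task

model = MMoE()
x = torch.randn(32, 128)  # a batch of 32 concatenated user/item feature vectors
click_prob, purchase_prob = model(x)
```

Because each task gets its own gate, tasks that benefit from the same representations can share experts heavily, while conflicting tasks can pull toward different experts, which is exactly the tension multitask rankers have to manage.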
Learning to rank
Ranking is a task that differs from the traditional classification or regression settings. Our goal here is not just to predict the right classes or the right numbers but to predict the…