We perform, in-batch negative sampling. There isn't any real "sampling" per say.

Srikesh Srinivas

Aug 29, 2024

What we need to compute the loss are logits of the positives and negatives along with the indices of the positives.

The positive logits are on the diagonal of the logits matrix and the negatives off-diagonal.

The positive indices are simply [0, 1, 2, ..., batch_size]

Sign up to discover human stories that deepen your understanding of the world.

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

I am a senior ML engineer working on recommender systems — https://www.linkedin.com/in/adrien-biarnes-81975717

Write a response

What are your thoughts?

Also publish to my profile

Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams