The loss is computed at the end of the forward pass through the network, and that error is what back propagation then works from. So no, the loss is computed per batch. The T in the formula refers to the number of observations in the batch. The loss is the negative mean of the log probabilities over the items in the batch. There is no user consideration here.
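
To make the per-batch computation concrete, here is a minimal sketch in plain Python. It assumes each batch item contributes one probability (the one the model assigns to the correct target); the function name `batch_loss` and the example values are hypothetical and only illustrate the averaging over T.

```python
import math

def batch_loss(probs):
    """Negative mean of the log probabilities over the items of one batch."""
    T = len(probs)  # T = number of observations in the batch
    return -sum(math.log(p) for p in probs) / T

# Hypothetical probabilities for a batch of three observations.
batch = [0.9, 0.5, 0.25]
loss = batch_loss(batch)
print(loss)
```

Nothing in the function knows which user an observation came from: every item in the batch is weighted equally by the 1/T factor.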

Adrien Biarnes
