The loss is computed at the end of forward propagation through the network, and it gives the error that serves as the basis for backpropagation. So no, the loss is computed per batch. The T in the formula refers to the number of observations in the batch, and the loss is the negative mean of the log probabilities over the items in that batch. There is no user consideration here.
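
To make the per-batch computation concrete, here is a minimal sketch. It assumes the formula has the usual form of a negative mean log probability, i.e. -(1/T) times the sum of log p_t over the T observations in the batch. The function name `batch_nll` and the example probabilities are made up for illustration; they are not taken from the model in question.

```python
import math

def batch_nll(probs):
    """Negative mean log probability over one batch.

    probs: the probability the model assigned to the correct target
           for each of the T observations in the batch (hypothetical input).
    """
    T = len(probs)  # T = number of observations in the batch
    return -sum(math.log(p) for p in probs) / T

# Illustrative batch of T = 3 observations (values are assumptions).
batch = [0.5, 0.25, 0.125]
loss = batch_nll(batch)
print(loss)
```

Note that the function only sees the items in the batch. Nothing in it groups or weights the observations by user, so the same batch always gives the same loss regardless of which users the items came from.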