OK, thank you very much for this clarification. In my view, this is not specific to the two-tower model. Removing the user or item IDs from the set of input features can be done in any neural approach to recommendation. This will indeed make your model focus on the other information available. But in many cases, the rest of the available information is simply not sufficient to make up for the highly valuable collaborative signal that the IDs bring. In that case, you lose way too much performance on warm-start items.
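To illustrate that dropping IDs is just a choice about which input features the model receives, here is a minimal, hypothetical sketch in plain Python. The class `ItemEncoder` and its parameters are made up for this example, and the ID embeddings are random rather than trained, so it only shows the structure, not the performance tradeoff itself.

```python
import random


class ItemEncoder:
    """Hypothetical item encoder: an optional learned ID embedding
    concatenated with content features. Not tied to any specific library."""

    def __init__(self, num_items, id_dim, use_id=True, seed=0):
        rng = random.Random(seed)
        self.use_id = use_id
        # One embedding row per known item ID. In a real model these rows
        # would be trained on interactions (the collaborative signal);
        # here they are random placeholders.
        self.id_table = (
            [[rng.gauss(0.0, 0.1) for _ in range(id_dim)] for _ in range(num_items)]
            if use_id
            else None
        )

    def encode(self, item_id, content_features):
        # Content features (e.g. category, price bucket) are always used.
        vec = list(content_features)
        if self.use_id:
            # Only items seen at training time have an ID row, so an unknown
            # (cold-start) ID raises IndexError here.
            vec = self.id_table[item_id] + vec
        return vec


features = [1.0, 0.0, 0.5]
with_id = ItemEncoder(num_items=100, id_dim=8, use_id=True)
without_id = ItemEncoder(num_items=100, id_dim=8, use_id=False)

print(len(with_id.encode(3, features)))   # 8 ID dims + 3 content dims = 11
print(without_id.encode(999, features))   # works for an unseen item: [1.0, 0.0, 0.5]
```

The ID-free variant can encode any item from its content alone, which helps cold start, but it throws away whatever the ID embedding would have learned for items with plenty of interaction history.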