So far we've been discussing neural systems that train by predicting the next token or the surrounding tokens. That setup came with a diagram that looked like this:

But we're free to pick whatever diagram we like. We can also learn embeddings by training on a few general NLP tasks simultaneously. As long as we can calculate a gradient, we can update the system!

In this diagram, the input is a sentence and there are three classification labels that we'd like to predict. The layer in the middle of this system can be re-used as an embedding once training is done.
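To make this concrete, here's a minimal sketch in PyTorch. The class name `MultiTaskEmbedder`, the layer sizes, and the three label counts are all assumptions for illustration, not a prescribed recipe. The idea is just: a shared middle layer feeds three classification heads, all three losses are summed into one gradient signal, and after training we keep only that middle layer's output as the sentence embedding.

```python
import torch
import torch.nn as nn

class MultiTaskEmbedder(nn.Module):
    """A shared encoder with three task-specific classification heads.

    The heads exist only to provide gradients during training; the
    middle layer is the part we keep as an embedding afterwards.
    """
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=64, n_labels=(2, 3, 5)):
        super().__init__()
        # EmbeddingBag (mean mode) pools a sentence of tokens into one vector.
        self.token_embed = nn.EmbeddingBag(vocab_size, embed_dim)
        # The re-usable "layer in the middle".
        self.shared = nn.Linear(embed_dim, hidden_dim)
        # One classification head per task.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, n) for n in n_labels])

    def embed(self, token_ids):
        # The sentence embedding we re-use after training.
        return torch.tanh(self.shared(self.token_embed(token_ids)))

    def forward(self, token_ids):
        h = self.embed(token_ids)
        return [head(h) for head in self.heads]

model = MultiTaskEmbedder()
token_ids = torch.randint(0, 10_000, (4, 12))          # batch of 4 sentences, 12 tokens each
labels = [torch.randint(0, n, (4,)) for n in (2, 3, 5)]  # one label set per task

# One cross-entropy loss per head; summing them gives a single gradient signal.
logits = model(token_ids)
loss = sum(nn.functional.cross_entropy(l, y) for l, y in zip(logits, labels))
loss.backward()

# After training, only the shared layer matters:
sentence_vecs = model.embed(token_ids)  # shape: (4, 64)
```

Note that nothing here is specific to these three tasks: any head that produces a differentiable loss would do, which is exactly the "as long as we can calculate a gradient" point above.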