Transformers
Generic Transformer Models
- Encoder models: These models possess bidirectional attention and are often referred to as auto-encoding models. Training consists of perturbing a sentence (by masking words) and having the model reconstruct the original sentence. They are well suited to sentence classification, named entity recognition (NER), and extractive question answering.
- Decoder models: These models are often referred to as auto-regressive models. Training consists of predicting the next word given the words that precede it, which makes them a natural fit for text generation.
- Encoder-decoder models: These are sometimes called sequence-to-sequence models and are best suited to summarization, generative question answering, and translation (see the pipeline sketch after this list for a quick way to try each family).
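The sketch below shows one common way to exercise each family through the Hugging Face transformers pipeline API; the checkpoints (bert-base-uncased, gpt2, t5-small) and the prompts are illustrative assumptions, not the only valid choices.

```python
from transformers import pipeline

# Encoder (auto-encoding) model: fill in a masked word with BERT.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Transformers are a [MASK] architecture for NLP."))

# Decoder (auto-regressive) model: continue a prompt with GPT-2.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=20))

# Encoder-decoder (sequence-to-sequence) model: summarize text with T5.
summarizer = pipeline("summarization", model="t5-small")
print(summarizer(
    "Transformers are neural networks built around attention. "
    "They power most modern natural language processing systems."
))
```

Each pipeline downloads the named checkpoint on first use, so running this requires the transformers library, a backend such as PyTorch, and network access.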
Bias and limitations
- The original training data may contain the best and the worst of what is available on the internet.
- These inherent biases will not disappear with fine-tuning, because the pretrained model already encodes them (the short probe after this list shows one way to surface such bias).
- Training a new model from scratch requires enormous computational capacity, which is not feasible for everyone.
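As a hedged illustration of the first two points, the probe below compares the occupations a pretrained encoder checkpoint fills in for a masked word; bert-base-uncased and the two prompts are assumed examples, and this is a quick sketch rather than a rigorous bias audit.

```python
from transformers import pipeline

# Probe a pretrained encoder model for gender-associated occupations
# by comparing its top fill-mask predictions for two prompts.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["This man works as a [MASK].",
               "This woman works as a [MASK]."]:
    predictions = fill_mask(prompt)
    print(prompt, "->", [p["token_str"] for p in predictions])
```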