
Transformers

Generic Transformer Models

  • Encoder models: these models possess bidirectional attention and are often referred to as auto-encoding models. Training consists of perturbing a sentence (by masking words) and having the model reconstruct the original sentence. They are well suited for sentence classification, named entity recognition (NER), and Q&A.
  • Decoder models: these models are often referred to as auto-regressive models. Training consists of predicting the next word from the words that precede it, which makes them a natural fit for text generation.
  • Encoder-decoder models: these are sometimes called sequence-to-sequence models and are best suited for summarization, generative Q&A, and translation. A minimal usage sketch for all three families follows this list.
  • Bias and limitations
    • The original training data may contain the best and the worst of what is available on the internet
    • These inherent biases will not disappear even with fine-tuning, since the pretrained model already encodes them (see the short demonstration below)
    • Training a new model from scratch requires enormous computational capacity, which is not feasible for everyone
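
Each of these model families can be tried with the Hugging Face `pipeline` API. The sketch below is a minimal illustration rather than part of the original post: the checkpoint names (`bert-base-uncased`, `gpt2`, `t5-small`) are just common examples and can be swapped for any compatible model, and it assumes the `transformers` library plus a backend such as PyTorch is installed.

```python
from transformers import pipeline

# Encoder (auto-encoding) model: fill in a masked word using bidirectional context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Transformers are [MASK] for sentence classification.")[0]["token_str"])

# Decoder (auto-regressive) model: generate text by repeatedly predicting the next word.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformer models are", max_new_tokens=20)[0]["generated_text"])

# Encoder-decoder (sequence-to-sequence) model: translate English to French.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Transformers are well suited for translation.")[0]["translation_text"])
```

Each call maps to one family above: fill-mask exercises the bidirectional encoder, text-generation the next-word decoder, and translation the full encoder-decoder stack.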

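To make the bias point concrete, here is a small sketch (again using the illustrative `bert-base-uncased` checkpoint, an assumption rather than something the post specifies) that compares fill-mask completions for two prompts differing only in the subject's gender. The exact completions vary by checkpoint, but skewed occupation lists are a commonly observed symptom of bias inherited from the training data.

```python
from transformers import pipeline

# Compare top completions for prompts that differ only in gender.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ("This man works as a [MASK].", "This woman works as a [MASK]."):
    completions = [result["token_str"] for result in unmasker(prompt)]
    print(prompt, "->", completions)
```
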
    source1 - source2
