Large Language Learning Models in Customs Part 2

6/18/2023 Matthias Pfeiler

In our first part, we talked in general about large language learning models. Today, we will take a brief trip down memory lane and trace how the technology developed and where it came from.

Language Models: A brief and evolving history

Before the 2000s: Traditional language models

Traditional language models, especially so-called n-gram models, have been around for several decades. However, they suffer from challenges such as the "curse of dimensionality" and data sparsity, which limit their ability to produce coherent text.
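
To make the sparsity problem concrete, here is a minimal bigram (2-gram) sketch in Python. The toy corpus and the helper function are assumptions made purely for illustration; the point is that any word pair never seen in the training data simply receives probability zero.

```python
from collections import Counter

# Toy corpus (illustrative only).
corpus = "we declare the goods we declare the shipment".split()

# Count bigrams and unigrams from the corpus.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word):
    # P(word | prev) estimated by relative frequency of the observed counts.
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("declare", "the"))    # seen in the corpus -> 1.0
print(bigram_prob("declare", "goods"))  # never seen as a pair -> 0.0 (sparsity)
```

With a realistic vocabulary of tens of thousands of words, the number of possible n-grams explodes while almost all of them are never observed, which is exactly the curse of dimensionality and sparsity mentioned above.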

Mid-2000s: neural networks for language modelling

In 2007, Geoffrey Hinton's groundbreaking advances in neural network training brought a revolution to the development of powerful networks. These advances improved language models, allowing them to represent complex concepts and to process previously unseen sequences. Yet the outputs of these models still lacked coherence with respect to the input sequences.

Early 2010s: LSTM networks gain traction

Long Short-Term Memory (LSTM) networks, first proposed in the mid-1990s by Hochreiter and Schmidhuber, gained popularity in the 2010s. Their ability to process sequences of arbitrary length and dynamically update their internal states made them particularly attractive. Despite these improvements, however, LSTMs still struggled with very long-range dependencies, and their strictly sequential processing made them slow to train on long sequences.
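
As a rough illustration of how an LSTM dynamically updates its internal state, here is a minimal single-cell sketch in Python with NumPy; the dimensions, random weights and dummy inputs are assumptions made purely for illustration, not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16                                   # assumed input and hidden sizes
W = rng.normal(size=(4 * d_hid, d_in + d_hid)) * 0.1  # stacked weights for the four gates
b = np.zeros(4 * d_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev):
    # One time step: gates are computed from the current input and the previous hidden state.
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)      # input, forget and output gates
    g = np.tanh(g)                                    # candidate cell update
    c_t = f * c_prev + i * g                          # cell state carries longer-term memory
    h_t = o * np.tanh(c_t)                            # hidden state is the step's output
    return h_t, c_t

# The sequence has to be processed one step at a time -- the sequential bottleneck.
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):                # five dummy input vectors
    h, c = lstm_step(x_t, h, c)
```

Because each step depends on the previous one, the loop cannot be parallelised across time steps, which is one of the limitations the Transformer later removed.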

The late 2010s: Transformers revolutionise NLP

In 2017, Google's paper "Attention Is All You Need" introduced the Transformer architecture, which greatly improved natural language processing. Transformers are parallelisable, comparatively easy to train, and use attention mechanisms to emphasise the relevant parts of the input. However, they work with a fixed context length, and the computational cost of attention grows quadratically with the length of the input sequence.
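
As a rough sketch of the attention mechanism at the core of the Transformer, the following Python/NumPy snippet implements scaled dot-product attention; the token count, dimensions and random inputs are assumptions chosen purely for illustration.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n): every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the input positions
    return weights @ V                                # weighted sum of the value vectors

rng = np.random.default_rng(0)
n, d = 6, 4                                           # assumed: 6 tokens, 4-dimensional vectors
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = attention(Q, K, V)                              # shape (6, 4)
```

The n-by-n score matrix is also where the quadratic growth in computational cost with sequence length comes from.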

The 2020s: the advent of GPT models

Generative Pre-trained Transformers (GPT) emerged as a dominant force in language modelling, with OpenAI's GPT-3 achieving state-of-the-art results without task-specific fine-tuning. In 2022, OpenAI introduced InstructGPT, which improves instruction following and reduces toxicity using Reinforcement Learning from Human Feedback (RLHF). OpenAI, Meta, Google and the open-source research community have contributed several large language models, such as OPT, FLAN-T5, BERT, BLOOM and StableLM. The field is evolving rapidly, with new models and capabilities appearing every few weeks.

Watch the full webinar here

Did you miss our webinar? You can watch the full video here.
