What Are Large Language Models (LLMs)?
Large language models (LLMs) are language models capable of general-purpose language understanding and generation. In practice, their output can come close to human-like use of language. LLMs form the foundation of popular AI tools such as ChatGPT, Bard and Copilot, and are widely used across major industries deploying AI.
How Are Large Language Models Developed?
Large language models (LLMs) are designed as artificial neural networks (ANNs) using a transformer architecture. Transformer architectures are also used to develop generative pre-trained transformers (GPTs), the basis for generative AI (GenAI) tools.
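The core operation of a transformer is attention, in which every token weighs every other token when building its own representation. The snippet below is a minimal sketch of scaled dot-product attention in plain Python, not the implementation used by any particular model. The token vectors are made-up numbers, and the learned projection matrices that real transformers apply before this step are left out as a simplifying assumption.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Three hypothetical 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumption: queries, keys and values are the raw token vectors themselves.
result = attention(tokens, tokens, tokens)
```

Each row of `result` is a weighted mix of all three token vectors, which is how a transformer lets the context of a sentence shape the representation of each word.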
The development cycle of an LLM typically involves the following steps:
- Data Collection And Preparation: Vast amounts of text data are gathered from various sources like books, articles, code, websites, social media and more. The data is then cleaned and preprocessed to remove errors and inconsistencies and to reduce biases.
- Model Architecture: The ANN providing the computational muscle for the LLM is built on a chosen architecture. For LLMs, a transformer-based architecture is typically used. These ANNs can have dozens of layers, depending on the size of the LLM. For instance, the largest version of GPT-3 has 96 layers, while OpenAI has not published the architecture of GPT-4.
- Training: After an architecture is decided, the LLM is fed the preprocessed data in batches over a very large number of training steps. During training, the LLM learns to recognise patterns and relationships in the data, typically by predicting the next token in a sequence. This training can take days, weeks, or even months depending on the size and complexity of the model and the available computational resources. Its cost can run into millions of dollars.
- Fine-Tuning And Evaluation: Once trained, the LLM is further fine-tuned for specific tasks or domains, like translation or answering questions, using methods such as instruction tuning and reinforcement learning from human feedback (RLHF). Prompt engineering can sometimes be used in place of brute-force fine-tuning, but its effect is limited to a single conversation or context window. The fine-tuned model is then evaluated on how well it performs.
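The first steps above can be sketched at toy scale as follows. This is an illustration under loud assumptions: the four-sentence corpus is invented, and a simple bigram counter (which word follows which) stands in for the neural network, so nothing here resembles real LLM training beyond the overall shape of clean, learn, then generate.

```python
import random
import re
from collections import Counter, defaultdict

def clean(documents):
    """Normalise whitespace and case, then drop empty and duplicate documents."""
    seen, cleaned = set(), []
    for doc in documents:
        text = re.sub(r"\s+", " ", doc).strip().lower()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

def train_bigram(documents):
    """Count which word follows which: a crude stand-in for learning patterns."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample a continuation one word at a time, weighted by the counts."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        choices, weights = zip(*followers.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

# Hypothetical raw data containing a near-duplicate and an empty document.
raw = [
    "The cat sat on the mat.",
    "the  cat sat on the mat.",
    "",
    "The dog sat on the rug.",
]
corpus = clean(raw)           # data preparation
model = train_bigram(corpus)  # "training"
sample = generate(model, "the", 5)
```

Real LLMs replace the counting step with gradient-based training of a transformer over billions of tokens, but the principle of predicting what comes next from patterns in cleaned data is the same.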
Following the fine-tuning phase, the LLM can be deployed for real-world applications if it performs well. Developers then continue to improve and build upon the model, often drawing on feedback and data from interactions with users.
Are LLMs A Form Of AI?
Technically, LLMs are considered to be a type of AI. They are based on ANNs and deploy machine learning algorithms to analyse and process information, which are hallmarks of AI.
That argument also holds from a functional viewpoint. LLMs exhibit some characteristics of AI, such as learning, problem-solving, and generating creative text. However, their capabilities are currently limited to specific tasks related to language.
What Are The Use Cases Of Large Language Models?
LLMs form the core of GPT-based AI tools such as ChatGPT and thus share all the real-world applications of a generative transformer like GPT-3.5 or GPT-4. The most prominent application is content generation across multiple media forms such as text, code, audio, image and video (multimodality).
Though LLMs themselves are limited to text-based input and output, outputs in other forms are generated using different encoder-decoder pairs.
Which Indian Startups Are Developing Homegrown LLMs?
Several Indian startups are working on developing LLMs for specific use cases.
- Zoho: The SaaS unicorn recently announced plans to build its own LLM, targeting smaller models (7-20 Bn parameters) focused on solving specific domain problems for its customers.
- Krutrim: Founded by Bhavish Aggarwal (Ola Group), Krutrim launched India’s first multilingual LLM capable of generating text in 10 Indian languages. The startup is focussing on building models aligned with India’s specific linguistic and cultural needs.
- Sarvam AI: The Bengaluru-based startup has developed OpenHathi, an open-source LLM dedicated to Indian languages. Its initial release, OpenHathi-Hi-v0.1, targets Hindi and aims to improve the way computers understand and respond in Indic languages.
- Soket Labs: Focused on building LLMs for consumer-facing and internal processes, Soket Labs offers its technology stack to companies for training language models on their own proprietary data. The startup plans to launch its LLM, Pragna, in the coming months.