9:00 AM - 9:30 AM
REGISTRATION
9:30 AM - 11:00 AM |
Prof. Preslav Nakov |
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models |
First, we will argue for the need for fully transparent open-source large language models (LLMs), and we will describe the efforts of MBZUAI's Institute on Foundation Models (IFM) towards that goal based on the LLM360 initiative. Second, we will argue for the need for language-specific LLMs, and we will share our experience from building Jais, the world's leading open Arabic-centric foundation and instruction-tuned large language model, Nanda, our open-weights Hindi LLM, Sherkala, our open-weights Kazakh LLM, and some other models. Third, we will argue for the need for safe LLMs, and we will present Do-Not-Answer, a dataset for evaluating the guardrails of LLMs, which is at the core of the safety mechanisms of our LLMs. Fourth, we will argue for the need for factual LLMs, and we will discuss the factuality challenges that LLMs pose. We will then present some recent relevant tools developed at MBZUAI for addressing these challenges: (i) OpenFactCheck, a framework for fact-checking LLM output, for building customized fact-checking systems, and for benchmarking LLMs for factuality; (ii) LM-Polygraph, a tool for predicting an LLM's uncertainty in its output using cheap and fast uncertainty quantification techniques; and (iii) LLM-DetectAIve, a tool for machine-generated text detection. Finally, we will argue for the need for specialized models, and we will present the zoo of LLMs currently being developed at MBZUAI's IFM.
11:00 AM - 11:15 AM
TEA BREAK
11:15 AM - 12:45 PM |
Prof. Tanmoy Chakraborty |
Don't underestimate the power of small language models |
Despite the superior performance demonstrated by Transformer-based LLMs across numerous natural language applications, their high computational cost, energy consumption, and limited accessibility underscore the need for efficient, interpretable, and adaptable small language models (SLMs). This talk highlights methods for developing economical and interpretable SLMs that rival their larger counterparts in performance without significant computational requirements. Our research emphasizes three key dimensions: economical resource usage, adaptability to diverse and low-resource tasks, and enhanced interpretability. Techniques such as competitive knowledge distillation, which leverages student-teacher dynamics, and activation sparsity in manifold-preserving transformers demonstrate significant efficiency gains without compromising performance. We formulate novel decomposer components for LLMs that modularize problem decomposition and solution generation, allowing smaller models to excel in complex reasoning tasks. We also propose innovative prompt construction and alignment strategies that boost in-context knowledge adaptation for SLMs in low-resource settings. Our findings demonstrate that SLMs can achieve scalability, interpretability, and adaptability, paving the way for broader and more sustainable AI accessibility.
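As a rough illustration of the student-teacher dynamics mentioned in this abstract, here is a minimal Python sketch of a standard knowledge distillation loss; the temperature and weighting values are illustrative defaults, not the speaker's exact formulation.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target distillation with the usual hard-label cross-entropy."""
    # KL divergence between temperature-softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard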
12:45 PM - 1:45 PM |
LUNCH
1:45 PM - 3:45 PM |
Sahil Mishra |
Retrieval-Augmented Language Models – Bridging LLMs with Efficient Knowledge Retrieval |
Large Language Models (LLMs) are powerful but have limitations: their knowledge becomes outdated and they are prone to hallucination. Retrieval-augmented language models and retrieval-augmented generation (RAG) address these problems by letting models fetch relevant information from external sources instead of relying only on what they were trained on. This session will cover how retrieval-based models work, the different ways they retrieve information (such as sparse and dense retrieval methods), and how they improve accuracy and efficiency. We will explore models like kNN-LMs, REALM, RETRO, and RAG, showing how they use retrieval to enhance responses. Additionally, we will discuss strategies for improving retrieval, aligning retrieved knowledge with model outputs, and refining prompts for better results, especially in low-resource settings. By combining retrieval with language models, we can build smaller, more efficient, and more reliable AI systems that provide accurate, well-supported answers in real-world applications.
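For orientation, the Python sketch below shows the retrieve-then-generate idea using sparse (TF-IDF) retrieval over a tiny hypothetical corpus; the final generation step is left as a placeholder, and a dense retriever would swap an embedding model in for the vectorizer.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny hypothetical document store standing in for an external knowledge source
corpus = [
    "RETRO augments a language model with chunks retrieved from a large text database.",
    "REALM jointly trains a neural retriever together with a masked language model.",
    "kNN-LM interpolates the LM's next-token distribution with nearest-neighbour matches.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)

def retrieve(query, k=2):
    # Sparse retrieval: rank documents by cosine similarity of TF-IDF vectors
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(query):
    # Prepend the retrieved passages so the generator can ground its answer
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does REALM use retrieval?"))  # prompt for any generator LLM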
3:45 PM - 4:00 PM |
TEA BREAK
4:00 PM - 6:00 PM |
Ankush Chander |
LLM finetuning: Fundamentals and best practices |
Large language models have transformed the field of NLP by performing well on tasks that were previously out of reach. Even though LLMs have strong general language capabilities, these are sometimes not enough for application-specific tasks. Fine-tuning allows users to adapt pre-trained LLMs to more specialized tasks. By fine-tuning a model on a small amount of task-specific data, you can improve its performance on that task while preserving its general language knowledge. In this session, we will discuss fine-tuning basics and memory optimization techniques such as quantization and LoRA, and fine-tune some LLMs along the way.
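As a minimal sketch of the kind of parameter-efficient setup this session touches on, the Python snippet below attaches LoRA adapters to a small causal LM with the Hugging Face peft library; the checkpoint, hyperparameters, and quantization note are illustrative choices, not a prescribed recipe.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Small placeholder checkpoint; any causal LM from the Hugging Face Hub works,
# optionally loaded with a quantization config (e.g. 4-bit) to save memory.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection to adapt (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will be trained
# Training then proceeds as usual (e.g. with transformers' Trainer) on the
# task-specific dataset, updating only the lightweight adapter weights.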