Overview
In this session, participants will explore the fundamentals of Large Language Models (LLMs), Vision-Language Models (VLMs), and reasoning models. The LLM and VLM sections will each include:
A concise theoretical overview
Key practical considerations
Hands-on exercises
Instructors
Lucas Gomez is a PhD student in the Integrated Program in Neuroscience at McGill University. His research in the BashivanLab focuses on building large-scale deep learning models of visual working memory that predict prefrontal cortex activity. He holds a Bachelor of Computer Science from Carleton University in Ottawa and is broadly interested in the intersection between AI and the brain sciences.
Zihan Weng is a PhD student in the Integrated Program in Neuroscience at McGill University, supervised by Pouya Bashivan, where he also completed his master’s degree. His research focuses on drawing inspiration from the brain’s hierarchical memory systems to develop more efficient and scalable deep learning architectures. He earned a master’s degree in Biomedical Engineering from UESTC and a bachelor’s degree in Computer Science. He is interested in how biological neural networks can help us build better artificial neural networks.
Declan Campbell is a PhD student in the Princeton Neuroscience Institute and a Natural and Artificial Minds (NAM) graduate fellow, advised by Jonathan Cohen and Tom Griffiths. His research draws on behavioral paradigms from cognitive psychology and methods from mechanistic interpretability to study the mechanisms underlying abstract reasoning in humans and machines.
Objectives
This session is divided into three parts:
Large Language Models (LLMs)
Learning the basics of LLMs:
Architecture
Tokenization (see the sketch after this list)
Training and finetuning strategies
In-context learning
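As a quick illustration of the tokenization topic above, here is a minimal sketch using Hugging Face's `transformers` library; the `gpt2` tokenizer is an illustrative choice, not necessarily the one used in the session:

```python
# A minimal tokenization sketch with Hugging Face `transformers`.
from transformers import AutoTokenizer

# "gpt2" is an illustrative choice of tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large Language Models process text as tokens."
token_ids = tokenizer.encode(text)                   # text -> integer IDs
tokens = tokenizer.convert_ids_to_tokens(token_ids)  # IDs -> subword strings

print(tokens)     # subword pieces; GPT-2 marks word-initial spaces with 'Ġ'
print(token_ids)  # the integers the model actually consumes
```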
Practical considerations:
APIs and inference interfaces
Controlling LLM outputs (see the API sketch after this list)
Prompt structure
Differences between LLM variants
Parsing and evaluating responses
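To preview the API and output-control topics above, here is a hedged sketch of a chat completion through OpenRouter's OpenAI-compatible endpoint; the model ID and API-key handling are illustrative assumptions:

```python
# A sketch of calling an LLM through OpenRouter's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
    api_key="YOUR_OPENROUTER_API_KEY",         # placeholder; use your own key
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # illustrative model ID
    messages=[
        {"role": "system", "content": "Answer with a single word."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.0,  # lower values make outputs more deterministic
    max_tokens=5,     # cap on the number of generated tokens
)

print(response.choices[0].message.content)
```

The `temperature` and `max_tokens` arguments are two of the most common knobs for controlling LLM outputs through an API.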
Hands-on - Behavioral evaluation of an LLM (a minimal scoring sketch follows this list):
Hugging Face
OpenRouter
Google Colab
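A behavioral evaluation typically boils down to a loop like the following minimal sketch: pose items, parse the model's raw answers, and score them. `ask_model` is a hypothetical stand-in for whichever interface (Hugging Face, OpenRouter) the session uses:

```python
# A minimal behavioral-evaluation loop: query, parse, score.
def ask_model(prompt: str) -> str:
    # Hypothetical stub: replace with a real call to a Hugging Face
    # pipeline or an OpenRouter chat completion.
    return "4" if "2 + 2" in prompt else "Paris."

items = [
    {"prompt": "2 + 2 = ? Answer with a number only.", "answer": "4"},
    {"prompt": "Capital of France? Answer with one word.", "answer": "Paris"},
]

correct = 0
for item in items:
    raw = ask_model(item["prompt"])
    # Naive parsing; real evaluations need more careful normalization.
    parsed = raw.strip().rstrip(".").lower()
    correct += parsed == item["answer"].lower()

print(f"Accuracy: {correct}/{len(items)}")
```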
Vision-Language Models (VLMs)
Learning the basics of VLMs:
Image encoders
Image projection layers connecting encoders to LLM backbones (see the sketch after this list)
Image tokenization
Training regimes
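To make the projection-layer idea concrete, here is a minimal PyTorch sketch in the style of a LLaVA MLP projector; the feature dimensions and two-layer design are illustrative assumptions, not a specific model's configuration:

```python
import torch
import torch.nn as nn

# A minimal sketch of an image projection layer that maps vision-encoder
# patch features into the LLM's token-embedding space. Dimensions are
# illustrative.
VISION_DIM = 768  # e.g., output width of a ViT-style image encoder
LLM_DIM = 4096    # e.g., embedding width of the LLM backbone

projector = nn.Sequential(
    nn.Linear(VISION_DIM, LLM_DIM),
    nn.GELU(),
    nn.Linear(LLM_DIM, LLM_DIM),
)

patch_features = torch.randn(1, 196, VISION_DIM)  # (batch, num_patches, dim)
image_tokens = projector(patch_features)          # (batch, 196, LLM_DIM)
# `image_tokens` can now be concatenated with text token embeddings
# before being fed to the LLM backbone.
print(image_tokens.shape)
```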
Practical considerations:
Computational differences from LLMs
Image prompting formats (see the sketch after this list)
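As one example of an image prompting format, below is a sketch of the OpenAI-style multimodal chat message accepted by many API-served VLMs, including those on OpenRouter; the image URL is a placeholder:

```python
import json

# One common image-prompting format: text and image parts interleaved
# inside a single user message.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What object is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
print(json.dumps(message, indent=2))
```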
Hands-on - Cognitive evaluation of a VLM (a local-inference sketch follows this list):
Hugging Face
OpenRouter
Google Colab
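For local inference (e.g., in a Colab notebook), a small vision-language model can be run through the Hugging Face `pipeline` API, as in this sketch; the BLIP captioning model and image URL are illustrative choices, not necessarily those used in the session:

```python
# A sketch of running a small image-to-text model locally with
# Hugging Face `transformers`.
from transformers import pipeline

captioner = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",  # illustrative model
)
# The pipeline accepts image URLs, local paths, or PIL images.
result = captioner("https://example.com/cat.png")
print(result[0]["generated_text"])
```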
Reasoning Models
Learning the basics of reasoning-enhanced models:
Chain-of-thought (CoT) prompting (see the sketch after this list)
R1-style reasoning training and supervised reasoning traces
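The following minimal sketch contrasts a direct prompt with a chain-of-thought prompt for the same question; the wording of the CoT instruction is one common choice, not a fixed recipe, and the prompts would be sent through any of the chat interfaces from earlier sections:

```python
# A minimal chain-of-thought (CoT) prompting sketch: the same question
# posed directly and with an explicit instruction to reason step by step.
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)

direct_prompt = question + "\nAnswer with a number only."
cot_prompt = question + "\nLet's think step by step, then state the final answer."

for name, prompt in [("direct", direct_prompt), ("cot", cot_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```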
Practical considerations:
Ensuring ad hoc reasoning
Understanding reasoning visibility differences
Controlling reasoning effort and computation (see the sketch after this list)
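As a preview of controlling reasoning effort, below is a hedged sketch using the OpenAI Python client's `reasoning_effort` parameter; the parameter name, accepted values, and model ID reflect one provider's interface at the time of writing and vary across providers (OpenRouter, for example, exposes its own reasoning controls):

```python
# A sketch of dialing reasoning effort up or down through an OpenAI-style API.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key

response = client.chat.completions.create(
    model="o3-mini",            # illustrative reasoning model
    reasoning_effort="low",     # e.g., "low" | "medium" | "high"
    messages=[{"role": "user", "content": "How many primes are below 30?"}],
)
print(response.choices[0].message.content)
```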