AstroLLaVA: towards the unification of astronomical data and natural language

Apr 11, 2025
Sharaf Zaman, Michael J. Smith, Pranav Khetarpal, Rishabh Chakrabarty, Michele Ginolfi, Marc Huertas-Company, Maja Jabłońska, Sandor Kruk, Matthieu Le Lain, Sergio José Rodríguez Méndez, Dimitrios Tanoglidis
Abstract
We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA’s Astronomy Picture of the Day, the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain.
Publication
SCI-FM@ICLR 2025, arXiv:2504.08583
Matthieu Le Lain
Authors
AI for astronomy & astrophysics
PhD student at IRISA, Université Bretagne Sud (expected 2026), and lecturer in computer science, working on foundation models for astronomy and astrophysics, with a broader interest in deep learning applied to scientific data. Also contributing to the UniverseTBD collaboration on vision–language models for astronomy.