AstroLLaVA: towards the unification of astronomical data and natural language

Apr 11, 2025·

Sharaf Zaman

Michael J. Smith

Pranav Khetarpal

Rishabh Chakrabarty

Michele Ginolfi

Marc Huertas-Company

Maja Jabłońska

Sandor Kruk

Matthieu Le Lain

Sergio José Rodríguez Méndez

Dimitrios Tanoglidis

· 0 min read

arXiv PDF Code

Abstract

We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA’s Astronomy Picture of the Day, the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain.

Type

Conference paper

Publication

SCI-FM@ICLR 2025, arXiv:2504.08583

Last updated on Mar 12, 2026

Foundation Models Vision Language Models Astronomy UniverseTBD

Authors

Matthieu Le Lain

AI for astronomy & astrophysics

PhD student at IRISA, Université Bretagne Sud (expected 2026), and lecturer in computer science, working on foundation models for astronomy and astrophysics, with a broader interest in deep learning applied to scientific data. Also contributing to the UniverseTBD collaboration on vision–language models for astronomy.

← How to finetune DINOv2 for astronmy? Aug 30, 2025

No results found

AstroLLaVA: towards the unification of astronomical data and natural language