Geophysics Department Colloquium: A Recipe for Improving Remote Sensing VLM Zero Shot Generalization
Vered Silverman, PhD
Zoom: https://tau-ac-il.zoom.us/j/86769967727?pwd=25OfJE7Na6lWggNsBNRbl7H8bnW56x.1
Abstract:
Foundation models have had a significant impact across AI applications, enabling use cases that were previously impossible. Contrastive visual-language models (VLMs), in particular, have outperformed other techniques on many tasks. Their adoption in remote sensing (RS), however, remains limited due to the scarcity of diverse remote-sensing visual-language datasets.

We present novel remote-sensing foundation VLM and open-vocabulary detection models, trained on new large-scale datasets (>20M unique examples) with complex and diverse textual descriptions, composed using Gemini together with external sources of geographic information such as Google Maps landmarks and the alt-text of public web images. Our pretrained models provide new capabilities for mapping and retrieving complex image-text queries specialized to the remote-sensing domain, and achieve state-of-the-art performance on a variety of downstream tasks and established benchmarks.

The model's ability to perform complex image retrieval at scale for remote-sensing tasks, while significantly reducing compute cost and time, can be applied to a range of real-world problems. The contrastive model is used to encode hundreds of millions of overhead images with global and temporal coverage, over which an efficient retrieval algorithm is then run. This allows us to issue geo-referenced queries anywhere in the world and compose ad-hoc datasets for different tasks. We present a specific use case for studying natural disasters: a dataset of flood images from across the globe, composed with a single image-retrieval query, demonstrating the potential of the model to support complex reasoning for geospatial tasks and to geo-locate objects across the entire Earth.
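To make the retrieval step concrete, below is a minimal sketch of contrastive text-to-image retrieval, assuming a CLIP-style setup with L2-normalized embeddings. The random placeholder embeddings and the query are hypothetical stand-ins for the pretrained remote-sensing model described above; at the scale of hundreds of millions of images, an approximate nearest-neighbor index would replace the exact dot-product search shown here.

import numpy as np

def cosine_retrieval(image_embs: np.ndarray, text_emb: np.ndarray, k: int = 100) -> np.ndarray:
    """Return indices of the k images most similar to a text query.

    image_embs: (N, D) array of L2-normalized image embeddings.
    text_emb:   (D,) L2-normalized embedding of the text query.
    """
    scores = image_embs @ text_emb            # cosine similarity via dot product
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k in O(N)
    return top_k[np.argsort(-scores[top_k])]  # sort only the k hits by score

# Example: compose an ad-hoc flood dataset with a single query.
# In practice, image_embs would come from encoding overhead imagery with the
# pretrained contrastive model, and text_emb from encoding a query such as
# "an overhead image of a flooded residential area".
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100_000, 512)).astype(np.float32)
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)
text_emb = rng.normal(size=512).astype(np.float32)
text_emb /= np.linalg.norm(text_emb)

flood_candidates = cosine_retrieval(image_embs, text_emb, k=100)
print(flood_candidates[:10])

Because both modalities live in the same embedding space, composing a new dataset reduces to one text query against a precomputed image index, which is what makes a single global flood-image query feasible.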
Event organizer: Dr. Lior Rubanenko

