Made byBobr AI

Geospatial Foundation Models for Agriculture & Environment

Explore how large-scale AI foundation models like NASA's Prithvi are transforming geospatial science, precision agriculture, and environmental monitoring.

#geospatial-ai#foundation-models#remote-sensing#precision-agriculture#environmental-science#machine-learning#satellite-imagery#nasa-prithvi
Watch
Pitch

Foundation Models for Geospatial Analysis

Agriculture & Environmental Studies

A lecture for university students on how large-scale AI models are transforming geospatial science

University of Sydney Shield Logo Precision Agriculture Laboratory Banner
Golden wheat field under blue sky
Made byBobr AI

We recognise and pay respect to the Elders and communities – past, present, and emerging – of the lands that the University of Sydney's campuses stand on. For thousands of years they have shared and exchanged knowledges across innumerable generations for the benefit of all.

University of Sydney Shield Logo
Made byBobr AI

Lecture Outline

Topics covered today

  • - What are Foundation Models?
  • - Geospatial Data & Remote Sensing
  • - Foundation Model Architectures
  • - Applications in Agriculture
  • - Applications in Environmental Studies
  • - Key Models & Benchmarks
  • - Case Studies & Real-world Deployments
  • - Challenges, Limitations & Future Directions
Made byBobr AI

What are Foundation Models?

Large-scale pre-trained AI models adaptable to many tasks

Trained on massive, diverse datasets using self-supervised learning

Can be fine-tuned for specific downstream tasks with minimal labelled data

Examples: GPT, BERT, ViT — now extended to geospatial/remote sensing data

Enable transfer learning across domains: images, time-series, multi-modal data

Reduce the need for large task-specific labelled datasets

Particularly powerful where labelled geospatial data is scarce or expensive

University of Sydney Shield Logo
Foundation
Model
Crop Mapping
Flood Detection
Land Use
Soil Analysis
Climate Monitoring
Made byBobr AI
University of Sydney

Geospatial Data & Remote Sensing

Types of data used in geospatial foundation models

  • Optical imagery: Sentinel-2, Landsat, Planet (multispectral, RGB)
  • Synthetic Aperture Radar (SAR): Sentinel-1 — all-weather, day/night
  • LiDAR: 3D point clouds for terrain, canopy structure
  • Hyperspectral: Hundreds of narrow spectral bands
  • Time-series stacks: multi-temporal observations for change detection
  • Auxiliary: weather, DEM, soil maps, crop calendars
Satellite imagery
Made byBobr AI

Foundation Model Architectures

Key architectures powering geospatial AI

  • Vision Transformer (ViT): Divides images into patches, applies self-attention — ideal for satellite imagery
  • Masked Autoencoder (MAE): Pre-trains by masking and reconstructing image patches
  • Contrastive Learning (MoCo, SimCLR): Learns features by contrasting similar/dissimilar image pairs
  • Multi-modal Transformers: Fuse optical, SAR, time-series, and text modalities
  • Temporal encoders: Handle multi-date image sequences for phenological monitoring
  • Prithvi uses 100M+ parameters pre-trained on terabytes of satellite data
University of Sydney Shield Logo
Scientific Diagram SatMAE
Made byBobr AI

Agriculture: Crop Monitoring & Yield Prediction

Foundation models applied to precision agriculture

  • Crop type mapping using multi-temporal Sentinel-2 time series
  • Yield prediction by fusing remote sensing with weather and soil data
  • NDVI and vegetation index time-series modelling for phenology
  • Economically Optimal N Rate (EONR) mapping using ML
  • In-season nitrogen management with remote sensing at Z31 growth stage
  • Transfer learning enables model use across different fields and seasons
University of Sydney Shield Logo
Crop Management Map & NDVI
Made byBobr AI

Agriculture: Soil Analysis & Experiment Design

Variable rate application and N-strip trial design

Tier Data Required Technology Application
Platinum Digital soil maps, yield maps, historical data Satellite + ML Full variable rate N application
Gold Pre-season soil nutrients + plant phenology Remote sensing + sensors Zone-based management
Silver PSA, CEC measurements Electromagnetic induction Basic zone mapping
Bronze Yield map only Standard GPS Simple prescription maps
  • N-strip trials: three rates — low, normal, high — applied at Z31 growth stage
  • Machine learning predicts EONR based on seasonal rainfall, EM, remote sensing
  • Model transfers across fields and seasons for scalable precision agriculture
University of Sydney Logo
Aerial agricultural field map
Made byBobr AI

Environmental Applications

Land use, deforestation, flood and climate monitoring

  • Land Use and Land Cover (LULC) classification at continental scale
  • Deforestation and forest degradation monitoring using SAR time-series
  • Flood extent mapping: rapid response using Sentinel-1 SAR
  • Urban heat island detection and impervious surface mapping
  • Wetland and mangrove habitat monitoring with multispectral + LiDAR
  • Wildfire burn severity assessment using pre/post spectral indices
  • Drought stress monitoring via soil moisture and vegetation anomalies
  • Carbon stock estimation for ecosystem accounting
University of Sydney Logo
False color satellite landscape
Made byBobr AI
University of Sydney

Key Foundation Models & Benchmarks

Overview of leading geospatial foundation models

Model Developer Architecture Pre-training Data Key Applications
Prithvi NASA / IBM ViT + MAE HLS Landsat-Sentinel Flood, crop, burn mapping
SatMAE Caltech MAE (temporal) fMoW + Sentinel-2 Scene classification, detection
Clay Clay Foundation ViT + contrastive Multi-sensor global General geospatial tasks
GFM Various Swin Transformer Multi-source RS LULC, segmentation
RemoteCLIP RSICD CLIP (vision-language) RS + captions Image-text retrieval
Made byBobr AI

Case Study: Prithvi Foundation Model

NASA and IBM — geospatial AI at scale

  • Developed by NASA and IBM, released as open-source in 2023
  • Pre-trained on 6 years of Harmonized Landsat-Sentinel (HLS) time-series data
  • 100M parameter Vision Transformer with temporal positional encoding
  • Fine-tuned tasks: flood mapping, wildfire burn scar detection, crop segmentation
  • Achieves state-of-the-art on flood mapping with only 10% of labelled training data
  • Demonstrates strong transfer learning across geographic regions
  • Available on Hugging Face — enables community fine-tuning for new applications

Key lesson: foundation models dramatically reduce labelling burden in geospatial AI

University of Sydney Logo
Prithvi-100M architecture diagram
Made byBobr AI

Challenges and Limitations

Open problems in geospatial foundation models

  • Data heterogeneity: sensors, resolutions, coordinate systems vary widely
  • Lack of large labelled benchmarks: annotation is costly and expert-intensive
  • Generalisation across geographies: models trained in one region may fail in another
  • Computational cost: training and fine-tuning require significant GPU resources
  • Interpretability: black-box nature complicates trust in high-stakes decisions
  • Temporal dynamics: handling multi-year phenological cycles and climate shifts
  • Multi-modal fusion: combining optical, SAR, LiDAR, and tabular data reliably
  • Ethical concerns: data ownership, indigenous land data sovereignty, and bias
University of Sydney Shield Logo
Rolf et al. 2024 Nature paper diagram showing Earth Foundation Model ecosystem
Made byBobr AI

Transfer Learning & Fine-tuning

Adapting foundation models to new geospatial tasks

  • Fine-tuning: updating all or selected model weights on a labelled downstream dataset
  • Linear probing: freeze backbone, train only the classification head — fast and data-efficient
  • LoRA (Low-Rank Adaptation): inject small trainable matrices — reduces GPU memory requirements
  • Prompt tuning: condition the model using learnable task tokens — no weight updates needed
  • Domain adaptation: bridge spectral gap between training (Landsat/HLS) and target (Sentinel-2, MODIS)
  • Few-shot learning: achieve competitive performance with as few as 10–50 labelled samples
  • Cross-region transfer: models pre-trained in US/Europe applied to Australia, Africa, Southeast Asia
University of Sydney Logo
Transfer Learning Diagram
Made byBobr AI

What an Image Is Mathematically

Pixels, channels, tensors, and from pixels to features

  • A digital image is a 3D tensor of shape H × W × C, where C = channels (e.g. 3 for RGB, 13 for Sentinel-2)
  • Each pixel holds intensity values in [0, 255] or normalised [0, 1] per channel
  • Colour images stacked as separate 2D matrices — red, green, blue each independent
  • Raw RGB pixels are not semantic: nearby pixels can belong to different classes
  • Local patterns (edges, textures) require filters — convolution extracts spatial features
  • Patches divide the image into fixed-size tiles (e.g. 16×16 px) — the input unit for transformers
University of Sydney Shield Logo
Diagram showing image mapped mathematically
Made byBobr AI

What an Embedding Is

A learned vector z ∈ ℝᵈ that captures semantics

  • An embedding is a dense numerical vector that represents the meaning of an input, not just its pixels
  • Formally: z = f(x) where f is a neural network encoder and z ∈ ℝᵈ (e.g. d = 64, 256, 768)
  • Semantically similar inputs map to nearby vectors in the embedding space
  • Patch embeddings: each 16×16 image patch is linearly projected into a d-dimensional token
  • Tokenisation: the image becomes a sequence of N patch tokens, analogous to words in NLP
  • Embeddings carry richer information than raw pixel values — texture, shape, context
University of Sydney Shield Logo
Scientific Diagram of Embedding
Made byBobr AI

Latent Space

Geometry, neighborhoods, clustering, interpolation, semantic directions

  • The latent space is the high-dimensional space where all embeddings live after encoding
  • Nearby points in latent space share semantic properties — e.g. 'forest' clusters together
  • Clustering: k-means or UMAP projections reveal meaningful groupings of land cover types
  • Interpolation: moving along a straight line between two embeddings smoothly transitions semantics
  • Semantic directions: PCA axes or learned directions encode concepts (e.g. 'more vegetation')
  • Well-structured latent spaces generalise better to unseen locations and sensor configurations
University of Sydney Shield Logo
t-SNE clustering projection map
Made byBobr AI

Similarity Math

Dot product, cosine similarity, Euclidean distance, nearest neighbours

  • Dot product: sim(a, b) = aᵀb — fast, used in attention mechanisms and retrieval
  • Cosine similarity: sim(a, b) = aᵀb / (‖a‖‖b‖) — measures angle, scale-invariant
  • Euclidean distance: d(a, b) = ‖a − b‖₂ — sensitive to magnitude, used in clustering
  • For unit-length embeddings (e.g. AlphaEarth): dot product equals cosine similarity
  • Nearest-neighbour search: find top-k most similar embeddings in large databases (FAISS)
  • Similarity enables zero-shot retrieval: find pixels/patches similar to a query without retraining
University of Sydney Logo
Vector Embeddings Diagram
Made byBobr AI

How Models Learn Embeddings

Supervised vs self-supervised objectives, contrastive learning, autoencoders

  • Supervised: learn embeddings using class labels — good performance, requires expensive annotations
  • Self-supervised: learn from data structure alone — no labels needed, scales to petabytes of satellite imagery
  • Contrastive learning: positive pairs (augmented views of same image) pulled together, negative pairs pushed apart
  • InfoNCE loss: maximise mutual information between positive pairs, foundational to MoCo and SimCLR
  • MoCo / SimCLR: seminal contrastive methods adapted for geospatial data (SeCo, GeoSSL)
  • Autoencoders: compress input to bottleneck z, reconstruct output — forces z to capture essentials
  • MAE (Masked Autoencoder): mask 75% of patches, reconstruct — learns rich patch-level representations
University of Sydney Shield Logo
MAE Architecture Diagram
Made byBobr AI

Transformers for Images

Self-attention, token mixing, and why ViTs changed embeddings

  • ViT (Vision Transformer): splits image into 16×16 patches, each projected to a token via linear embedding
  • Self-attention: each token attends to all others — captures global context unlike CNNs
  • Token mixing: information flows across the full image in every layer — long-range dependencies
  • Positional encodings: preserve spatial location of each patch token (learnt or Fourier-based)
  • ViT embeddings outperform CNN features on downstream remote sensing tasks with sufficient pre-training
  • Foundation models (Prithvi, SatMAE, Scale-MAE) all use ViT backbones pre-trained on satellite imagery
University of Sydney Shield Logo
ViT Architecture Diagram
Made byBobr AI

Remote Sensing Embeddings

Optical, SAR, temporal stacks, multi-modal fusion, pixel vs patch vs scene

  • Optical embeddings: derived from RGB or multispectral bands (Sentinel-2, Landsat) — sensitive to clouds
  • SAR embeddings: derived from radar backscatter — cloud-penetrating, captures structure and moisture
  • Temporal stack embeddings: multi-date image sequences encoded as a single vector capturing change over time
  • Multi-modal fusion: combine optical + SAR + LiDAR embeddings via concatenation or cross-attention
  • Pixel embeddings: one vector per pixel (e.g. AlphaEarth 64D at 10 m) — fine-grained spatial resolution
  • Patch embeddings: one vector per 16×16 tile — balance between detail and compute
  • Scene embeddings: one vector per full image chip — used for scene classification and retrieval
University of Sydney Shield Logo
Multi-modal remote sensing architecture diagram
Made byBobr AI

AlphaEarth Foundations — Case Study

A real geospatial embedding system at global scale

  • Annual 64-dimensional embeddings for every 10-metre pixel on Earth's land surface
  • Derived from multi-sensor yearly observations: Sentinel-2 optical + additional sensor stacks
  • Unit-length vectors: comparable across years via dot product or angle (cosine similarity = dot product)
  • Tasks supported: land cover classification, regression (e.g. yield, biomass), change detection, similarity search
  • Change detection: compare embedding angles between years — divergence = land use change
  • Similarity search: retrieve pixels globally that match a query location's embedding
  • Demonstrates the full pipeline: pixels → features → embeddings → latent space → downstream tasks
University of Sydney Shield Logo
Global Satellite Imagery Embedding Map
Made byBobr AI
Bobr AI

DESIGNER-MADE
PRESENTATION,
GENERATED FROM
YOUR PROMPT

Create your own professional slide deck with real images, data charts, and unique design in under a minute.

Generate For Free

Geospatial Foundation Models for Agriculture & Environment

Explore how large-scale AI foundation models like NASA's Prithvi are transforming geospatial science, precision agriculture, and environmental monitoring.

Foundation Models for Geospatial Analysis

Agriculture & Environmental Studies

A lecture for university students on how large-scale AI models are transforming geospatial science

We recognise and pay respect to the Elders and communities – past, present, and emerging – of the lands that the University of Sydney's campuses stand on. For thousands of years they have shared and exchanged knowledges across innumerable generations for the benefit of all.

Lecture Outline

Topics covered today

What are Foundation Models?

Geospatial Data & Remote Sensing

Foundation Model Architectures

Applications in Agriculture

Applications in Environmental Studies

Key Models & Benchmarks

Case Studies & Real-world Deployments

Challenges, Limitations & Future Directions

What are Foundation Models?

Large-scale pre-trained AI models adaptable to many tasks

Trained on massive, diverse datasets using self-supervised learning

Can be fine-tuned for specific downstream tasks with minimal labelled data

Examples: GPT, BERT, ViT — now extended to geospatial/remote sensing data

Enable transfer learning across domains: images, time-series, multi-modal data

Reduce the need for large task-specific labelled datasets

Particularly powerful where labelled geospatial data is scarce or expensive

Foundation

Model

Crop Mapping

Flood Detection

Land Use

Soil Analysis

Climate Monitoring

Geospatial Data & Remote Sensing

Types of data used in geospatial foundation models

Optical imagery: Sentinel-2, Landsat, Planet (multispectral, RGB)

Synthetic Aperture Radar (SAR): Sentinel-1 — all-weather, day/night

LiDAR: 3D point clouds for terrain, canopy structure

Hyperspectral: Hundreds of narrow spectral bands

Time-series stacks: multi-temporal observations for change detection

Auxiliary: weather, DEM, soil maps, crop calendars

Foundation Model Architectures

Key architectures powering geospatial AI

Agriculture: Crop Monitoring & Yield Prediction

Foundation models applied to precision agriculture

Crop type mapping using multi-temporal Sentinel-2 time series

Yield prediction by fusing remote sensing with weather and soil data

NDVI and vegetation index time-series modelling for phenology

Economically Optimal N Rate (EONR) mapping using ML

In-season nitrogen management with remote sensing at Z31 growth stage

Transfer learning enables model use across different fields and seasons

Agriculture: Soil Analysis & Experiment Design

Variable rate application and N-strip trial design

Platinum

Digital soil maps, yield maps, historical data

Satellite + ML

Full variable rate N application

Gold

Pre-season soil nutrients + plant phenology

Remote sensing + sensors

Zone-based management

Silver

PSA, CEC measurements

Electromagnetic induction

Basic zone mapping

Bronze

Yield map only

Standard GPS

Simple prescription maps

N-strip trials: three rates — low, normal, high — applied at Z31 growth stage

Machine learning predicts EONR based on seasonal rainfall, EM, remote sensing

Model transfers across fields and seasons for scalable precision agriculture

Environmental Applications

Land use, deforestation, flood and climate monitoring

Land Use and Land Cover (LULC) classification at continental scale

Deforestation and forest degradation monitoring using SAR time-series

Flood extent mapping: rapid response using Sentinel-1 SAR

Urban heat island detection and impervious surface mapping

Wetland and mangrove habitat monitoring with multispectral + LiDAR

Wildfire burn severity assessment using pre/post spectral indices

Drought stress monitoring via soil moisture and vegetation anomalies

Carbon stock estimation for ecosystem accounting

Key Foundation Models & Benchmarks

Overview of leading geospatial foundation models

Model

Developer

Architecture

Pre-training Data

Key Applications

Prithvi

NASA / IBM

ViT + MAE

HLS Landsat-Sentinel

Flood, crop, burn mapping

SatMAE

Caltech

MAE (temporal)

fMoW + Sentinel-2

Scene classification, detection

Clay

Clay Foundation

ViT + contrastive

Multi-sensor global

General geospatial tasks

GFM

Various

Swin Transformer

Multi-source RS

LULC, segmentation

RemoteCLIP

RSICD

CLIP (vision-language)

RS + captions

Image-text retrieval

Case Study: Prithvi Foundation Model

NASA and IBM — geospatial AI at scale

Developed by NASA and IBM, released as open-source in 2023

Pre-trained on 6 years of Harmonized Landsat-Sentinel (HLS) time-series data

100M parameter Vision Transformer with temporal positional encoding

Fine-tuned tasks: flood mapping, wildfire burn scar detection, crop segmentation

Achieves state-of-the-art on flood mapping with only 10% of labelled training data

Demonstrates strong transfer learning across geographic regions

Available on Hugging Face — enables community fine-tuning for new applications

Key lesson: foundation models dramatically reduce labelling burden in geospatial AI

Challenges and Limitations

Open problems in geospatial foundation models

Data heterogeneity:

sensors, resolutions, coordinate systems vary widely

Lack of large labelled benchmarks:

annotation is costly and expert-intensive

Generalisation across geographies:

models trained in one region may fail in another

Computational cost:

training and fine-tuning require significant GPU resources

Interpretability:

black-box nature complicates trust in high-stakes decisions

Temporal dynamics:

handling multi-year phenological cycles and climate shifts

Multi-modal fusion:

combining optical, SAR, LiDAR, and tabular data reliably

Ethical concerns:

data ownership, indigenous land data sovereignty, and bias

Transfer Learning & Fine-tuning

Adapting foundation models to new geospatial tasks

Fine-tuning: updating all or selected model weights on a labelled downstream dataset

Linear probing: freeze backbone, train only the classification head — fast and data-efficient

LoRA (Low-Rank Adaptation): inject small trainable matrices — reduces GPU memory requirements

Prompt tuning: condition the model using learnable task tokens — no weight updates needed

Domain adaptation: bridge spectral gap between training (Landsat/HLS) and target (Sentinel-2, MODIS)

Few-shot learning: achieve competitive performance with as few as 10–50 labelled samples

Cross-region transfer: models pre-trained in US/Europe applied to Australia, Africa, Southeast Asia

What an Image Is Mathematically

Pixels, channels, tensors, and from pixels to features

A digital image is a 3D tensor of shape H × W × C, where C = channels (e.g. 3 for RGB, 13 for Sentinel-2)

Each pixel holds intensity values in [0, 255] or normalised [0, 1] per channel

Colour images stacked as separate 2D matrices — red, green, blue each independent

Raw RGB pixels are not semantic: nearby pixels can belong to different classes

Local patterns (edges, textures) require filters — convolution extracts spatial features

Patches divide the image into fixed-size tiles (e.g. 16×16 px) — the input unit for transformers

What an Embedding Is

A learned vector z ∈ ℝᵈ that captures semantics

An embedding is a dense numerical vector that represents the meaning of an input, not just its pixels

Formally: z = f(x) where f is a neural network encoder and z ∈ ℝᵈ (e.g. d = 64, 256, 768)

Semantically similar inputs map to nearby vectors in the embedding space

Patch embeddings: each 16×16 image patch is linearly projected into a d-dimensional token

Tokenisation: the image becomes a sequence of N patch tokens, analogous to words in NLP

Embeddings carry richer information than raw pixel values — texture, shape, context

Latent Space

Geometry, neighborhoods, clustering, interpolation, semantic directions

The latent space is the high-dimensional space where all embeddings live after encoding

Nearby points in latent space share semantic properties — e.g. 'forest' clusters together

Clustering: k-means or UMAP projections reveal meaningful groupings of land cover types

Interpolation: moving along a straight line between two embeddings smoothly transitions semantics

Semantic directions: PCA axes or learned directions encode concepts (e.g. 'more vegetation')

Well-structured latent spaces generalise better to unseen locations and sensor configurations

Similarity Math

Dot product, cosine similarity, Euclidean distance, nearest neighbours

<strong style="font-weight: bold;">Dot product:</strong> sim(a, b) = aᵀb &mdash; fast, used in attention mechanisms and retrieval

<strong style="font-weight: bold;">Cosine similarity:</strong> sim(a, b) = aᵀb / (‖a‖‖b‖) &mdash; measures angle, scale-invariant

<strong style="font-weight: bold;">Euclidean distance:</strong> d(a, b) = ‖a − b‖₂ &mdash; sensitive to magnitude, used in clustering

<strong style="font-weight: bold;">For unit-length embeddings</strong> (e.g. AlphaEarth): dot product equals cosine similarity

<strong style="font-weight: bold;">Nearest-neighbour search:</strong> find top-k most similar embeddings in large databases (FAISS)

<strong style="font-weight: bold;">Similarity enables zero-shot retrieval:</strong> find pixels/patches similar to a query without retraining

How Models Learn Embeddings

Supervised vs self-supervised objectives, contrastive learning, autoencoders

<strong style="font-weight: bold;">Supervised:</strong> learn embeddings using class labels &mdash; good performance, requires expensive annotations

<strong style="font-weight: bold;">Self-supervised:</strong> learn from data structure alone &mdash; no labels needed, scales to petabytes of satellite imagery

<strong style="font-weight: bold;">Contrastive learning:</strong> positive pairs (augmented views of same image) pulled together, negative pairs pushed apart

<strong style="font-weight: bold;">InfoNCE loss:</strong> maximise mutual information between positive pairs, foundational to MoCo and SimCLR

<strong style="font-weight: bold;">MoCo / SimCLR:</strong> seminal contrastive methods adapted for geospatial data (SeCo, GeoSSL)

<strong style="font-weight: bold;">Autoencoders:</strong> compress input to bottleneck z, reconstruct output &mdash; forces z to capture essentials

<strong style="font-weight: bold;">MAE (Masked Autoencoder):</strong> mask 75% of patches, reconstruct &mdash; learns rich patch-level representations

Transformers for Images

Self-attention, token mixing, and why ViTs changed embeddings

<strong style="font-weight: bold;">ViT (Vision Transformer):</strong> splits image into 16&times;16 patches, each projected to a token via linear embedding

<strong style="font-weight: bold;">Self-attention:</strong> each token attends to all others &mdash; captures global context unlike CNNs

<strong style="font-weight: bold;">Token mixing:</strong> information flows across the full image in every layer &mdash; long-range dependencies

<strong style="font-weight: bold;">Positional encodings:</strong> preserve spatial location of each patch token (learnt or Fourier-based)

<strong style="font-weight: bold;">ViT embeddings</strong> outperform CNN features on downstream remote sensing tasks with sufficient pre-training

<strong style="font-weight: bold;">Foundation models</strong> (Prithvi, SatMAE, Scale-MAE) all use ViT backbones pre-trained on satellite imagery

Remote Sensing Embeddings

Optical, SAR, temporal stacks, multi-modal fusion, pixel vs patch vs scene

<strong style="font-weight: bold;">Optical embeddings:</strong> derived from RGB or multispectral bands (Sentinel-2, Landsat) &mdash; sensitive to clouds

<strong style="font-weight: bold;">SAR embeddings:</strong> derived from radar backscatter &mdash; cloud-penetrating, captures structure and moisture

<strong style="font-weight: bold;">Temporal stack embeddings:</strong> multi-date image sequences encoded as a single vector capturing change over time

<strong style="font-weight: bold;">Multi-modal fusion:</strong> combine optical + SAR + LiDAR embeddings via concatenation or cross-attention

<strong style="font-weight: bold;">Pixel embeddings:</strong> one vector per pixel (e.g. AlphaEarth 64D at 10 m) &mdash; fine-grained spatial resolution

<strong style="font-weight: bold;">Patch embeddings:</strong> one vector per 16&times;16 tile &mdash; balance between detail and compute

<strong style="font-weight: bold;">Scene embeddings:</strong> one vector per full image chip &mdash; used for scene classification and retrieval

AlphaEarth Foundations — Case Study

A real geospatial embedding system at global scale

Annual 64-dimensional embeddings for every 10-metre pixel on Earth's land surface

Derived from multi-sensor yearly observations: Sentinel-2 optical + additional sensor stacks

Unit-length vectors: comparable across years via dot product or angle (cosine similarity = dot product)

Tasks supported: land cover classification, regression (e.g. yield, biomass), change detection, similarity search

Change detection: compare embedding angles between years — divergence = land use change

Similarity search: retrieve pixels globally that match a query location's embedding

Demonstrates the full pipeline: pixels → features → embeddings → latent space → downstream tasks