2024
1.
Graffeuille, Olivier; Lehmann, Moritz; Allan, Matthew; Wicker, Jörg; Koh, Yun Sing
Lake by Lake, Globally: Enhancing Water Quality Remote Sensing with Multi-Task Learning Models
Unpublished, forthcoming, ISSN: 1556-5068.
@unpublished{graffeuille2024lake,
title = {Lake by Lake, Globally: Enhancing Water Quality Remote Sensing with Multi-Task Learning Models},
author = {Olivier Graffeuille and Moritz Lehmann and Matthew Allan and J\"{o}rg Wicker and Yun Sing Koh},
doi = {10.2139/ssrn.4762429},
issn = {1556-5068},
year = {2024},
date = {2024-03-17},
urldate = {2024-03-17},
abstract = {The estimation of water quality from satellite remote sensing data in inland and coastal waters is an important yet challenging problem. Recent collaborative efforts have produced large global datasets with enough data to train machine learning models with high accuracy. In this work, we investigate global water quality remote sensing models at the granularity of individual water bodies. We introduce Multi-Task Learning (MTL), a machine learning technique that learns a distinct model for each water body in the dataset from few data points by sharing knowledge between the models. This allows MTL to learn differences between water bodies, leading to more accurate predictions. We train and validate our model on the GLORIA dataset of in situ measured remote sensing reflectance and three water quality indicators: chlorophyll-\textit{a}, total suspended solids (TSS) and coloured dissolved organic matter (CDOM). MTL outperforms other machine learning models by 8--31\% in Root Mean Squared Error (RMSE) and 12--34\% in Mean Absolute Percentage Error (MAPE). Training on a smaller dataset of chlorophyll-\textit{a} measurements from New Zealand lakes with simultaneous Sentinel-3 OLCI remote sensing reflectance further demonstrates the effectiveness of our model when applied regionally. Additionally, we investigate how well machine learning models estimate the variation in water quality indicators within individual water bodies. Our results reveal that, for models trained on a large number of water bodies, overall performance metrics overestimate the quality of model fit because water quality indicators vary greatly between water bodies. In our experiments, when estimating TSS or CDOM, all models except MTL fail to learn within-water-body variability and fail to outperform a naive baseline, suggesting that these models may be of limited use to practitioners monitoring water quality.
Overall, our research highlights the importance of considering water body differences in water quality remote sensing research, for both model design and evaluation.},
keywords = {inland and coastal waters, machine learning, multi-task learning, remote sensing, water quality},
note = {Forthcoming},
pubstate = {forthcoming},
tppubtype = {unpublished}
}