What do remote sensing, machine learning, and statistics have in common? Enhancing the accuracy of seagrass monitoring, for one

by Krti Tallam

Citation: Ha NT, Manley-Harris M, Pham TD, Hawes I. A Comparative Assessment of Ensemble-Based Machine Learning and Maximum Likelihood Methods for Mapping Seagrass Using Sentinel-2 Imagery in Tauranga Harbor, New Zealand. Remote Sensing [Internet]. MDPI AG; 2020 Jan 21;12(3):355. Available from: http://dx.doi.org/10.3390/rs12030355

      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Seagrasses form a highly productive blue carbon ecosystem: meadows of grass-like plants growing in shallow, salty and brackish waters in many parts of the world, from the tropics to the Arctic Circle. Seagrasses belong to a group of plants known as monocotyledons, which includes lilies, grasses, and palms; like their relatives, seagrasses have roots, leaves, and veins, and produce seeds and flowers. Now if you weren’t already amazed, here is the even more amazing part: they evolved around 100 million years ago, and today we know of approximately 72 different seagrass species belonging to four major groups. They’ve been on our planet for 100 million years and we only know of 72 species! Seagrasses can form dense underwater meadows, many of which are large enough to be seen from space. Yet despite being one of the most productive ecosystems in the world, seagrasses receive minimal attention and remain understudied.

Seagrasses have been acknowledged as one of the most productive blue carbon ecosystems, and are in significant decline across most of the globe. One of the first steps towards conservation is to map and monitor extant seagrass meadows, and although several methods are currently being used, mapping from satellite imagery via machine learning is still an uncommon approach, despite successful use in other comparable applications.

Therefore, Dr. Ha and team aimed to develop a novel approach for seagrass monitoring, using state-of-the-art machine learning along with Sentinel-2 imagery. Sentinel-2 is a constellation of two identical satellites in the same orbit, imaging land and coastal areas at high spatial resolution. They used Tauranga Harbor in New Zealand as a validation site, where they already had extensive ground-truthing data to compare their methods against, and compared ensemble machine learning methods, including random forest (RF), rotation forest (RoF), and canonical correlation forest (CCF), with the more traditional maximum likelihood classifier (MLC) technique. Using a group of validation metrics, their results indicated that the machine learning techniques outperformed the MLC, and that RoF was the best performer. Now, I have probably lost you at this point, so let’s break these terms apart to understand what Dr. Ha really did.

Canonical Corre-what?

In recent years, machine learning (ML) has emerged as a novel approach for seagrass mapping and monitoring. Machine learning has the benefits of rapid learning, accommodation of non-linearity, and the availability of an increasing number of new, open source algorithms. In the field of seagrass mapping and monitoring, however, the application of machine learning is still in its infancy. Examples used to date include weighted majority voting over logistic model trees (LMT), AdaBoost, random forest (RF), and artificial neural networks (ANN) using digital images; and decision trees (DTs) using aerial photographs. In these examples, when used with high spatial resolution images (<1 m), machine learning models achieved an accuracy of 92-100 percent. Decision tree models using aerial photographs, however, achieved a lower accuracy of 66 percent for seagrass meadows when the plant cover was below 60 percent. In other words, the statistical and machine learning applications used until now have produced mixed results, but they support the exploration of new machine learning approaches, particularly for improving the mapping of low coverage seagrass.

Among the various machine learning algorithms, rotation forest (RoF) and canonical correlation forest (CCF) algorithms are now emerging as reliable techniques for land cover mapping, landslide mapping using multi-spectral or hyper-spectral imagery, and rapid building mapping using multi-source data. These algorithms are well known for improving the detection of multi-class boundaries, so they potentially offer benefits in the classification of low coverage seagrass through enhanced recognition of edge boundaries. Therefore, the team's goal in this study was to compare three ML algorithms, RF, RoF, and CCF, with the more traditional maximum likelihood classifier (MLC) for mapping the aboveground distribution of seagrass communities at low and high coverage using Sentinel-2 data.
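
To make the "traditional" side of that comparison concrete, here is a minimal sketch of what a maximum likelihood classifier does: it fits one Gaussian distribution per class to training pixels, then assigns each new pixel to the class under which it is most likely. This is an illustration only, not the code used in the study. The two bands, the reflectance values, the class names, and the helper functions (fit_gaussian, log_likelihood, classify) are all made up for the example, and equal prior probabilities are assumed for every class.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and 2x2 covariance of (band_a, band_b) training pixels."""
    n = len(samples)
    mean_a = sum(a for a, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in samples) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for _, b in samples) / (n - 1)
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in samples) / (n - 1)
    return (mean_a, mean_b), (var_a, cov_ab, var_b)

def log_likelihood(pixel, mean, cov):
    """Log-density of a two-band pixel under a bivariate Gaussian."""
    var_a, cov_ab, var_b = cov
    det = var_a * var_b - cov_ab ** 2
    da, db = pixel[0] - mean[0], pixel[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form inverse of a 2x2 matrix.
    maha = (var_b * da * da - 2 * cov_ab * da * db + var_a * db * db) / det
    return -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)

def classify(pixel, models):
    """Pick the class with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda name: log_likelihood(pixel, *models[name]))

# Hypothetical training reflectances in two bands; NOT values from the study.
training = {
    "dense": [(0.04, 0.10), (0.05, 0.12), (0.03, 0.11), (0.045, 0.09), (0.035, 0.125)],
    "sparse": [(0.08, 0.07), (0.09, 0.06), (0.07, 0.065), (0.085, 0.08), (0.075, 0.055)],
    "bare": [(0.15, 0.03), (0.16, 0.02), (0.14, 0.035), (0.155, 0.04), (0.145, 0.015)],
}
models = {name: fit_gaussian(pixels) for name, pixels in training.items()}

for pixel in [(0.04, 0.11), (0.08, 0.066), (0.15, 0.03)]:
    print(pixel, "->", classify(pixel, models))
```

The ensemble methods in the study replace this single Gaussian model per class with many decision trees whose votes are combined, which is one reason they can follow irregular class boundaries that a single bell-shaped distribution cannot.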

The team’s target was Tauranga Harbor, New Zealand, for which ground truth data were available, and which offers a mosaic of dense, sparse, and zero seagrass coverage. In the paper, they discuss how the performance of the selected models differs when detecting seagrass at the two densities. Their hope was that their results would contribute alternative solutions for the mapping and monitoring of seagrass in various regions of the world, and assist in the conservation of this important blue carbon ecosystem.

So what does it all mean?

Well, first, in this study the machine learning methods outperformed the statistical methods for all evaluation metrics. In particular, the precision values of the ML models were higher than those obtained by the statistical methods for dense and sparse seagrass, whilst very high recall was observed for both classes using the MLC (maximum likelihood classifier) model.
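
If precision and recall are new to you, a small sketch may help. For a given class, precision is the share of pixels labeled as that class that really belong to it, and recall is the share of pixels truly in that class that the model found. The labels below are invented for illustration (they are not data from the paper), and precision_recall is a hypothetical helper written for this post. The made-up classifier over-predicts sparse seagrass, so it finds every sparse pixel (high recall) but is wrong about half of the pixels it calls sparse (low precision).

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly found
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false alarms
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented ground truth and predictions for ten pixels.
truth = ["sparse", "sparse", "sparse", "bare", "bare",
         "bare", "bare", "dense", "dense", "dense"]
pred = ["sparse", "sparse", "sparse", "sparse", "sparse",
        "bare", "bare", "dense", "dense", "sparse"]

for label in ("dense", "sparse", "bare"):
    p, r = precision_recall(truth, pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

A model can therefore score very well on recall while still mislabeling a lot of the map, which is why looking at both numbers together matters.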

Until now, no literature had looked at the comparative performance of the RoF, RF, CCF, and MLC methods for seagrass mapping, along with a full radiometric correction (calibrating the image pixel values and correcting for error) of the image. Additionally, of the machine learning ensemble approaches that Dr. Ha’s team used, the RoF model performed better than CCF and RF; this is a notable result because other studies have found CCF to be the stronger performer. Of the methods tested here, only the RF technique has previously been applied to seagrass mapping using very high spatial resolution imagery. In that case, high precision (0.947) and recall (0.968) values were obtained when mapping Posidonia oceanica from digital airborne images, though no comparison to other methods was attempted. In another seagrass study, the overall accuracy only reached 82% using the RF algorithm applied to RapidEye imagery. Considering the size of the seagrass meadows and the mix of substrate in Tauranga Harbor, the scores measured by Dr. Ha’s team were reliable for both dense and sparse seagrass mapping using medium spatial resolution Sentinel-2 data (10 m pixel size), which is a really interesting result!

Dr. Ha’s results attest to the reliable application of the RoF model for the mapping and monitoring of seagrass in shallow water using Sentinel-2 imagery. Despite a lower accuracy for sparse than for dense seagrass meadow classification, the CCF model shows potential for the mapping of seagrass and merits further testing at various scales and in various case studies. As for MLC, this model is still an applicable candidate for dense seagrass meadows; however, it may not be applicable for the mapping of sparse to very sparse seagrass meadows.

With the development of computer vision and pattern recognition, future studies should be encouraged to explore deep learning approaches, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), for semantic segmentation of imagery combined with sub-pixel techniques.

 

About the author: KRTI TALLAM

My research interests: I am broadly interested in the ecology of environmental diseases, as they link to climate and anthropogenic stressors. I delve into coastal and oceanic environmental diseases that have links to both humans and to terrestrial systems. Currently, I conduct analyses on the responses of dengue to climatic and anthropogenic stressors off the coast of the Bay of Bengal, in India, while also working with Stanford University to understand the role of schistosomiasis in environmental reservoirs. At Stanford, I serve as one of the few trans-disciplinary experts for planetary health topics, via machine learning / deep learning, artificial intelligence, field experiments, writing, and policy to understand more about the environmental world of eco-epidemiology. I am a budding scientist, an innovator, a first-generation student, a woman, a woman of color, and a proud daughter of immigrants.

Current degree: Doctoral student, Stanford University, Biosciences