Machine Learning and Multiple Imputation Approach to Predict Chlorophyll-a Concentration in the Coastal Zone of Korea
Content Provider | MDPI |
---|---|
Author | Kim, Hae-Ran; Soh, Ho Young; Kwak, Myeong-Taek; Han, Soon-Hee |
Copyright Year | 2022 |
Description | The concentration of chlorophyll-a (Chl-a) is an integrative bio-indicator of aquatic ecosystems and a direct indicator of the ecological status of water bodies. In this study, we predicted the Chl-a concentration in seawater using machine learning after replacing missing values. To replace the missing values in the marine environment observation data, we compared several built-in imputation methods of the mice package in R (pmm, cart, rf, norm, norm.nob, norm.boot, and norm.predict); the cart method was selected as the most suitable. Using the complete imputed dataset, we built regression models with six machine learning algorithms: regression tree, support vector regression (SVR), bagging, random forest, gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). The prediction performance of the models was evaluated with four evaluation criteria using 10-fold cross-validation. XGBoost, an ensemble learning approach, outperformed the other models in predicting the Chl-a concentration; SVR, a single model, also performed well. The most important environmental factor in predicting the Chl-a concentration was particulate organic carbon; dissolved oxygen also showed potential. This study used field observations made in spring and summer in the coastal zone of Korea. A limitation of this machine learning application is that it excludes temporal and spatial factors. However, extending it to time series forecasting with deep learning or machine learning could enable meaningful regional and seasonal analysis. Prediction performance may also improve as long-term field observations accumulate and cover more varied features (such as meteorological and hydrodynamic variables) besides water quality. |
Starting Page | 1862 |
e-ISSN | 2073-4441 |
DOI | 10.3390/w14121862 |
Journal | Water |
Issue Number | 12 |
Volume Number | 14 |
Language | English |
Publisher | MDPI |
Publisher Date | 2022-06-10 |
Access Restriction | Open |
Subject Keyword | Water Remote Sensing; Missing Values; Multiple Imputation; Multivariate Imputation by Chained Equation (mice); Machine Learning; Chlorophyll-a; Model Accuracy Metrics |
Content Type | Text |
Resource Type | Article |