By Roman Yusifov, Axel André Schmidt, Michael Palocz-Andresen

Pollution of natural water bodies poses a growing threat to ecosystems, wildlife, and human health. Among the most critical contaminants are microplastics, heavy metals, and organic chemicals. Traditional detection methods are effective but often slow and resource-intensive. Today, artificial intelligence (AI) in combination with sensor technologies allows real-time observation of aquatic pollutants. This article explores various machine learning techniques, sensor applications, and a specific case study focused on microplastic detection through an AI-enhanced camera system. It highlights the opportunities and limitations of these systems and outlines a path for implementation in scientific and regulatory contexts. 

Figure 1: Pollutants (microplastics & mercury) in water 

Introduction

Access to clean water is fundamental to life, yet contamination of rivers, lakes, and oceans continues to increase. Sources of pollution include industrial discharge, agricultural runoff, and urban wastewater. Contaminants range from large debris to invisible toxins, many of which are not captured by conventional monitoring techniques. The need for real-time, scalable, and automated detection methods is becoming increasingly urgent. Advances in AI and the Internet of Things (IoT) offer new tools to address this need by enabling continuous, in situ monitoring of water quality in ways previously not possible [1].

Types of Pollutants in Focus

The term “pollutants” encompasses a range of harmful substances in water. Microplastics are small particles (typically less than 5 mm) derived from the degradation of plastic waste. They are ubiquitous and can be transported long distances by currents. Heavy metals like lead, mercury, and cadmium are persistent in ecosystems and toxic to both humans and wildlife even at low concentrations. Organic pollutants such as pesticides, herbicides, and hydrocarbons enter water through agricultural or industrial pathways. Each pollutant type requires different detection strategies. Their combined effects make comprehensive monitoring systems necessary, since a water body often contains a complex mixture of contaminants.

Figure 2: The path of primary and secondary microplastics into the sea

This graphic shows how microplastics enter the marine environment in various ways.  Primary microplastics are created directly, for example through textile abrasion during washing or from cosmetic products. Secondary microplastics are formed through the degradation of larger plastic waste that has been disposed of improperly. Environmental factors such as wind and UV radiation break down this waste, creating microplastics that are ingested by marine animals and enter the food chain.

Traditional Detection and its Limitations

Water analysis has traditionally involved collecting samples and analyzing them in laboratories using techniques like spectroscopy, chromatography, or microscopy. These methods are accurate but slow and labor-intensive. They provide data only after delays, and thus do not allow immediate response to sudden pollution events. Laboratory analyses are also expensive and require skilled personnel, which limits the frequency and coverage of monitoring. The limitations of such lab-based detection methods have motivated the development of smart, automated systems capable of continuous field monitoring [2]. By augmenting or replacing manual sampling with sensor networks and automated analysis, one can increase the temporal and spatial resolution of pollution tracking.

Artificial Intelligence as a Detection Engine

AI excels at identifying patterns in complex data, making it a powerful engine for detecting pollutants. In supervised learning approaches, algorithms learn from labeled training data to recognize specific pollutants.

Figure 3: Supervised learning process

For example, tree-based models like Random Forest and gradient boosting have been used to classify water quality or pollutant levels; one study achieved over 94% accuracy in water quality classification using CatBoost [2]. Deep learning models such as convolutional neural networks (CNNs) are applied for image-based pollutant detection and spatial analysis. CNNs can automatically learn visual features of contaminants – for instance, distinguishing microplastic particles from organic debris under a microscope or in sensor images. In some cases, recurrent neural networks like LSTM (Long Short-Term Memory) networks are employed to analyze temporal sequences of sensor readings, enabling the prediction of trends or anomalies in water quality over time. Unsupervised learning methods, including cluster analysis and principal component analysis (PCA), help detect unknown or unexpected anomaly patterns without prior labels. These can be useful for discovering emergent pollution events or novel contaminants. In practice, AI models are trained on diverse features – visual cues, chemical sensor outputs, time-series signals – that together constitute the “signature” of different pollutants. The model architecture or combination of models can be tailored to the monitoring task: for example, an integrated system might use a CNN to identify objects in images (like floating plastic fragments) and a separate algorithm to classify chemical sensor data. With sufficient training, such AI systems can rapidly recognize target pollutants and flag deviations in water quality data that would be hard to discern with manual observation.
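The supervised workflow described above (learn from labeled examples, then classify new readings) can be sketched in a few lines. The example below is a minimal sketch that uses a nearest-centroid classifier as a deliberately simple stand-in for the tree-based and deep models named in the text. The feature set, the sample values, and the class names are assumptions made for illustration and are not data from the cited studies.

```python
from math import dist

# Hypothetical labeled training data:
# (pH, turbidity in NTU, conductivity in microsiemens per cm) -> class label.
# All values and labels are invented for illustration only.
TRAINING = [
    ((7.1, 2.0, 310.0), "clean"),
    ((7.3, 3.5, 295.0), "clean"),
    ((6.9, 2.8, 330.0), "clean"),
    ((5.8, 42.0, 910.0), "polluted"),
    ((6.1, 55.0, 870.0), "polluted"),
    ((5.5, 48.0, 950.0), "polluted"),
]


def fit_centroids(samples):
    """Training step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def predict(centroids, features):
    """Prediction step: return the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAINING)
print(predict(centroids, (7.0, 4.0, 320.0)))
print(predict(centroids, (6.0, 50.0, 900.0)))
```

Because the features are not rescaled here, conductivity dominates the distance. A real pipeline would normalize the inputs and would normally use a stronger model.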

Figure 4: Unsupervised learning process

This graphic shows the process of unsupervised learning in machine learning. Unlike supervised learning, the input data is not labeled in advance. The model receives a large amount of unlabeled data and independently recognizes patterns and similarities. It groups the elements into so-called clusters based on these characteristics and separates them accordingly. This method is particularly suitable for structuring unknown data and discovering hidden correlations without requiring prior knowledge.
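Clustering of unlabeled readings can be illustrated with k-means, one common cluster analysis method. The following is a minimal sketch, assuming two-dimensional readings (turbidity, conductivity) with invented values. The starting centroids are passed in explicitly so that the run is reproducible; real implementations usually choose them randomly or with a seeding heuristic.

```python
from math import dist


def k_means(points, initial_centroids, iterations=20):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = list(initial_centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical unlabeled readings: (turbidity in NTU, conductivity).
readings = [
    (2.0, 300.0), (3.0, 310.0), (2.5, 290.0),
    (40.0, 900.0), (45.0, 880.0), (50.0, 920.0),
]
centroids, assignment = k_means(readings, [readings[0], readings[3]])
print(assignment)
print(centroids)
```

No labels are supplied at any point. The two groups emerge from the distances alone, and it is up to an analyst to interpret what each cluster means.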

IoT and Sensor Integration

Bringing AI into the field is achieved through IoT platforms that combine hardware and software for automated measurement. Devices such as pH meters, electrical conductivity probes, turbidity sensors, and camera units can record environmental parameters in real time and transmit data continuously. These sensors are often deployed in networks – for example, an array of sensors along a river or a set of instruments on buoys in a lake – to provide broad coverage. Collected data can be sent to edge computers (local processing units) or to cloud servers for analysis. In many designs, initial data processing happens at the edge: a local microcontroller or mini-computer filters and aggregates sensor readings, or even runs AI algorithms on-site for quick detection. This reduces the volume of data that must be transmitted and enables faster responses (a concept known as edge AI). Sensors can also be integrated into existing infrastructure, such as being installed in water treatment facilities or stormwater outlets, to continuously check for pollutant levels. For instance, optical turbidity sensors and flow meters in a smart storm drain could signal when runoff pollution spikes during heavy rain. All these devices collectively form an IoT-based monitoring network that feeds into machine learning models. The result is a dynamic system that not only measures parameters like temperature, pH, dissolved oxygen, or contaminant concentrations, but also interprets them. AI algorithms can fuse multi-source data – combining, say, chemical sensor data with image data from cameras – to improve the reliability of pollutant detection [3]. This integration of IoT with AI creates a feedback loop: sensors provide raw data, AI extracts meaningful information (like identifying an oil spill or a bloom of algae), and alerts or decisions can then be generated in real time for stakeholders.
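The edge-processing idea, filtering and aggregating readings locally so that less data has to be transmitted, can be sketched as follows. This is a minimal illustration under stated assumptions: the window size, the alert threshold, and the message fields are hypothetical and do not come from any of the cited systems.

```python
from statistics import mean


def summarize_window(readings, alert_threshold):
    """Reduce one window of raw turbidity readings to a compact message."""
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        "alert": max(readings) > alert_threshold,
    }


def edge_stream(raw, window_size, alert_threshold):
    """Yield one summary per window instead of forwarding every reading."""
    for start in range(0, len(raw), window_size):
        window = raw[start:start + window_size]
        yield summarize_window(window, alert_threshold)


# Hypothetical turbidity readings (NTU) with a sudden spike at the end.
raw = [3.1, 2.9, 3.0, 3.2, 3.0, 28.5, 31.0, 30.2]
messages = list(edge_stream(raw, window_size=4, alert_threshold=25.0))
for message in messages:
    print(message)
```

Eight raw readings become two messages, and the second one carries the alert flag that a downstream service could act on.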

Figure 5: IoT & big data for monitoring pollutants in water bodies

This graphic illustrates the interaction of IoT sensors, big data technologies, and artificial intelligence in monitoring pollutants in water bodies. The IoT sensors measure environmental parameters such as pH, turbidity, or heavy metal concentrations directly in water sources. The measurements are forwarded to a central big data platform, where they are collected, processed, and structured. They are then visualized so that anomalies, trends, or warnings can be detected at an early stage. This allows changes in water quality to be identified in real time and targeted measures to be initiated quickly. Overall, this creates an effective basis for modern environmental monitoring.

Case Study: AI-Based Microplastic Detection

A recent example of this approach involves an AI camera system designed to detect microplastics in flowing water. The system uses a high-speed camera with LED lighting to illuminate the stream. It is equipped with the YOLOv5 algorithm for object detection and DeepSORT for motion tracking. Field trials in the Raquette River and laboratory simulations showed successful identification of plastic particles. This illustrates how real-time, non-invasive tracking is feasible with modern AI tools.

Figure 6: Cameras for real-time monitoring

Two specially trained AI models were used to evaluate the recorded image data. The first, YOLOv5, is an object detection model. It recognizes and marks specific objects, such as microplastic particles, directly in individual video frames. Once a particle has been recognized, the second method, DeepSORT, tracks it across successive frames, which makes it possible to follow how the particle moves in the water. By combining both methods, the system can automatically determine how large the particles are, how fast they move, and what paths they take in the water. All of this works directly on site, without the need for laboratory analyses [4].
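How size and speed can be derived once a tracker has linked detections across frames can be sketched without any detection library. The example below is not the authors' implementation and does not call YOLOv5 or DeepSORT. It assumes that a tracker has already produced one bounding box per consecutive frame for a single particle, and the frame rate and pixel-to-millimetre calibration are hypothetical values.

```python
from math import hypot

FPS = 30.0          # assumed camera frame rate (frames per second)
MM_PER_PIXEL = 0.1  # assumed calibration factor (millimetres per pixel)


def track_summary(boxes):
    """Summarize one track.

    boxes: list of (x_min, y_min, x_max, y_max) in pixels,
    one box per consecutive frame for the same particle.
    """
    if len(boxes) < 2:
        raise ValueError("need at least two frames to estimate speed")
    # Box centers approximate the particle position in each frame.
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    # Path length: sum of center-to-center distances between frames.
    path_px = sum(
        hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(centers, centers[1:])
    )
    duration_s = (len(boxes) - 1) / FPS
    # Use the longer box side as a rough particle size estimate.
    sizes_mm = [max(x1 - x0, y1 - y0) * MM_PER_PIXEL for x0, y0, x1, y1 in boxes]
    return {
        "size_mm": sum(sizes_mm) / len(sizes_mm),
        "speed_mm_s": path_px * MM_PER_PIXEL / duration_s,
    }


# Hypothetical track: a 30-pixel particle drifting 60 pixels per frame.
track = [
    (100, 200, 130, 230),
    (160, 200, 190, 230),
    (220, 200, 250, 230),
    (280, 200, 310, 230),
]
summary = track_summary(track)
print(summary)
```

With these assumed calibration values the particle comes out at roughly 3 mm and 180 mm/s. In a real deployment the calibration depends on the camera distance and optics.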

In laboratory tests, the system achieved a detection rate of approximately 97%, and in the Raquette River it reached an accuracy of approximately 96%. The camera was most effective at a distance of 19-33 cm from the particle flow, and it handled different flow velocities and particle sizes (2-5 mm) reliably. The results show that AI-based systems can also work precisely and stably under natural conditions [4].

Despite these convincing results, there are limitations. Lighting conditions and water turbidity had a significant influence on detection performance: accuracy decreased in low light, which is why the researchers recommend improved light sources or more sensitive cameras. The training data was a second problem. The model was initially trained on standardized laboratory samples, and it was found that natural confounding factors such as algae, suspended solids, or organic material can lead to misclassifications. Counteracting this requires additional training data collected under real environmental conditions. Energy consumption was also considerable, especially for the continuous evaluation of video data. For long-term and large-scale use, the researchers therefore recommend more energy-efficient hardware or edge AI systems, which operate directly on site and require less computing power [4].

Figure 7: Controlled laboratory experiment

The graphic above illustrates the technical setup of the experiment in the laboratory channel. A camera unit was installed underwater in a 12-meter-long flow basin to record microplastic particles flowing by. The camera records the particles in real time and transmits the images directly to a computer. There, AI automatically analyzes the number, size, and movement of the particles. In addition, a flow sensor (ADV) was used to measure the speed of the water. This allowed all relevant information to be accurately recorded and directly evaluated.

Data Analysis and Interpretation

Once detected, data on pollutant location, movement, and concentration is aggregated. Trends are identified using time-series analysis. AI models help visualize the spread of contamination and allow researchers to anticipate critical load thresholds. These insights can support regulatory decisions and public health warnings [3].
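One simple way to anticipate a threshold from a time series is to fit a straight line to recent measurements and extrapolate it. The sketch below uses ordinary least squares on equally spaced samples. The concentration values and the threshold are hypothetical, and a linear trend is only an assumption, since real pollutant loads often behave non-linearly.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def steps_until(values, threshold):
    """Sampling intervals left until the fitted line reaches the threshold.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical hourly concentrations (arbitrary units) and threshold.
hourly = [10.0, 12.0, 14.0, 16.0, 18.0]
print(linear_trend(hourly))
print(steps_until(hourly, threshold=30.0))
```

Here the fitted line rises by 2 units per hour, so the assumed threshold of 30 would be reached about 6 hours after the last sample.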

Opportunities and Challenges

A key obstacle to the use of IoT technologies in water monitoring is the limited computing power and energy efficiency of the sensors used. In large and remote areas in particular, the lack of power sources and unstable data connections make continuous operation difficult. Although progress has been made in the development of energy-efficient chips and alternative power sources such as solar energy, their practical implementation is often cost-intensive and technically complex. In addition, short battery life and weather-dependent energy sources such as photovoltaics impair the long-term reliability of the systems [5].

A key problem is the processing of large amounts of data originating from different sensors. Since this data is often available in different formats and there are no uniform standards, fast and efficient real-time analysis and smooth integration into existing systems are difficult. Clear international standards for data formats and system architectures are necessary to ensure that different systems can work together reliably. Without such standards, many applications remain technically separate from one another, which limits their full performance [6].
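The format problem can be made concrete with a small normalization layer. In the sketch below, two hypothetical vendors deliver the same kind of measurement in different encodings, units, and timestamp resolutions, and each adapter maps its input onto one common record. The vendor formats and field names are invented for illustration and do not correspond to any existing standard.

```python
import json


def from_vendor_a(payload):
    """Hypothetical vendor A: JSON, temperature in Fahrenheit, epoch seconds."""
    raw = json.loads(payload)
    return {
        "sensor": raw["id"],
        "time_s": raw["ts"],
        "temp_c": round((raw["temp_f"] - 32) * 5 / 9, 2),
        "ph": raw["ph"],
    }


def from_vendor_b(line):
    """Hypothetical vendor B: 'id;epoch_ms;temp_c;ph' as a text line."""
    sensor, epoch_ms, temp_c, ph = line.split(";")
    return {
        "sensor": sensor,
        "time_s": int(epoch_ms) // 1000,
        "temp_c": float(temp_c),
        "ph": float(ph),
    }


record_a = from_vendor_a('{"id": "A1", "ts": 1700000000, "temp_f": 68.0, "ph": 7.1}')
record_b = from_vendor_b("B7;1700000000500;19.5;6.9")
print(record_a)
print(record_b)
```

Every additional vendor format needs its own adapter, which is the integration effort that shared standards would remove.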

Machine learning models, especially convolutional neural networks (CNNs), often generalize poorly to new scenarios. Small or unbalanced training datasets lead to overfitting, where a model fits familiar patterns too closely and fails on unseen data. Regularization methods and data augmentation can provide some relief. At the same time, larger, more diverse datasets and hybrid modeling approaches that combine AI models with physical knowledge are needed to achieve greater robustness [6].

Extreme weather conditions and aging equipment also affect the reliability of measurement results. Sensors require regular maintenance and calibration to ensure their accuracy, an effort that entails high operating costs and logistical challenges [7].

In order to reliably analyze water quality values such as pH, turbidity, or certain pollutant concentrations, algorithms must be specifically tailored to the respective parameters. Methods such as K-Nearest Neighbor (KNN) or Support Vector Machines (SVM) require precise adaptation to the type of data and the respective area of application, which increases the development effort. This clearly shows how important it is to carefully test and validate the models used for each specific application [8].
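The need to adapt and validate such models can be illustrated with a small KNN classifier. The sketch below rescales the features, because KNN is distance-based and would otherwise be dominated by the feature with the largest numeric range, and then estimates accuracy with leave-one-out validation. The dataset and labels are invented for illustration.

```python
from collections import Counter
from math import dist


def scale(rows):
    """Min-max scale each feature to [0, 1] so no single unit dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
        for row in rows
    ]


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(features, labels, k):
    """Hold out each sample once and predict it from all the others.

    Simplification: scaling uses the full dataset, including the
    held-out sample. A stricter setup would fit it on the training part only.
    """
    scaled = scale(features)
    hits = 0
    for i, (query, truth) in enumerate(zip(scaled, labels)):
        train = [
            (f, l) for j, (f, l) in enumerate(zip(scaled, labels)) if j != i
        ]
        hits += knn_predict(train, query, k) == truth
    return hits / len(labels)


# Hypothetical samples: (pH, turbidity in NTU).
features = [
    (7.0, 2.0), (7.2, 3.0), (6.9, 2.5), (7.1, 4.0),
    (5.9, 40.0), (6.1, 45.0), (5.8, 50.0), (6.0, 42.0),
]
labels = ["ok"] * 4 + ["polluted"] * 4
accuracy = leave_one_out_accuracy(features, labels, k=3)
print(accuracy)
```

The invented classes are well separated, so the score is perfect here. On real data, the choice of k and of the scaling method would have to be tuned and validated for each parameter and site.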

Data transmission continues to be a critical bottleneck in many monitoring systems. In low-power wide-area networks (LPWAN) in particular, delays often occur that impair real-time analysis. In remote regions, limited bandwidth and signal loss exacerbate the problem, which can lead to data gaps and delayed responses to environmental changes [5]. Approaches such as edge computing or combined network solutions that connect LPWAN with mobile communications or Wi-Fi offer promising alternatives, but often involve significant investment and operating costs [9].

The effectiveness of monitoring systems depends heavily on the accuracy of the sensor data. Factors such as temperature, corrosion, or sensor aging can distort the data. Regular calibration, the use of redundant sensors, and self-diagnostic algorithms offer solutions here. The combination of multiple sensors and the integration of automated testing mechanisms enable continuous quality control. However, this comes at the cost of greater technical complexity and resources [7].
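Redundant sensors combined with a simple self-diagnostic check can be sketched as follows. The example takes the median of parallel readings as a consensus value and flags any sensor that deviates from it by more than a tolerance. The sensor names, readings, and tolerance are hypothetical.

```python
from statistics import median


def fuse_redundant(readings, tolerance):
    """Combine redundant readings and flag likely faulty sensors.

    readings: dict mapping sensor id -> measured value.
    Returns (consensus value, sorted list of suspect sensor ids).
    """
    consensus = median(readings.values())
    suspects = sorted(
        sensor
        for sensor, value in readings.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, suspects


# Three hypothetical pH probes at the same location; one has drifted.
consensus, suspects = fuse_redundant(
    {"ph_a": 7.02, "ph_b": 7.05, "ph_c": 8.40}, tolerance=0.3
)
print(consensus, suspects)
```

The median tolerates a single outlier among three probes, which is why the drifted sensor does not distort the consensus. The flagged sensor can then be scheduled for recalibration.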

The transmission of sensitive environmental and location data makes IoT-based systems vulnerable to cyberattacks. Missing encryption, weak authentication, and scarce resources in structurally weak regions exacerbate these risks. Protective measures such as end-to-end encryption, firewalls, or intrusion detection systems are technically possible, but require additional effort. Blockchain-based approaches also show potential, but are currently difficult to scale [1, 10].

Conclusion and Future Perspective

An important aspect for the future is the further development of sensors. Many of the sensors used today cannot reliably detect very fine particles such as microplastics, especially under difficult conditions such as high turbidity or changing currents. In the future, more precise sensors or new materials that respond more sensitively to microplastics, heavy metals, or organic pollutants could be used. For example, smaller, more flexible sensor units could be developed that are easier to integrate into existing systems. These could be installed on buoys, in sewage treatment plants, or in drainage channels, where they would continuously provide measurement data. Combined with energy-efficient electronics and local data processing using edge AI, such systems could operate reliably over longer periods, which would reduce maintenance and replacement costs. Edge AI runs artificial intelligence directly where the data is collected, that is, on the device itself. Instead of being sent to an external data center for evaluation, the information is processed on site. This not only saves energy and labor, but also enables faster evaluation of the data and makes the systems more independent and efficient overall.

In addition to stationary systems, mobile platforms such as autonomous measurement boats or drones also have great potential. They could be deployed specifically in areas that are difficult to access or particularly at risk, for example after acute pollutant inputs or heavy rainfall events. By combining cameras, sensors, and AI, such units could independently collect and process data and, in the best case, even trigger warnings. Another area with great potential is the use of satellite data to monitor water bodies. Satellites can reveal large-scale changes on the water surface, such as oil spills or noticeable discoloration caused by sediments. Evaluating this data with artificial intelligence would allow patterns and changes to be identified much more quickly than with traditional methods. This approach could be particularly effective if the satellite data is combined with ground-based sensors.

About the Authors

Roman Yusifov has been studying Business Informatics and Social Media & Information Systems at Leuphana University in Lüneburg since 202. He focuses on innovative technologies and their impact on human interaction, aiming to deepen his expertise in digital transformation and social connectivity.

Axel André Schmidt graduated in Applied Physics from the University of Hamburg and, in 1994/95, developed an advanced online oil-spill-in-water monitor. He then joined DECKMA Hamburg GmbH, a leading manufacturer of oil-in-water measurement systems for marine and industrial use.

Michael Palocz-Andresen is a full professor at BUAP in Puebla, specializing in Sustainable Mobility since 2018 with support from the DAAD at TEC in Mexico. Until 2017, he held a professorship at the University of West Hungary. Currently, he serves as a guest professor at TU Budapest, Leuphana University Lüneburg, and Shanghai Jiao Tong University.

References 

[1] Hollender, J., van Bavel, B., Dulio, V., Farmen, E., Furtmann, K., Koschorreck, J., Kunkel, U., Krauss, M., Munthe, J., Schlabach, M., … Tornero, V. (2019). High resolution mass spectrometry-based non-target screening can support regulatory environmental monitoring and chemicals management. Environmental Sciences Europe, 31(1), 1–11. https://enveurope.springeropen.com/articles/10.1186/s12302-019-0225-x

[2] Nasira, N., Kansal, A., Alshatlony, O., Barnea, H., Sameer, M., Shanableh, A., & Al Shamma’a, A. (2022). Water quality classification using machine learning algorithms. Journal of Water Process Engineering, 48, 102919 (17 pages). https://www.sciencedirect.com/science/article/abs/pii/S2214714422003646?via%3Dihub

[3] Jiang, Y., Li, C., Sun, L., Guo, D., Zhang, Y., & Wang, W. (2021). A deep learning algorithm for multi-source data fusion to predict water quality of urban sewer networks. Journal of Cleaner Production, 318, 128533. https://www.sciencedirect.com/science/article/pii/S0959652621027426?via%3Dihub

[4] Sarker, M.A.B., Imtiaz, M.H., Holsen, T.M., & Baki, A.B.M. (2024). Real-time detection of microplastics using an AI camera. Sensors, 24(13), 4394. https://www.mdpi.com/1424-8220/24/13/4394

[5] Zulkifli, C. Z., Garfan, S., Talal, M., Alamoodi, A. H., Alamleh, A., Ahmaro, I. Y. Y., Sulaiman, S., Ibrahim, A. B., Zaidan, B. B., Ismail, A. R., Albahri, O. S., Albahri, A. S., Soon, C. F., Harun, N. H., & Chiang, H. H. (2022). IoT-Based Water Monitoring Systems: A Systematic Review. Water, 14(22), 3621. https://www.mdpi.com/2073-4441/14/22/3621

[6] Rajitha, A., Aravinda, K., Nagpal, A., Kalra, R., Maan, P., Kumar, A., & Abdul-Zahra, D. S. (2024). Machine Learning and AI-Driven Water Quality Monitoring and Treatment. E3S Web of Conferences, 505, 03012. https://www.e3s-conferences.org/articles/e3sconf/abs/2024/35/e3sconf_icarae2023_03012/e3sconf_icarae2023_03012.html

[7] Martínez, R., Vela, N., el Aatik, A., Murray, E., Roche, P., & Navarro, J. M. (2020). On the use of an IoT integrated system for water quality monitoring and management in wastewater treatment plants. Water, 12(4), 1096. https://www.mdpi.com/2073-4441/12/4/1096/pdf

[8] AlZubi, A. A. (2024). IoT-based automated water pollution treatment using machine learning classifiers. Environmental Technology, 45(12), 2299-2307. https://www.tandfonline.com/doi/full/10.1080/09593330.2022.2034978

[9] Samuel, D. J., Sermet, Y., Cwiertny, D., & Demir, I. (2023). Integrating vision-based AI and language models for real-time water pollution surveillance. EarthArXiv. https://eartharxiv.org/repository/object/7057/download/13499/

[10] Kapelan, Z., Weisbord, E., & Babovic, V. (2020). Digital water: Artificial intelligence solutions for the water sector. International Water Association. https://iwa-network.org/wp-content/uploads/2020/08/IWA_2020_Artificial_Intelligence_SCREEN.pdf