Water quality identification based on remote sensing image in industrial waste disposal using convolutional neural networks



ISSN 2722-4139
Science in Information Technology Letters, Vol. 2, No. 2, November 2021, pp. 28-37

River water pollution is commonly measured in two ways: the titration method, which determines the dissolved oxygen (DO) level, and electronic DO meters. DO meters are more practical, but the instrument must be in contact with the water being measured. The titration method, by contrast, requires chemical analysis to determine concentrations and reactants: a titrant, a solution of known concentration, is added to the sample, and the unknown concentration is determined from the amount of titrant needed to complete the reaction [2].
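The arithmetic behind a titration result can be sketched as follows. This is a generic example that assumes a known stoichiometric ratio between analyte and titrant; it is not the DO-specific (Winkler) procedure, whose reagents and ratios are not given here, and the function name `analyte_concentration` is hypothetical.

```python
def analyte_concentration(c_titrant, v_titrant, v_sample, mole_ratio=1.0):
    """Analyte concentration (mol/L) from a titration result.

    c_titrant  -- known titrant concentration in mol/L
    v_titrant  -- titrant volume used to reach the end point, in mL
    v_sample   -- volume of the analysed sample, in mL
    mole_ratio -- moles of analyte per mole of titrant (assumed 1:1 by default)
    """
    if v_sample <= 0:
        raise ValueError("sample volume must be positive")
    moles_titrant = c_titrant * v_titrant / 1000.0  # convert mL to L
    moles_analyte = moles_titrant * mole_ratio
    return moles_analyte / (v_sample / 1000.0)


# 10 mL of 0.1 mol/L titrant completes the reaction with a 25 mL sample
print(analyte_concentration(0.1, 10.0, 25.0))
```

With a 1:1 ratio the example gives about 0.04 mol/L; other reactions only change `mole_ratio`.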
This study aims to analyze the quality of river water receiving industrial waste discharge, using the RGB camera sensor mounted on a UAV. The images obtained through remote sensing show river water whose color and temperature have changed due to industrial waste pollution. We therefore use the CNN method to identify river water pollution from color changes in the river water images.

Method
This study applies the CNN method to image identification for the river water pollution problem. The object data are samples captured by an RGB camera mounted on a UAV. The digital images are preprocessed and divided into training and test data, each of which contains both polluted-water and clean-water images. The CNN is then trained and tested to assess water quality under heavy-metal pollution, and the suitability of the water for consumption, based on changes in water color.
The research method is a structured, systematic plan for solving the research problem. The stages of the study are shown in Fig. 1.
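The division of the preprocessed images into training and test data can be sketched as a per-class (stratified) split, so that both sets contain clean and polluted water. The function `split_per_class`, the file names, and the random selection are illustrative assumptions; how images were actually assigned is not specified. With 38 images per class and 6 held out, the sketch yields the 32/6 per-class counts reported in Results and Discussion.

```python
import random


def split_per_class(files_by_class, n_test, seed=0):
    """Split image files into training and test sets, class by class.

    files_by_class -- dict mapping a class label (e.g. "clean", "polluted")
                      to a list of image file names
    n_test         -- number of images per class held out for testing
    Returns two lists of (file_name, label) pairs: (train, test).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, files in files_by_class.items():
        if n_test >= len(files):
            raise ValueError(f"class {label!r} has too few images")
        shuffled = files[:]  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        test += [(f, label) for f in shuffled[:n_test]]
        train += [(f, label) for f in shuffled[n_test:]]
    return train, test


# Hypothetical file names: 38 images per class, 6 held out per class
data = {
    "clean": [f"clean_{i:02d}.jpg" for i in range(38)],
    "polluted": [f"polluted_{i:02d}.jpg" for i in range(38)],
}
train, test = split_per_class(data, n_test=6)
print(len(train), len(test))  # 64 12
```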

Related Works
Son et al. (2016) [3] classified images from Caltech 101 using a convolutional neural network (CNN). They report that preprocessing followed by CNN classification is reliable enough to determine whether object images are classified correctly, with accuracies of 20% to 50%. Changes in the level of confusion did not affect the accuracy, which suggests that CNN classification is relatively robust to the parameter changes made. With good, well-chosen training data, a subset of the training data also produces a good classification.
The study in [4] implemented a CNN to distinguish plant types by assigning semantic labels to plant-type objects. It used five plant classes: rice, Spanish onion, coconut, banana, and chili. Network training reached 100% accuracy on the training data, 93% on the validation data, and 82% on the test data. These results indicate that CNNs have potential as an automatic object recognition approach for distinguishing plant species, supporting interpreters in identifying objects in images.
Qudsi et al. (2019) [5] aimed to recognize handwritten digits with a CNN and to test CNN performance. The CNN recognizes and classifies patterns in the handwriting. The MNIST database was used for training and testing; in addition, tests were conducted on the handwriting of 20 respondents who wrote the digits 0 to 9.
The study in [7] performed sign language recognition using the E-CNN method, intended as a translation tool for people with disabilities, especially deaf people, by converting images into text. The implementation uses a hand key point library to detect the location of the hand in each image. Combining the CNN models with the ensemble method increased the accuracy to 99.4%.
Chaiyasarn et al. (2018) [8] stated that the hybrid CNN-SVM method is very good at feature extraction and classification but relies heavily on a good dataset, which can be challenging to create manually. Their database contained many non-crack patches but few crack patches; transfer learning and data augmentation can be applied to address such small datasets. Detecting cracks in masonry structures is difficult because cracks are not easily identified in the images: they resemble grout lines, which can be mistaken for cracks.
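Data augmentation, mentioned above as a remedy for small datasets, can be sketched with simple geometric transforms. This sketch assumes a single-channel image stored as a list of rows and uses only a horizontal flip and 90-degree rotations; the transforms actually used in [8] may differ, and the function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus simple geometric variants."""
    variants = [image, flip_horizontal(image)]
    rotated = image
    for _ in range(3):  # 90, 180 and 270 degrees
        rotated = rotate_90(rotated)
        variants.append(rotated)
    return variants


patch = [[1, 2], [3, 4]]
print(augment(patch))  # 5 variants of the same 2 x 2 patch
```

Each variant keeps the class label of the original image, so a training set grows fivefold without new captures.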
Riwayat et al. (2021) [9] used a hybrid CNN-SVM method to classify whether paintings are by Van Gogh. Each painting image is divided into smaller patches, which are resized to 224 x 224 pixels. Features of the patches were extracted with CNNs of two architectures, VGG-19 and ResNet-50, and classified with a linear kernel tuned by two optimization methods, random and grid optimization. The training data are used to build a model that classifies each patch, and the results for the patches are evaluated with several methods, such as FAR (False Acceptance Rate), Mode, Sum, Mean, and Median. VGG-19 feature extraction with a linear kernel and grid optimization gave the best results: 93.00% accuracy, 87.00% precision, and 94.00% recall.
The study in [11] implemented CNN and SVM to identify tomato diseases from leaf images. The RGB (Red, Green, Blue) image is first transformed to HSV (Hue, Saturation, Value), and texture features are then extracted from the result using the GLCM (Gray Level Co-occurrence Matrix). The images are classified with a CNN (Convolutional Neural Network) and an SVM (Support Vector Machine), and the two classifiers are compared to determine which detects tomato leaf disease better automatically. The CNN produced a better classification than the SVM.
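The GLCM texture step described for [11] can be illustrated with a minimal sketch. It assumes the channel has already been quantized to a small number of integer gray levels and uses a single horizontal pixel offset; contrast and homogeneity follow the standard Haralick-style definitions, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for one pixel offset.

    image  -- 2-D list of integer gray levels in range(levels)
    dx, dy -- offset between reference and neighbour pixel
              (dx=1, dy=0 pairs each pixel with its right-hand neighbour)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: large when neighbouring pixels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """GLCM homogeneity: large when neighbouring pixels are similar."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


stripes = [[0, 3, 0, 3], [0, 3, 0, 3]]  # strongly alternating texture
flat = [[1, 1, 1, 1], [1, 1, 1, 1]]     # uniform texture
print(contrast(glcm(stripes, 4)), contrast(glcm(flat, 4)))
```

The striped patch has high contrast and low homogeneity, the uniform patch the opposite; such values form the feature vector passed to a classifier.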
Rahim et al. (2020) [12] built a model with the Convolutional Neural Network (CNN) algorithm, trained it on a dataset of 1,000 images, and tested it to measure the accuracy of classifying facial images with and without masks. The second scenario, with 50 epochs and a split of 90% training data and 10% test data, achieved the best accuracy, 96%. Facial images with masks obtained a precision of 98% and a recall of 94%, and facial images without masks a precision of 94% and a recall of 98%. Scenarios one and three obtained the lowest accuracy, 94%, from which the authors concluded that the amount of training data strongly affects accuracy.
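The per-class precision and recall reported above, together with overall accuracy, can be computed as in the following sketch. The label names and the five predictions are hypothetical and are not the data of [12].

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall of one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions for five test images (not data from [12])
y_true = ["mask", "mask", "mask", "no_mask", "no_mask"]
y_pred = ["mask", "mask", "no_mask", "no_mask", "mask"]
print(accuracy(y_true, y_pred))                  # 0.6
print(precision_recall(y_true, y_pred, "mask"))  # both equal 2/3
```

Precision is computed over images predicted as the class, recall over images that truly belong to it, which is why the two can differ for the same class.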
Muharom et al. (2019) [13] used a CNN with a Gabor filter as the feature extractor and obtained fairly good results for content-based image retrieval (CBIR) on the GHIM10k dataset: a CNN accuracy of 88.12% on the test data and a mAP of 0.895 on the test data when 20 images are returned, using the Canberra distance. The Canberra distance also gave better results than the Euclidean and cosine distances. The image retrieval time was 0.318 seconds.
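Content-based image retrieval ranks database images by the distance between their feature vectors and the query's. A minimal sketch of the three distances compared above, with a top-k retrieval helper; the 3-D vectors are hypothetical stand-ins for CNN/Gabor features, and `retrieve` is an illustrative name.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canberra(a, b):
    # Terms where both coordinates are zero are skipped (0/0 treated as 0).
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def retrieve(query, database, k, distance):
    """Return the ids of the k database vectors closest to the query."""
    ranked = sorted(database, key=lambda item: distance(query, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical feature vectors
db = [("img1", [0.9, 0.1, 0.0]), ("img2", [0.1, 0.8, 0.1]),
      ("img3", [0.8, 0.2, 0.1])]
print(retrieve([1.0, 0.0, 0.0], db, k=2, distance=canberra))  # ['img1', 'img3']
```

The Canberra distance divides each coordinate difference by the coordinates' magnitude, so small feature values weigh as much as large ones; this is one plausible reason it can rank differently from the Euclidean distance.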
Harjoseputro (2018) [14] classified Javanese characters with the Convolutional Neural Network (CNN) method and obtained an overall accuracy of 85% using 1,000 training images and 100 test images. By character group, groups 1 and 3 reached the highest accuracy, 92%, while group 2 had the lowest, 72%. By individual character, HA, NA, CA, RA, TA, WA, LA, DHA, YA, NYA, MA, BA, and THA had the highest accuracy, with all predictions correct, while DA had the lowest accuracy, with the highest error count of 4 mispredictions.
Fitriati (2016) [15] built a handwritten-digit classification system and compared the CNN LeNet-5 method with the Extreme Learning Machine (ELM). LeNet-5 was superior in accuracy, reaching 98.04% on 10,000 MNIST secondary data and 78.14% on 700 primary data, while ELM was superior in computing time, at 0.00078 milliseconds. The test results depended on the amount of training data used: the more training data, the more accurate the test. The number of hidden nodes strongly affects ELM accuracy; in that study, 90 hidden nodes with a sinusoidal activation function gave the highest accuracy.
Aryasa et al. (2016) [16] automated the analysis of waste disposal using the Particle Swarm Optimization (PSO) algorithm and the Support Vector Machine (SVM), with an analysis every two minutes on 1,100 test data. Of the 11 parameters analyzed, PSO gave the highest weights in each measurement to seven: pH, TSS, Cu, Zn, Cr(6+), total Cr, and Fe, while SVM gave the highest weights to only four: Cd, Pb, Ni, and Co. The research was still carried out off-line, so no real-time measurement was available yet.
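For reference, a basic global-best particle swarm optimization (PSO) loop is sketched below on a toy objective. In [16] PSO was used to weight the measured parameters; that objective is not reproduced here, so the sphere function, the parameter values (w, c1, c2), and the function name are assumptions.

```python
import random


def pso(objective, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]  # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # move and keep the particle inside the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


print(pso(sphere, dim=2))
```

The inertia weight w damps the velocity, while c1 and c2 balance each particle's own experience against the swarm's best position.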

UAV (Unmanned Aerial Vehicle)
A UAV (Unmanned Aerial Vehicle), also called an unmanned aircraft, is an aircraft controlled remotely via radio waves. A UAV is an unmanned system with an electromechanical design. It can carry out various programmed missions as a flying machine, controlled by the user either manually or automatically via radio control. Current UAVs can also control themselves automatically by processing data from the vehicle's sensors [17], [18]. A key component of an autonomous machine is a navigation system embedded in the intelligent machine system, for example one based on the HSV filter method [19].
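An HSV filter of the kind cited in [19] keeps pixels whose hue, saturation, and value fall inside chosen ranges. The sketch below uses Python's standard `colorsys` module, which returns hue as a fraction of a full turn; the thresholds and the function name are hypothetical, and hue ranges that wrap around 0 degrees (reds) would need two intervals.

```python
import colorsys


def hsv_mask(pixels, h_range, s_min=0.0, v_min=0.0):
    """Mark the RGB pixels whose hue lies in h_range (degrees).

    pixels  -- 2-D list of (r, g, b) tuples with 0-255 channels
    h_range -- (low, high) hue interval in degrees, e.g. (90, 150) for green
    Returns a 2-D list of booleans with the same shape as `pixels`.
    """
    lo, hi = h_range
    mask = []
    for row in pixels:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue_deg = h * 360  # colorsys hue is in [0, 1)
            out.append(lo <= hue_deg <= hi and s >= s_min and v >= v_min)
        mask.append(out)
    return mask


image = [[(20, 160, 40), (200, 30, 30)],
         [(30, 90, 200), (25, 150, 60)]]
print(hsv_mask(image, h_range=(90, 150), s_min=0.3))
# [[True, False], [False, True]]
```

Filtering on hue rather than raw RGB values makes the selection less sensitive to overall brightness, which is why HSV is often preferred for color-based detection.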

Digital Image
A digital image is an image that has been digitized in both its spatial coordinates and its brightness levels. The value at each coordinate indicates the brightness or gray level of the image at that point. In other words, a digital image is an image that has been stored in or converted to a digital format. A digital image represents the scene captured by the machine through an approximation based on sampling and quantization [20].
Sampling determines the size of the boxes arranged in rows and columns; in other words, sampling sets the size of the pixels (dots) in the image. Quantization determines the brightness value, expressed as a gray level (grayscale), according to the number of bits (binary digits) used by the machine; in other words, quantization sets the number of colors in the image [20].
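Sampling and quantization can be illustrated on an 8-bit grayscale image stored as a list of rows. In this sketch, coarser sampling keeps every n-th pixel and quantization keeps the most significant bits, so a k-bit result has 2^k gray levels; both are simplifications (practical resampling usually filters first), and the function names are hypothetical.

```python
def quantize(value, bits, in_bits=8):
    """Map an in_bits gray value to one of 2**bits levels (0 .. 2**bits - 1)."""
    if not 1 <= bits <= in_bits:
        raise ValueError("bits must be between 1 and in_bits")
    return value >> (in_bits - bits)  # keep the `bits` most significant bits


def downsample(image, factor):
    """Sample every `factor`-th pixel in both directions (coarser sampling)."""
    return [row[::factor] for row in image[::factor]]


row = [0, 37, 128, 200, 255]
print([quantize(v, 2) for v in row])  # [0, 0, 2, 3, 3]

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(downsample(image, 2))  # [[1, 3], [9, 11]]
```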

Image Resolution
Remote sensing systems have four types of resolution: spatial, temporal, spectral, and radiometric. Spatial resolution is the size of the smallest object in the field that can be recorded in digital data or images; in digital data, the ground resolution is expressed in pixels. High spatial resolution ranges from 0.6 to 4 m, medium from 4 to 30 m, and low from 30 m to more than 1000 m. These spatial resolutions are found in imagery such as LANDSAT, SPOT, and IKONOS. Temporal resolution is the time a satellite sensor takes to sense the same area a second time; this resolution is only available on LANDSAT and SPOT. High temporal resolution ranges from 16 days. Radiometric resolution is the range used to represent (quantize) the data, usually for raster formats. An n-bit sensor records values from 0 to 2^n - 1, for example 1 bit (0-1), 2 bits (0-3), 4 bits (0-15), 5 bits (0-31), 6 bits (0-63), 7 bits (0-127), 8 bits (0-255), 10 bits (0-1023), and 16 bits (0-65535) [21].
The more bits a sensor has, the higher its radiometric resolution. The drawback of higher resolution is the cost of acquiring and processing the data, because increasing the resolution increases the amount of data. For example, an MSS scene covering 185 km x 170 km at 79 m x 79 m resolution, with 4 bands (7-bit radiometric resolution for bands 4, 5, and 7 and 6-bit for band 6), takes up 24 MB, while a TM scene of the same area at 30 m x 30 m with 7 bands and 8-bit resolution requires 227 MB [21].
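The relation between bit depth and gray levels, and the way resolution drives data volume, can be checked with simple arithmetic. The sketch assumes uncompressed storage without headers or padding, so it gives an order-of-magnitude estimate rather than actual product file sizes; the function names are hypothetical.

```python
def gray_levels(bits):
    """Number of distinct values an n-bit sensor can record: 2**n."""
    return 2 ** bits


def raw_scene_bytes(width_m, height_m, pixel_m, band_bits):
    """Uncompressed scene size, ignoring headers and file packing.

    band_bits -- list with the radiometric resolution (bits) of each band
    """
    cols = round(width_m / pixel_m)
    rows = round(height_m / pixel_m)
    return cols * rows * sum(band_bits) / 8  # bits -> bytes


print(gray_levels(8) - 1)  # 255, the highest 8-bit value
tm = raw_scene_bytes(185_000, 170_000, 30, [8] * 7)
print(round(tm / 2**20))   # 233 (MiB)
```

For the TM parameters this gives about 233 MiB, close to the 227 MB cited; real products differ because of resampling, padding, and metadata.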

Remote Sensing
Remote sensing is the science of obtaining information about natural phenomena and objects on the earth's surface without direct contact with them. Objects are characterized by the electromagnetic energy they reflect or emit, which is measured and recorded by the sensor [8].
Sensors can detect three main groups of objects on the earth's surface: water, soil, and vegetation. Each reflects or emits electromagnetic energy differently, and how well it can be mapped depends on the characteristics of each satellite image. Remote sensing uses these spectral channels and characteristics to identify objects or types of land cover on the earth's surface [21], [22].

Electromagnetic Wave Interaction
The interaction of electromagnetic waves in the optical spectrum (visible, near-infrared, and mid-infrared or reflected infrared) is measured/detected by sensors under the following conditions:
• In this region, reflection, absorption, and transmission can occur simultaneously, following Kirchhoff's law and Snell's law.
• Energy that falls on an object will be absorbed, reflected, and transmitted.
• In the optical region, the energy measured by the sensor is the energy reflected by objects on the earth's surface, within the sensitivity range of sensors operating in the visible and reflected-infrared (near-infrared and mid-infrared) spectrum.
• The amount of reflected radiation received by the sensor differs from object to object. In other words, objects can be identified or distinguished by their reflective characteristics.
• Common objects on the earth's surface, such as plants, soil, and water, each have characteristic spectral reflectance [22].
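The energy-conservation statement in the list above (energy falling on an object is absorbed, reflected, or transmitted) and Snell's law of refraction can be written as a short calculation. The refractive index of about 1.33 for water, the example energy values, and the function names are assumptions for illustration.

```python
import math


def absorbed_energy(incident, reflected, transmitted):
    """Energy balance E_I = E_R + E_A + E_T, solved for the absorbed part."""
    absorbed = incident - reflected - transmitted
    if absorbed < 0:
        raise ValueError("reflected + transmitted exceeds incident energy")
    return absorbed


def reflectance(incident, reflected):
    """Fraction of incident energy reflected (what an optical sensor sees)."""
    return reflected / incident


def refraction_angle(theta_deg, n1=1.0, n2=1.33):
    """Snell's law n1*sin(t1) = n2*sin(t2); defaults approximate air -> water."""
    s = n1 * math.sin(math.radians(theta_deg)) / n2
    if abs(s) > 1:
        raise ValueError("total internal reflection")
    return math.degrees(math.asin(s))


print(absorbed_energy(100.0, 10.0, 60.0))  # 30.0
print(reflectance(100.0, 10.0))            # 0.1
print(round(refraction_angle(30.0), 1))    # 22.1
```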

Geographic Information System
Geographic Information System (GIS) technology is built from information derived by processing geographic data, that is, data describing the position of objects on the earth's surface. GIS integrates common database operations, such as retrieving data as needed and statistical analysis, with specific visualizations and the geographic analysis that map images make possible. GIS can also help explain events, forecast events, support strategic planning, and analyze general problems such as economic, population, social, government, defense, and tourism issues [23].

Image Segmentation
Image segmentation is the process of dividing an image into homogeneous regions. There are two types of segmentation techniques: dividing the image into parts by determining their boundaries (partitioning the image space), and assigning each pixel a color index that indicates its membership in a segment (clustering in feature space). Segmentation algorithms are generally based on one of the fundamental properties of intensity values: discontinuity, similarity, or thresholding [24].
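Thresholding, the last basis listed above, can be illustrated with Otsu's method, which chooses the gray level that best separates two intensity classes. This is one common thresholding technique, not necessarily the one used in [24]; the sketch assumes 8-bit integer gray levels, and the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: gray level t that maximises between-class variance.

    Pixels <= t form one class (e.g. background), pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of class 0 (values <= t)
    sum0 = 0  # intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalised between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def segment(image, t):
    """Binary segmentation: 1 where the pixel is brighter than t."""
    return [[1 if p > t else 0 for p in row] for row in image]


image = [[12, 15, 200], [10, 210, 205]]
t = otsu_threshold([p for row in image for p in row])
print(t, segment(image, t))  # 15 [[0, 0, 1], [0, 1, 1]]
```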

Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a development of the Multilayer Perceptron (MLP) designed to process two-dimensional data. CNNs are a type of Deep Neural Network because of their high network depth, and they are widely applied to image data. An MLP is not well suited to image classification because it does not retain the spatial information in image data and treats each pixel as an independent feature, which leads to poor results [3], [5].
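A related practical point is parameter count. Because an MLP connects every pixel to every hidden unit, its first layer grows with image size, whereas a convolutional layer shares one small filter across all positions. The layer sizes below (a 224 x 224 RGB input, 100 hidden units, 32 filters of 3 x 3) are arbitrary illustrations.

```python
def dense_params(n_inputs, n_units):
    """Weights + biases of a fully connected (MLP) layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights + biases of a convolutional layer; independent of image size."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Hypothetical 224 x 224 RGB input
print(dense_params(224 * 224 * 3, 100))  # 15052900
print(conv_params(3, 3, 3, 32))          # 896
```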

Fig. 2. Convolution Process on CNN [3]
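The convolution process in Fig. 2 slides a small filter over the input and sums the element-wise products at each position to build a feature map. A minimal sketch on a single-channel image, assuming no padding; the vertical-edge kernel is an arbitrary example rather than a learned filter, and `conv2d` is an illustrative name.

```python
def conv2d(image, kernel, stride=1):
    """'Valid' 2-D convolution as used in CNN layers (no padding).

    Like most deep-learning libraries this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # element-wise product of kernel and image window, then sum
            row.append(sum(image[r + a][c + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map


image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
vertical_edge = [[-1, 1], [-1, 1]]
print(conv2d(image, vertical_edge))  # [[0, 18, 0], [0, 18, 0]]
```

The feature map is non-zero only where the window straddles the dark-to-bright boundary, which is the "local feature detection" performed by a CONV layer.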
The use of CNNs for data recognition and classification has grown since the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), first held in 2010. Several CNN architectures then became well known, such as AlexNet with 8 layers [25], GoogLeNet with 22 layers [26], and ResNet with over 100 layers [27].
Based on function, the CNN architecture (Fig. 3) has three kinds of layers: the Convolutional Layer (CONV), the Subsampling Layer (SUBS), and the Fully Connected Layer (FC). A CNN usually consists of several CONV and SUBS layers followed by an FC layer. The CONV layer detects local features at every location of the input image: it convolves the input with filters to produce feature maps. The SUBS layer reduces the dimensions of the feature map by selecting pixel values according to a specific rule; the operation most often used is max pooling. The FC layer distinguishes between classes, applying nonlinear transformations to obtain the output values. The CONV and SUBS layers perform feature learning, while the FC layer performs classification [6], [13], [28], [29].
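The SUBS and FC stages described above can be sketched in a few lines: a ReLU non-linearity, 2 x 2 max pooling, a fully connected layer, and a softmax that turns class scores into probabilities. ReLU and softmax are common choices assumed here; the feature-map values, the two classes ("clean", "polluted"), and the weights are hypothetical, whereas in a trained CNN the weights are learned.

```python
import math


def relu(fmap):
    """Element-wise max(0, x) non-linearity applied to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """SUBS layer: keep the largest value in each non-overlapping window."""
    return [[max(fmap[r + a][c + b] for a in range(size) for b in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def dense(features, weights, biases):
    """FC layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(w_row, features)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


fmap = [[1.0, -2.0, 3.0, 0.5],
        [0.0, 4.0, -1.0, 2.0],
        [-3.0, 1.0, 0.0, 0.0],
        [2.0, 0.5, 1.5, -0.5]]
pooled = max_pool(relu(fmap))  # [[4.0, 3.0], [2.0, 1.5]]
flat = [v for row in pooled for v in row]
# Hypothetical weights for two classes ("clean", "polluted")
weights = [[0.1, 0.2, -0.1, 0.0], [-0.2, 0.1, 0.3, 0.2]]
probs = softmax(dense(flat, weights, [0.0, 0.0]))
print(pooled, probs)
```

Pooling halves each spatial dimension here, so the FC layer receives 4 values instead of 16.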

Results and Discussion
The training data consist of 32 clean-water images and 32 polluted-water images, as shown in Fig. 4. The test data consist of 6 polluted-water images and 6 clean-water images.

Conclusion
Based on the training and testing results, the CNN method with three convolutional layers, trained on 64 images, achieved fairly good training accuracy. Over 10 trials, the highest test accuracy was 83.33%, and the training accuracy value that occurred most often (the mode) was 90.62%. The shortest processing time, 2.0106 s/step, was obtained with a test accuracy of 75%, below the mode value. In further research, the hybrid CNN-SVM method or the Quantum CNN method could be investigated to improve training accuracy in recognizing images without distinct patterns for identifying river water pollution.
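Because the test set has only 12 images and the training set 64, each reported accuracy corresponds to a whole number of correctly classified images. The small check below (the function name is hypothetical) shows that 83.33% equals 10 of 12 test images and 75% equals 9 of 12, and that 90.62% is consistent with 58 of 64 training images (90.625%), assuming accuracy is computed over all 64. It also implies that a single misclassified test image moves test accuracy by about 8.3 percentage points.

```python
def correct_count(accuracy_pct, n_samples, tol=0.01):
    """Number of correct samples k with 100*k/n within tol of accuracy_pct.

    Returns None when no integer count matches.
    """
    for k in range(n_samples + 1):
        if abs(100 * k / n_samples - accuracy_pct) <= tol:
            return k
    return None


print(correct_count(83.33, 12))  # 10 of the 12 test images
print(correct_count(75.0, 12))   # 9
print(correct_count(90.62, 64))  # 58 of the 64 training images
```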