Developing deep learning architecture for image classification using Convolutional Neural Network (CNN) algorithm in forest and field images

Article history: Received July 12, 2020; Revised September 10, 2020; Accepted November 25, 2020
Indonesia is an agricultural country rich in natural resources from agriculture and plantations. Its agriculture is diverse: rice fields, for example, produce rice, soybeans, corn, tubers, and other crops. Its plantations include forests that yield timber, bamboo, eucalyptus oil, rattan, and other products. However, rice fields, as an example of agricultural land, and forests, as an example of plantation land, look similar and are not easy to tell apart in aerial photographs or photographs taken from a certain height. Recognizing rice fields and forests reliably in such photographs requires a model that can accurately recognize the shape of both. Such a model uses computational methods to extract information from digital images and recognize objects automatically. One deep learning method currently under active development is the Convolutional Neural Network (CNN). A CNN takes an image as its input. It has a special layer, the convolution layer, in which the input image is turned into patterns extracted from several parts of the image, which makes later classification easier. The convolution layer learns image features so that classification can be implemented more efficiently. This study therefore uses the CNN method to classify images of forests and rice fields and to distinguish their characteristics. The best model reached a testing accuracy of 90%, from which it can be concluded that the CNN method can classify images of forests and rice fields well.


Introduction
Indonesia is an agricultural country rich in natural resources from agriculture and plantations. According to the Food and Agriculture Organization of the United Nations, Indonesia has the most productive paddy fields in Asia, with 4.9 million hectares of paddy fields. Many Indonesians work as farmers and planters because most regions of Indonesia have agricultural land and plantations [1]. One example of a plantation type is forest. Based on data from the Ministry of Forestry and the Environment, as of March 2019 social forestry had reached 2.56 million hectares (ha), consisting of village forests (1.28 million ha), community forests (245,593 ha), community plantations (331,993 ha), forestry partnerships (549,785 ha), and customary forests (28,286 ha) [2].
Science in Information Technology Letters, ISSN 2722-4139, Vol. 1, No. 2, November 2020, pp. 83-91
Agriculture and plantations in Indonesia [3] are diverse. Rice fields, for example, produce rice, soybeans, corn, tubers, and other crops, while plantations include forests that yield timber, bamboo, eucalyptus oil, rattan, and other products. However, rice fields, as an example of agricultural land, and forests, as an example of plantation land, look similar and are not easy to tell apart in aerial photographs or photographs taken from a certain height. Recognizing rice fields and forests reliably in such photographs therefore requires a model that can accurately recognize the shape of both.
Advances in information technology are now unavoidable: hardware development keeps improving computer performance, and software is being developed that can resemble human intelligence (artificial intelligence). Computers today can complete tasks more efficiently and in less time. One technology that can assist humans is deep learning. Deep learning is a branch of machine learning based on artificial neural networks [4] that trains a computer to perform tasks that come naturally to humans. Deep learning can automatically classify images [5], text, and videos, the latter by converting videos into images [6].
One deep learning method currently under active development is the Convolutional Neural Network (CNN). A CNN takes an image as its input. It has a special layer, the convolution layer, in which the input image is turned into patterns extracted from several parts of the image, which makes later classification easier. The convolution layer learns image features so that classification can be implemented more efficiently. The researchers therefore used the CNN method to classify forests and rice fields and to distinguish their characteristics.

Image
An image can be interpreted as a two-dimensional function f(x, y), where x and y are spatial coordinates and the value of f at (x, y) is called the intensity at that point [7]. A digital image is an image f(x, y) whose spatial coordinates and intensity values have both been digitized. It consists of a finite number of elements called picture elements, or pixels.
An image is a representation, likeness, or imitation of an object. Images are of two kinds: analog and digital. An analog image is continuous, such as an image on a television monitor or an X-ray image, whereas a digital image can be processed by a computer [8].

Digital Imagery
A digital image is a machine-captured approximation of a scene obtained by sampling and quantization [9]. Sampling determines the size of the boxes arranged in rows and columns; in other words, it determines the size of each pixel (point) in the image. Quantization determines the brightness level of each pixel, expressed as a gray level (grayscale) value according to the number of binary bits used by the machine; in other words, it determines the number of colors in the image.
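As a rough illustration of the two operations, the sketch below resamples and requantizes a small grayscale image. The helper names `sample` and `quantize` and the 4x4 pixel values are hypothetical, chosen only for this example, and the code assumes 8-bit input values (0 to 255).

```python
def sample(image, step):
    """Spatial sampling: keep every `step`-th pixel along rows and columns."""
    return [row[::step] for row in image[::step]]


def quantize(image, bits):
    """Map 8-bit gray values (0..255) onto 2**bits gray levels, for 1 <= bits <= 8."""
    bin_width = 256 // (2 ** bits)
    return [[pixel // bin_width for pixel in row] for row in image]


# Hypothetical 4x4 8-bit grayscale image (values are illustrative only).
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
    [48, 112, 176, 240],
]

coarse = sample(image, 2)      # 2x2 image: fewer, larger pixels
two_bit = quantize(image, 2)   # same size, but only 4 gray levels (0..3)
print(coarse)
print(two_bit)
```

Coarser sampling lowers the spatial resolution, while fewer quantization bits lower the number of distinguishable gray levels; the two choices are independent of each other.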
In a computer, a digital image is mapped onto pixel elements arranged as a two-dimensional matrix, or grid. Each pixel holds a number for each color channel [10], [11]. The numbers for each pixel are stored sequentially by the computer and are often reduced for compression and processing purposes. A digital image can be represented by a matrix of M rows and N columns [12], [13], where each intersection of a row and a column is a pixel (picture element), the smallest element of an image. A pixel has two parameters: its coordinates and its intensity or color. The value at coordinates (x, y) is f(x, y), the intensity or color of the pixel at that point. The image can therefore be written as the matrix in (1).
$$
f(x, y) =
\begin{bmatrix}
f(0, 0) & f(0, 1) & \cdots & f(0, N-1) \\
f(1, 0) & f(1, 1) & \cdots & f(1, N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1, 0) & f(M-1, 1) & \cdots & f(M-1, N-1)
\end{bmatrix}
\tag{1}
$$
Based on the formula above, an image f(x, y) can be written as the mathematical function in (2):
$$
0 \le x \le M-1, \qquad 0 \le y \le N-1, \qquad 0 \le f(x, y) \le G-1
\tag{2}
$$
where M is the number of rows in the image, N is the number of columns, and G is the number of gray levels (grayscale values). The values of M, N, and G are usually powers of two:
$$
M = 2^{m}, \qquad N = 2^{n}, \qquad G = 2^{k}
$$
where m, n, and k are positive integers. The interval (0, G) is called the grayscale. The value of G depends on the digitization process. On a scale normalized to [0, 1], 0 (zero) represents black and 1 (one) represents white. For an 8-bit image [14], G = 2^8 = 256 gray levels.

Artificial Intelligence (AI)
Artificial Intelligence (AI) is a technique or method for imitating the intelligence of living things in order to solve a problem. It is the study of how to make computers do things that, at present, humans do better. AI aims to identify or model human thought processes and to design machines that mimic human behavior, so that machines can act like humans with sound knowledge and logical reasoning skills. One goal of AI is to build robots whose intelligence resembles, or even exceeds, that of humans.

Machine Learning
Arthur Samuel first defined the term machine learning in 1959. According to Samuel, machine learning is the field of computer science that gives computers the ability to learn without being explicitly programmed [15]. Put simply, machine learning builds algorithms that allow computer programs to learn and do their work without explicit instructions from users. Such an algorithm builds a model from input data and uses it to produce decisions. Machine learning is closely related to computational statistics, which focuses on making decisions using computers. Applications of machine learning include image processing, search and recommendation engines, finance, speech understanding, and text analysis [16].

Deep Learning
Deep learning is one technique in machine learning that utilizes many layers [17] of nonlinear information processing to perform feature extraction, pattern recognition, and classification [18].

According to [19], deep learning is an approach to machine learning that uses a hierarchy of concepts. The hierarchy lets computers learn complicated concepts by building them out of simpler ones. If a graph is drawn to show how these concepts are built on top of one another, the graph is deep, with many layers, which is why the approach is called deep learning.

Convolutional Neural Network (CNN)
The Convolutional Neural Network (CNN) is a deep learning algorithm developed from the Multi-Layer Perceptron (MLP) and designed to process data in grid form, such as two-dimensional images or sound [20]. A CNN is classed as a Deep Neural Network because of its high network depth, and it is widely applied to image data. An MLP can also be used for image classification, but it is poorly suited to the task: it does not retain the spatial information in image data and treats each pixel as an independent feature, which leads to poor results.
Technically, a CNN is a trainable architecture made up of several stages. The input and output of each stage are sets of arrays called feature maps. Each stage consists of three layers: a convolution layer, an activation function layer, and a pooling layer.
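A single stage of this kind can be sketched in plain Python as follows. This is a minimal illustration, not the architecture used in this study: the kernel is fixed rather than learned, ReLU is assumed as the activation function, max pooling is assumed as the pooling operation, and the 5x5 image and 2x2 kernel are hypothetical.

```python
def convolve2d(image, kernel):
    """'Valid' sliding-window product sum (cross-correlation, as used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Activation layer: replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def cnn_stage(image, kernel):
    """One stage: convolution, then activation, then pooling."""
    return max_pool(relu(convolve2d(image, kernel)))


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright horizontal transitions.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = cnn_stage(image, kernel)
print(feature_map)
```

The resulting feature map is smaller than the input and is nonzero only where the edge was found, which is the kind of localized pattern that later layers use for classification.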

Confusion Matrix
The confusion matrix is a method used to evaluate classification models such as decision trees [21]. It is a table that records how many test samples the classification model predicted correctly and incorrectly. This table is needed to measure the performance of a classification model [22].
A confusion matrix is a useful tool for analyzing how well a classifier recognizes tuples from different classes [23]. Building a confusion matrix requires four quantities (the counts of true positives, true negatives, false positives, and false negatives), as listed in Table 1.
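For a two-class problem, the four entries of a confusion matrix can be counted directly from the actual and predicted labels. The sketch below assumes the usual naming of the four entries (true positives, false positives, false negatives, true negatives); the function name `confusion_counts` and the label lists are hypothetical and are not data from this study.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, and TN, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1   # predicted positive, actually positive
        elif p == positive:
            counts["FP"] += 1   # predicted positive, actually negative
        elif a == positive:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            counts["TN"] += 1   # predicted negative, actually negative
    return counts


# Hypothetical labels for five test images (illustrative only).
actual = ["forest", "forest", "field", "field", "field"]
predicted = ["forest", "field", "field", "field", "forest"]

print(confusion_counts(actual, predicted, positive="forest"))
```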

Precision, Recall, and Accuracy
Precision is the degree of agreement between the information requested by the user and the answers returned by the system. It is calculated using (5).
Recall is the success rate of the system in retrieving relevant information. It is calculated using (6).
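Equations (5) and (6) are not reproduced here, so the sketch below falls back on the standard confusion-matrix definitions: precision is TP / (TP + FP) and recall is TP / (TP + FN). Treat these as an assumption about the form of (5) and (6); the counts in the example are hypothetical.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts (illustrative only).
tp, fp, fn = 8, 2, 4
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```

Returning 0.0 when the denominator is zero is a design choice made for this sketch; some tools raise an error or report an undefined value instead.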
Accuracy is the degree of closeness between the predicted value and the actual value. It can be read as the proportion of predictions that are correct. It is calculated using (7).
Fig. 1 shows the 100 images, in two categories (forest and field), stored in the DataHutanSawah folder. Because the image files are named after their categories, they sort automatically in alphabetical order of category name: the forest category comes first and the rice field category second. The dataset is then divided into training and testing data. Each category uses 50 images, with three training-data scenarios: 90%, 80%, and 70% (Table 2). In the 70% scenario, (50 x 70) / 100 = 35 images per category are used for training and 50 - 35 = 15 for testing. In the 80% scenario, (50 x 80) / 100 = 40 images are used for training and 50 - 40 = 10 for testing. In the 90% scenario, (50 x 90) / 100 = 45 images are used for training and 50 - 45 = 5 for testing.
At this stage, the researchers run several experiments comparing models built on the Convolutional Neural Network (CNN) architecture to obtain the best classification results. The best CNN architecture is identified from the parameters used. In this study, the researchers compared the number of epochs and the ratio of training data to test data. Each architecture design is compared against the initial architecture, which uses 80% training data, 20% test data, and 50 epochs (Table 3).
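The per-category split sizes for the three scenarios follow from the same arithmetic each time. The short sketch below reproduces it; the function name `split_counts` is hypothetical.

```python
def split_counts(images_per_class, train_percent):
    """Return (training, testing) image counts for one category."""
    train = images_per_class * train_percent // 100
    test = images_per_class - train
    return train, test


# 50 images per category, as stated in the text.
for percent in (70, 80, 90):
    train, test = split_counts(50, percent)
    print(f"{percent}% scenario: {train} training and {test} testing images per category")
```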

The comparison of 50, 65, and 80 epochs shows that, within this range, a larger number of epochs gives better accuracy. More epochs give the algorithm more training and let it recognize patterns better.
After comparing the epoch values, the researchers compared training-to-test data ratios of 70%:30%, 80%:20%, and 90%:10% using the best architecture from the previous step. The results are presented in Table 4. Table 4 shows that the highest accuracy is obtained with the 90%:10% ratio, that is, 90 training images and 10 test images. The resulting accuracy is 0.988 (98.8%) on the training data and 0.9 (90%) on the testing data. A comparison of the three results is plotted in Fig. 2. The designed CNN architecture is shown in Fig. 3. The best model has a total of 475,938 parameters, detailed in Table 5.
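A model's total parameter count is the sum of the counts of its individual layers. As a rough illustration of how such a total arises, the sketch below counts the weights and biases of convolution and fully connected layers. The layer shapes are hypothetical: they do not reproduce Table 5 and do not add up to the total reported for the best model.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units


# Hypothetical layer shapes (assumptions for illustration only).
layer_counts = {
    "conv 3x3, 3 -> 32 channels": conv2d_params(3, 3, 3, 32),
    "conv 3x3, 32 -> 64 channels": conv2d_params(3, 3, 32, 64),
    "dense 128 -> 2 classes": dense_params(128, 2),
}

for name, count in layer_counts.items():
    print(f"{name}: {count}")
print("total:", sum(layer_counts.values()))
```

Activation and pooling layers have no trainable parameters, so they contribute nothing to the total.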

Classification Results
With the best CNN architecture in hand, the next step is classification. The classification uses ten test images, five from each category. The best classification results on the testing data are presented in Table 6. Five forest images were predicted as forest, so every forest image was classified correctly. Four paddy images were predicted as paddy, and one paddy image was misclassified. The accuracy is calculated from these results using (7), which gives a testing accuracy of 90%.
Based on the calculation in (8), the best CNN architecture reaches an accuracy of 90% on the testing data: five forest images were predicted as forest, matching the actual forest images, and four of the five paddy images were predicted as paddy.
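The reported testing accuracy can be rechecked from the counts given in the text. The sketch below assumes that the one misclassified paddy image was predicted as forest, the only other class; the function names are hypothetical.

```python
# Counts for the 10 test images, keyed by (actual, predicted).
# Assumption: the misclassified paddy image was predicted as forest.
confusion = {
    ("forest", "forest"): 5,
    ("forest", "paddy"): 0,
    ("paddy", "forest"): 1,
    ("paddy", "paddy"): 4,
}


def accuracy(confusion):
    """Correct predictions divided by all predictions."""
    correct = sum(n for (actual, predicted), n in confusion.items()
                  if actual == predicted)
    return correct / sum(confusion.values())


def class_recall(confusion, label):
    """Share of images of class `label` that were predicted as `label`."""
    total = sum(n for (actual, _), n in confusion.items() if actual == label)
    return confusion[(label, label)] / total


print("accuracy:", accuracy(confusion))
print("forest recall:", class_recall(confusion, "forest"))
print("paddy recall:", class_recall(confusion, "paddy"))
```

With only ten test images, each image accounts for ten percentage points of accuracy, so the single misclassification is what separates 90% from 100%.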

Conclusion
Based on the results and discussion, it can be concluded that a CNN can classify forest and field images once the best model architecture has been found. The architecture was selected by comparing two parameters: the number of epochs and the ratio of training to test data. The classification results reach a testing accuracy of 90%, so the CNN method with these parameters can be used to classify forest and field images.