Real-Time Obstacle Detection for Unmanned Surface Vehicle Maneuver

The rapid advancement of and increasing demand for Unmanned Surface Vehicle (USV) technology have drawn considerable attention in the commercial, research, and military sectors, particularly for marine and shallow-water applications. USVs have the potential to revolutionize monitoring systems in remote areas while reducing labor costs. One critical requirement for USVs is the ability to autonomously integrate Guidance, Navigation, and Control (GNC) technology, enabling self-reliant operation without constant human oversight. However, current USV studies rely on traditional color-based detection, which is inadequate for detecting objects under unstable lighting conditions. This study addresses the challenge of enabling Autonomous Surface Vehicles (ASVs) to operate with minimal human intervention by enhancing their object detection and classification capabilities. In dynamic environments, such as water surfaces, accurate and rapid object recognition is essential. To achieve this, we focus on the implementation of deep learning algorithms, including the YOLO algorithm, to give USVs informed navigation decision-making capabilities. Our research contributes to the field of robotics by designing an affordable USV prototype capable of independent operation, characterized by precise object detection and classification. By bridging the gap between advanced visualization techniques and autonomous USV technology, we envision practical applications in remote monitoring and marine operations with object detection. This paper presents the initial phase of our research, emphasizing the significance of deep learning algorithms for enhancing USV navigation and decision-making in dynamic environmental conditions. The YOLOv4-tiny image processing algorithm achieved an mAP of 99.51%, an IoU of 87.80%, and an error value of 0.1542.


Introduction
Real-time monitoring and data collection of water surfaces play a pivotal role in enabling sophisticated data analysis within unfamiliar environments. This requires object detection, which encompasses the identification of material objects, the recognition and classification of fast-moving objects, and the determination of water levels and distances to detected objects [1]. The transition from digitization to informatization and onward to intelligence characterizes the evolution of data collection and interpretation in sampling environments.
In the realm of water environment monitoring, there is a growing role for vehicles and transportation systems embedded with artificial intelligence [2][3][4][5]. Recent advancements in machine learning, particularly deep learning approaches, have emerged as potent tools in the development of intelligent transportation systems [6][7][8]. These methodologies find application across various facets of the maritime industry, encompassing boat classification [9], object detection, collision avoidance, risk perception, and anomaly detection. Maritime surveillance and autonomous boat navigation stand out as the primary application domains. The deployment of autonomous systems holds the potential to gather environmental information rapidly and safely, offering a cost-effective alternative to human-led research vessels, especially in remote and inhospitable locations such as the Arctic [10]. Therefore, sophisticated information technology such as a computer vision system [11] should support the intelligent transportation system.
Computer vision plays a pivotal role in analyzing unfamiliar environmental features, bridging robotics and object recognition [12]. Computer vision is presently employed to enhance image quality and to compensate for the limitations of human vision. It enables computers to detect objects within images and determine their precise coordinates, and it finds diverse applications both within and beyond computer science. Traditional computer vision methods frequently use color as a means of object detection, employing algorithms such as color correction [13][14], Hue, Saturation, Value (HSV) [15][16][17][18][19], Hue, Saturation, and Lightness (HSL) [20][21][22], color tracking methods [23], and RGB [24][25]. Within the domain of Unmanned Surface Vehicles (USVs) [4][26][27][28][29][30][31][32], object recognition has predominantly relied on the HSV filter as the primary algorithm. However, color-based detection is susceptible to variations in lighting conditions, which calls for a more robust solution [33]. This weakness may prevent the system from recognizing an object when the ambient lighting is unstable. Thus, deep learning methods are used to enhance the precision and speed of object detection.
Several traditional methods, such as SIFT, SVM and HOG, have been established to extract features from images and then classify them. However, target detection algorithms based on convolutional neural networks have proved significantly better in efficiency and accuracy, and YOLO is one of the best-developed object detection algorithms [34]. YOLO offers remarkable accuracy [28][35]-[39], even in the presence of lighting noise [26][32]. This study focuses on real-time obstacle detection, a critical component of USV navigation. This paper presents a novel solution that leverages the You Only Look Once (YOLO) algorithm for object recognition on water surfaces, deployed on a USV. To achieve this goal, we develop real-time obstacle detection for the USV maneuver. The design and implementation of obstacle detection determine the directional navigation of the USV. The navigation system provides the crucial guidance information: it takes the coordinate location and confidence value of each detected object and moves the USV actuators accordingly.
The design and implementation of this obstacle detection system play a pivotal role in guiding the USV's directional navigation, providing essential coordinate location and confidence values for detected objects. The main contributions of this paper are as follows: (1) The YOLOv4-tiny network is applied as the main object detection algorithm, which greatly improves the accuracy and detection speed for the designated objects. (2) Dual-hull trajectories and wave resistance are improved using a combined IMU when navigating in a dynamic environment. In Section 2, we describe the proposed system design, introducing the YOLO computer vision algorithm [34], the dataset used, the USV motor and hull design, and the navigation system within the designated testing area. Section 3 presents results on the detection of the designated objects in varying light intensities and discusses the implications of these results for potential applications as an unmanned monitoring tool for water surface areas. Finally, in Section 4, we outline future research directions and areas for further development. The obstacle object represents a buoy in the sea, which indicates the depth of the sea along with the legend of the sea limit.

Proposed Method
Data labeling assigns labels to the objects in the dataset that is used to train the YOLO algorithm. This operation involves the interlinked terms object classification, object localization, and object recognition. Object classification assigns a class label to an object in an image. Object localization [41] is the process of locating the object's position in terms of x, y coordinates in an image or video, expressed as a bounding box. The combination of object localization and classification is called object detection. Image labeling is an image annotation process whose class and position information is stored in a text file. The labeled information includes the class, the center coordinates of the bounding box (x, y), and the height and width of the bounding box (h, w); this is the YOLO format used to obtain the object position in an image. The dataset contains a total of 3087 images. Based on [42]-[44], the best ratio of training to test data is 80% to 20%. The labeling process provides two formats, namely Pascal VOC with .xml file output and YOLO with .txt file output. In this research, the YOLO format in LabelImg [45] is used to generate the labeling information. The YOLO algorithm uses this labeling information in the training stage. Before the training session, settings are configured in the pre-training stage. Table 2 shows the objects in the test data, which contains 671 images. Ground truth is the labeled image data used to train the YOLO algorithm.
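The conversion from a pixel-space bounding box to a YOLO-format label line can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name and the example box are hypothetical, and it assumes the usual YOLO text layout of class index, normalized center x, center y, width, and height on one line.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a corner-style pixel box into one YOLO-format label line.

    All four box values are normalized by the image size, so the label
    stays valid if the image is later resized.
    """
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a box covering the middle of a 640 x 480 frame.
print(to_yolo_label(0, 160, 120, 480, 360, 640, 480))
# prints "0 0.500000 0.500000 0.500000 0.500000"
```

An image containing several annotated objects would simply produce one such line per object in its .txt file.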

YOLO Algorithm Configuration
Configuring the YOLO network requires an initial configuration file and a set of weights at the beginning of the pre-training process. The pre-trained set of weights used is called "yolov4tiny.weights". The initial weights have been trained on the ImageNet dataset and contain the entire network structure of the YOLO algorithm. All the general parameters are listed in Table 3. The width and height of the images are set to 416×416, which keeps the training time low. The filters in the configuration are adjusted according to the number of classes in the dataset. Max Batches sets the number of iterations run during training. Following the article published by Redmon and Farhadi (2018) [46], the number of filters in the output layer follows the equation shown in Table 3. Fig. 3 shows the general object detection system. Object detection has two processes: feature extraction, and classification and localization of an object based on a CNN (Convolutional Neural Network). The method used is a development of MultiBox with additional layers that perform box regression and object classification in parallel [23]. The input image can be of any size, because the algorithm resizes it according to the pre-training parameters. The resized dimensions are not restricted to particular values as long as they are divisible by 32. During detection, the image is first resized and then divided into S × S grid cells. Each grid cell has the task of predicting bounding boxes. Each bounding box contains the object's x, y coordinates, height (h), width (w), and the predicted value of the object (confidence score). A threshold is applied to eliminate bounding boxes with low confidence scores. Each grid cell predicts class probabilities C for one object class only, regardless of the number of bounding boxes in the grid cell. The confidence score of the detected object is obtained by multiplying the conditional class probability C by the confidence score of each bounding box, as in equation (1) [47].
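The configuration rules and the scoring step described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the filter rule (classes + 5) × 3 is the commonly used Darknet convention and is assumed here to be the equation referred to in Table 3; the 0.5 threshold, the function names, and the sample detections are hypothetical.

```python
def yolo_output_filters(num_classes, anchors_per_scale=3):
    """Filters of the convolutional layer before each YOLO layer.

    Each anchor predicts 4 box values, 1 objectness score, and one score
    per class (assumed Darknet convention).
    """
    return (num_classes + 5) * anchors_per_scale


def is_valid_input_size(width, height, stride=32):
    """The network input must be divisible by 32 in both dimensions."""
    return width % stride == 0 and height % stride == 0


def class_confidence(class_prob, box_confidence):
    """Class-specific score: conditional class probability times box confidence."""
    return class_prob * box_confidence


def filter_detections(detections, threshold=0.5):
    """Keep (label, score) pairs whose class-specific score meets the threshold.

    `detections` is a list of (label, class_prob, box_confidence) tuples.
    """
    scored = [(label, class_confidence(p, c)) for label, p, c in detections]
    return [(label, score) for label, score in scored if score >= threshold]


# Three classes are assumed (Boat 1, Boat 2, Obstacle).
print(yolo_output_filters(3))         # prints 24
print(is_valid_input_size(416, 416))  # prints True
```

With three classes, the same convention gives 24 filters in each output layer, and a 416×416 input satisfies the divisibility requirement.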

Testing Environment
To provide a USV that serves as a monitoring robot, the prototype is tested in a pool with a depth of 4 meters. The resulting data were recorded through a live camera feed and shown on a remote monitor using a cross-platform screen-sharing system, enabling users to monitor the surrounding area remotely and efficiently.

USV Design
A high-speed USV is crucial to research performance, and its characteristics are largely fixed in the preliminary design stages. The preliminary design determines the specific hull type used, based on a performance comparison between different hull types. USV design should consider the following components [48]: viscous resistance, wave-making resistance, and body form resistance, which consists of pressure drag. The USV must be able to move with good performance in the face of environmental uncertainty. Therefore, a new type of marine robot USV design is proposed in this study.
The catamaran USV is compared with the conventional monohull design to better understand the performance characteristics of the proposed design [49][50]:
• Better seakeeping ability. One of the most significant performance advantages of the catamaran USV design is how it handles rocking motion when cutting through wind and waves while sailing. The more complex design of the catamaran allows better performance when navigating in waves, with less rocking motion than the monohull design.
• Better hydrostatic resistance performance. Compared to a monohull, a catamaran has better hydrostatic resistance due to its displacement, with the main body sitting deeper below the water surface. This contributes to reducing the waterline and the wave resistance on the water surface.
• Stable steering and rotation performance. The longer arrangement of the two propellers in the USV enables good steering when turning to avoid obstacles, compared to the monohull design. Furthermore, the large surface helps keep the USV from overturning.
• Large surface area. Since the USV carries multiple sensors and mission loads to perform certain tasks, a large surface area is needed to carry all the equipment, and the catamaran design satisfies this need.
Based on these advantages, the catamaran design is used in this study. Two propellers are installed within the two hulls of the catamaran, one on the right and one on the left. Bridges are added between the two hulls to carry components and sensors, along with a brushless motor for each propeller. To monitor the surrounding environment, a web camera is placed on top of the USV. The catamaran USV is shown in Fig. 4. It has the advantage of navigating through waves on the water surface, with efficient operation and low cost. Its ease of use makes the catamaran suitable for monitoring, search and rescue, and navigation.

Maneuver
The actuators of the USV in this study are brushless motors and servos. The brushless motor drives the speed of the USV, while the servo adjusts the turning angle of the USV. Fig. 5 illustrates the direction of the USV.

$x_{center} = x + (w/2)/2$ (2)

The actuators are driven by an Arduino, while a Jetson Nano is used as the microprocessor for image detection with a webcam. Fig. 6 shows all the components used in this research. Motor navigation on the USV divides the frame into several areas to determine the servo turning angle. The frame coordinates read from the object detection system determine the turning angle and the motion of the motor when the USV is far from and near to the detected object. Fig. 7 illustrates the area division of the frame for motor navigation. One frame is divided into six areas, namely: (1) Far Left, (2) Far Center, (3) Far Right, (4) Near Left, (5) Near Center, (6) Near Right. The camera frame size is 640 x 480 pixels, with each area spanning one-third of the pixel length and the long-distance (far) areas spanning two-thirds of the width.
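The six-area division can be sketched as a lookup from a detection's center point to an area name. This is a minimal sketch, not the code running on the Jetson Nano: the function name is hypothetical, and it assumes that the three columns each span one-third of the 640-pixel length and that the far areas occupy the upper two-thirds of the 480-pixel frame, which is one possible reading of the layout in Fig. 7.

```python
FRAME_W, FRAME_H = 640, 480  # camera frame size stated in the text


def classify_region(x_center, y_center, frame_w=FRAME_W, frame_h=FRAME_H):
    """Map a detection center (in pixels) to one of the six frame areas."""
    if not (0 <= x_center < frame_w and 0 <= y_center < frame_h):
        raise ValueError("center point lies outside the frame")

    # Horizontal split: three equal columns.
    col_width = frame_w / 3
    if x_center < col_width:
        side = "Left"
    elif x_center < 2 * col_width:
        side = "Center"
    else:
        side = "Right"

    # Vertical split (assumption): the upper two-thirds of the frame is "Far".
    far_limit = frame_h * 2 / 3
    depth = "Far" if y_center < far_limit else "Near"

    return f"{depth} {side}"


print(classify_region(100, 100))  # prints "Far Left"
print(classify_region(320, 400))  # prints "Near Center"
```

The returned area name could then be mapped to a servo angle and motor speed, for example slowing the motor for any "Near" area.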

Training Results
Fig. 8 shows the result of the CNN model training after 6000 iteration steps. The mAP after 1800 iterations is 92%. After 6000 iterations, the mAP reaches 99%. Meanwhile, the loss value decreases as the number of iterations increases. This behavior of the mAP and loss values is typical of learning; in other words, the mAP trend runs counter to the loss value. After performance testing of the trained models, the best one is selected as the final USV detection model. The model already reached a high mAP at 1800 iterations, but its loss value at that point was still large; the best results were obtained at 6000 iterations, as shown in Table 4.

Distance and Detection Accuracy
Navigation testing considers two variables: light and distance. The light level is measured with a lux meter during three time windows: morning (07:00-10:00), afternoon (10:00-13:00), and evening (13:00-16:00). The distance between the object and the USV is varied from 10 cm to 120 cm. Testing is done with three different objects while observing the USV navigation response. Each object detection result is the average of five trials at each distance. Table 5, Table 6, and Table 7 give the results of the navigation experiments for each object at each light level.
In Table 5, Boat 1 produces an average accuracy greater than 80% at distances above 10 cm and up to 90 cm. Detection has a poor average accuracy at a distance of 10 cm. At 100 cm to 120 cm, detection became less accurate, with an accuracy below 70%. Boat 2 has an accuracy below 50% at 10 cm to 30 cm and an average above 50% at 40 cm to 100 cm, with the best detection accuracy at 90 cm. Accuracy drops to less than 50% at distances greater than 100 cm. The obstacle object has a strong detection accuracy exceeding 80% from 30 cm to 110 cm, whereas at 10 cm to 20 cm the camera is too close to the object, so numerous bounding boxes appear at once. Accuracy would be expected to decline as distance grows. However, the results in Table 5 show that the accuracy varies, so the relationship between distance and accuracy cannot be regarded as linear. The detection accuracy may be affected by additional variables such as the angle of the boat relative to the sea and the wind. In Tables 5 to 7, (-) indicates that the bounding box could not be located at that distance.

Vol. 3, No. 4, 2023, pp. 765-779 Anik Nur Handayani (Real-Time Obstacle Detection for Unmanned Surface Vehicle Maneuver)

In Table 6, Boat 1 has a sufficient level of accuracy at 10 cm to 20 cm. Good detection accuracy at the 80% level is obtained at 30 cm to 110 cm. At 120 cm, detection decreased to an average of 75%. For the second object, Boat 2, detection reached a sufficient level of accuracy, with an average of 50%, at 30 cm to 120 cm, while at 10 cm and 20 cm accuracy fell below 30% because the camera frame is too small for the Boat 2 object at that range, so the boat is not fully captured. The third object, the obstacle, has an average accuracy above 80% from 30 cm to 110 cm; at 10 cm to 20 cm the algorithm produces several repeated detections because the object is so close that many bounding boxes appear.

Table 7 shows the object detection results for each object under evening conditions. Data were collected between 13:00 and 16:00, with five trials at each distance. The light intensity measured during this period was 1758 lux. Boat 1 shows low average accuracy at 10 cm. From 20 cm to 100 cm the average accuracy is above 80%, and detection decreased at 110 cm to 120 cm. For Boat 2, the detection accuracy falls to an average below 30% between 10 cm and 20 cm, and averages 50% at 30 cm to 120 cm. The obstacle results at 10 cm to 20 cm are unreadable because multiple bounding boxes overlap the intended object in the frame. At 30 cm to 120 cm the accuracy averages 80%.

USV Performance
In a realistic environment, changes in light as well as water droplets would influence target detection performance. Each detection result from the live feed is reviewed manually. Detection results from three different time periods are illustrated in Fig. 9. The upper, middle and lower rows of Fig. 9 show the detection results achieved under different environmental conditions using the YOLOv4-tiny algorithm. During the experiment, reflected light occurred in the afternoon, Fig. 9 (b), (e), (h). YOLOv4-tiny has the potential to properly classify the designated objects with high accuracy despite the environmental conditions. However, the experimental results demonstrate that the proposed YOLOv4-tiny is unstable against changes in the environmental conditions, despite its tolerance to the USV speed.

Distance and Maneuver Accuracy
To evaluate the servo angle, detection accuracy and navigation response of the USV, a distance and maneuver accuracy test is conducted. The detection targets from the live camera feed are placed at different locations in the frame. Table 8 shows the results of maneuver testing at distances between 10 and 120 cm. At 10 cm, the USV correctly stops, with 100% accuracy over five trials. Between 20 and 120 cm, the expected response is for the USV to approach the designated object detected by the camera. The results show that the optimal detection distance is between 70 and 90 cm, which gave 100% accuracy. Similarly, in Table 9, the expected response at 10 cm is to stop, which the USV did with 100% accuracy. The optimal detection distance and navigation response for the Boat 2 object are achieved at 90 to 100 cm, resulting in 100% accuracy. Lastly, in Table 10, the USV response at 10 cm is to stop in front of the object, which results from the overlapping bounding boxes. These affect the detection classification: the object is unrecognizable by the algorithm, so the USV response is to stop. If the USV detects the obstacle object at more than 10 cm, it recognizes the obstacle and takes action to turn and avoid it. Based on the experiment, the optimal distance for the USV to turn away from the obstacle object is 40 to 110 cm. In Tables 8 to 10, A denotes that the USV stops, B that the USV approaches an object, and C that the USV approaches and avoids an object.
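The expected responses A, B and C described above can be summarized as a small decision rule. This is a hedged sketch of the behavior reported in Tables 8 to 10, not the controller used on the prototype: the function name and class labels are hypothetical, and it assumes a distance estimate in centimeters is available, whereas the prototype infers near and far from the frame areas.

```python
STOP, APPROACH, AVOID = "A", "B", "C"  # response codes used in Tables 8 to 10


def decide_action(label, distance_cm, stop_distance_cm=10):
    """Return the expected USV response for one detection.

    `label` is the detected class ("boat1", "boat2", "obstacle"), or None
    when the object could not be recognized, e.g. because of overlapping
    bounding boxes at very close range.
    """
    if label is None:
        return STOP  # unrecognized object: the USV stops
    if distance_cm <= stop_distance_cm:
        return STOP  # too close to any object: the USV stops
    if label == "obstacle":
        return AVOID  # recognized obstacle: approach, then turn and avoid
    return APPROACH  # recognized boat: approach the designated object


print(decide_action("boat1", 80))     # prints "B"
print(decide_action("obstacle", 60))  # prints "C"
print(decide_action(None, 10))        # prints "A"
```

Keeping the stop condition first means that an unrecognized or very close object always halts the USV before any approach or avoidance maneuver is considered.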

Conclusion
In conclusion, this study introduced a novel YOLOv4-tiny detection method for the real-time identification of water surface targets in diverse environmental conditions. The results demonstrate the superiority of our approach over conventional color detection methods, with a mean Average Precision (mAP) of 99.51% and an Intersection over Union (IoU) of 87.80%. By employing a 2MP Logitech webcam as the visual sensor in conjunction with YOLOv4-tiny, we achieved a low error value of 0.1542. This system equips Unmanned Surface Vehicles (USVs) with the ability to operate effectively amid unpredictable environmental changes, significantly enhancing their detection adaptability. Looking forward, our research roadmap includes: (1) the integration of GPS and the development of an advanced navigation algorithm to strengthen the USV's path planning capabilities during detection missions; and (2) the optimization of our detection model to expedite training and elevate the overall performance of the USV. Unpredictable factors such as varying light, wind, and water flow will also be addressed in future work. The implications of this study extend beyond autonomous navigation. Our work paves the way for safer maritime operations, precise environmental monitoring, and enhanced scientific research capabilities. As we step into the future, we are excited about the transformative potential of this research for the field of unmanned surface vehicle technology.

Fig. 1. Block diagram of USV navigation decision: (a) A approaches B1, (b) A stays beside B2, (c) A avoids O.

Fig. 2 describes the conversion of the labeling square into YOLO format in a text file. An image can contain two or more classes, in which case the text file lists the class of each annotated object.

Fig. 3. (a) Raw images used as an input are resized to 416 x 416 x 3, (b) Convolutional layer process, (c) Final detection results

Fig. 9. Object detection results under different environments: (a-c) Boat 1 detection in the morning, afternoon and evening, (d-f) Boat 2 detection in the morning, afternoon and evening, (g-i) Obstacle object detection in the morning, afternoon and evening

Table 1. Designated object for research purpose

Table 2. Total ground-truth in a dataset

Table 3. Configuration of YOLO parameter

Table 4. Loss value of each epoch

Table 5. Object detection morning time

Table 6. Object detection afternoon time

Table 7. Object detection evening time

Table 8. Boat 1 detection navigation accuracy based on servos degree

Table 9. Boat 2 detection navigation accuracy based on servos degree

Table 10. Obstacle object detection navigation accuracy based on servos degree