In cold, dark waters, even a simple misunderstanding can lead to danger and death. For recreational divers, whose training may be limited and whose equipment is simpler than that of professionals, effective underwater communication isn’t just helpful—it’s vital.
Recreational divers mainly rely on hand signals—a simple and universal system that depends heavily on visibility and shared understanding. Alternative methods like writing slates, dive computers, and audio units are often too expensive, bulky, or overcomplicated for recreational divers. Therefore, communication failures continue to be a significant factor in diving incidents.
With recent advancements in sensor technology and artificial intelligence, gesture recognition offers a new path. By recognizing and interpreting divers’ natural hand movements, this technology could enable more accessible, intuitive, and safer underwater communication. This paper will first review existing underwater communication methods and their differences from gesture recognition, then explore how gesture recognition works, assess its limitations, and finally propose future directions for its integration into recreational diving.
Divers rely on a range of communication methods underwater, from simple hand signals to advanced audio systems. Each method offers unique strengths but also significant limitations—especially in recreational diving contexts where portability, affordability, and ease of use are critical.
Hand signals are the most widespread and foundational form of underwater communication. Their universality allows divers to communicate across language barriers and serves as a reliable backup when technology fails. However, hand signals are heavily dependent on clear line-of-sight and optimal environmental conditions. In murky water or low-light settings, signals can be misread or go unseen. This becomes particularly challenging during night dives, a common and important part of recreational diving, where visibility is naturally limited and hand gestures can easily be overlooked. Additionally, hand signals are limited in complexity—they can’t convey detailed instructions or complex requests, especially in emergency situations.
To address the need for more detailed communication, divers sometimes use writing slates or dive computers with messaging capabilities. These tools enable information-rich exchanges but come with their own downsides: they are slow to use, rely on adequate lighting, and require compatible equipment among all divers—making them impractical in time-sensitive or dynamic conditions.
Audio communication systems, such as the Ocean Reef GSM MERCURY, represent a high-tech solution, allowing divers to speak through full-face masks via hydrophones. These systems support real-time, long-range communication and are commonly used in professional and technical diving. However, they come with significant drawbacks: they are often expensive, bulky, and prone to interference from surrounding noise, especially in dynamic underwater conditions. As noted in Underwater Communication: Skills for a New Way of Diving, “air bubbles create a barrier against ultrasound waves, and as they break into micro-bubbles, they tend to adhere to the antenna of the communicator,” which can “reduce the range of communicators by up to 80%.” Put more plainly, talking through these full-face masks is like trying to “speak to Donald Duck on zoom with a lagging connection”. In addition to signal interference, underwater speech also requires specific physical conditions. The same source explains that “two things are required in order to speak underwater: a sufficiently large volume of air in front of the mouth, and the facial mobility needed to speak as we do on the surface.” These physical demands add further strain to already heavy equipment, making audio systems less practical for recreational divers who prioritize simplicity and comfort.
Given these constraints, AI-enhanced gesture recognition offers a compelling alternative for recreational divers. It builds on what divers already know: hand gestures. Integrated into smart goggles or wearable devices, it offers hands-free operation, reduces cognitive load, and supports a smoother, more immersive diving experience. Thanks to advances in large language models and deep learning, these systems are now capable of highly accurate recognition. Additionally, as the technology matures, the cost of sensors and on-device computing continues to drop, making this solution increasingly affordable and realistic for recreational divers working with limited budgets. To fully understand how gesture recognition could improve underwater communication, it’s important to look at how these systems actually work—and what kinds of technologies make them possible.
Gesture recognition converts physical movements into digital signals. It lets divers communicate with hand signals that are translated in real time, making it easier for divers to talk to each other, receive warnings, and control their diving equipment. To better understand the potential of gesture recognition in improving underwater communication, it’s helpful to examine how these systems function.
According to Liu and Wang, gesture recognition follows a multi-stage pipeline: sensor data collection, gesture identification, gesture tracking, gesture classification, and gesture mapping. The process begins when sensors—whether cameras, accelerometers, or other devices—capture raw motion data from the user. This data is then parsed to detect the presence of a gesture, which may be static (like a hand sign) or dynamic (like a sweeping motion). For dynamic gestures, tracking algorithms such as Kalman filters or particle filters are employed to follow the motion across frames. Once the gesture is recognized, classification algorithms like Support Vector Machines (SVMs), Hidden Markov Models (HMMs), or deep learning networks interpret the gesture and match it to a known action. Finally, the system translates the gesture into a specific command or response—be it a signal, interface interaction, or safety notification.
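The pipeline Liu and Wang describe can be made concrete with a short sketch. The following Python example is purely illustrative—the sensor stream, gesture templates, and command table are invented stand-ins, a one-dimensional Kalman filter represents the tracking stage, and a nearest-template classifier stands in for the SVM/HMM stage:

```python
import math

# Stage 1: sensor data collection (here, a simulated accelerometer trace
# for a hypothetical "ascend" gesture: acceleration rises, then falls).
def read_sensor_stream():
    return [0.0, 0.2, 0.9, 1.8, 2.4, 1.7, 0.8, 0.1, 0.0]

# Stages 2-3: detect and track the motion. A 1D Kalman filter smooths the
# noisy samples; q models process noise, r models sensor noise.
def kalman_smooth(samples, q=0.05, r=0.4):
    x, p = samples[0], 1.0            # state estimate and its variance
    smoothed = [x]
    for z in samples[1:]:
        p += q                        # predict: uncertainty grows
        k = p / (p + r)               # Kalman gain
        x += k * (z - x)              # update toward the measurement
        p *= (1 - k)                  # uncertainty shrinks after update
        smoothed.append(x)
    return smoothed

# Stage 4: classification. A nearest-template match stands in for the
# SVM / HMM / deep-network classifiers named in the text.
TEMPLATES = {
    "ascend": [0.0, 0.3, 1.0, 2.0, 2.3, 1.5, 0.7, 0.2, 0.0],
    "ok":     [0.0, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.0, 0.0],
}

def classify(track):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(TEMPLATES, key=lambda name: dist(track, TEMPLATES[name]))

# Stage 5: gesture mapping -- translate the label into a diver-facing action.
COMMANDS = {"ascend": "ALERT: buddy signals ASCEND", "ok": "Status: OK"}

def pipeline():
    raw = read_sensor_stream()
    track = kalman_smooth(raw)
    return COMMANDS[classify(track)]

print(pipeline())
```

A real system would replace each stage with trained models and calibrated sensors, but the structure—capture, track, classify, map—remains the same.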
Within this framework, gesture recognition technologies for underwater use typically fall into two categories based on the type of sensor used: image-based and non-image-based, each with its own strengths and challenges. The next section explores real-world examples of both image-based and non-image-based systems, highlighting how they function underwater and what trade-offs they present for recreational divers.
Image-based gesture recognition relies on cameras and computer vision algorithms to interpret hand and body movements. These systems may use single or stereo cameras to capture 2D or 3D visual input, or more advanced depth sensors like Microsoft Kinect that create 3D point clouds of gesture data. In some systems, divers may wear markers to increase visibility and tracking accuracy. While image-based systems work well in environments with good visibility, they struggle underwater where light refraction, motion blur, and floating particles such as bubbles or sediment can distort visual data. Despite these challenges, advances in AI-powered vision algorithms, such as convolutional neural networks (CNNs), help improve performance by enhancing recognition accuracy in visually noisy environments.
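To make the CNN idea less abstract, the sketch below shows the basic building blocks a CNN stacks—convolution, ReLU activation, and max-pooling—applied to a toy 6×6 “frame.” Everything here is illustrative: the image, the hand-written vertical-edge filter, and the function names are invented, whereas a real recognizer learns many such filters from training data inside a deep-learning framework.

```python
# Toy 6x6 grayscale frame: dark left half, bright right half,
# i.e. a single vertical edge down the middle.
IMAGE = [[0, 0, 0, 9, 9, 9] for _ in range(6)]

# A fixed 3x3 filter that responds to dark-to-bright vertical transitions.
# A trained CNN would learn filters like this automatically.
EDGE_FILTER = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, kern):
    """Slide the kernel over the image; each output value is a weighted sum."""
    k = len(kern)
    n = len(img) - k + 1
    return [[sum(img[i + a][j + b] * kern[a][b]
                 for a in range(k) for b in range(k))
             for j in range(n)]
            for i in range(n)]

def relu(fmap):
    """Zero out negative responses (the standard CNN activation)."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Downsample by keeping the strongest response in each 2x2 window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# Feature map with strong responses where the vertical edge lies.
features = max_pool(relu(convolve(IMAGE, EDGE_FILTER)))
print(features)
```

Stacking many such learned filters, layer after layer, is what lets CNN-based systems pick out hand shapes even through the blur and particle noise described above.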