As the Department of Defense continues to invest in and experiment with AI, the days of using physical remote controls and relying on humans to interpret complex written commands and recommendations are ending.

AI-powered synthetic voice technologies are gaining adoption within military circles, as evidenced by use cases like the U.S. Navy’s testing of voice-directed drones, the U.S. Army’s plans to field “hyper-enabled operators” powered by voice-to-voice natural language processing, and the use of intelligent voice commands to direct battlefield operations.

Indeed, synthetic voice has the potential to revolutionize every aspect of a soldier’s job, from making routine equipment maintenance more efficient to creating a safer and more effective warfare environment. Achieving this potential will be a highly complex undertaking that requires a different approach to data management.

Here are two factors that will drive the synthetic voice revolution and make it possible for the military to leverage this technology successfully and securely.

Large language models

Today’s large language models (LLMs) require an enormous amount of processing power and energy, with some models containing hundreds of billions, or even trillions, of parameters to help them make accurate decisions.

However, for many military use cases – especially those out in the field – the equipment needed to run LLMs of that size is impractical to deploy. Even so, the amount of data required to build a lighter LLM that still makes accurate decisions and articulates them in a natural way will be significant.

The challenge is that soldiers will not have the luxury of having these models run at a powerful central location. The models will need to run close to where soldiers are located, so intelligence can be conveyed in near real-time.

This will require smaller LLMs that can be easily run and analyzed at the edge. Instead of trillions of parameters, these models may contain only a few million. They will still draw on a large corpus of unstructured data, but they will distill that data into actionable recommendations covering only what a particular edge use case needs. Soldiers will be able to receive and exchange information via voice, with interactions processed in near real-time.

The edge devices needed to make this possible already exist and are becoming more robust every day; the remaining work is optimizing AI models to run on them.
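
To make that concrete, here is a minimal sketch of one common way to shrink a model for constrained edge hardware: post-training dynamic quantization, which stores weights as 8-bit integers. The model architecture, vocabulary size, and parameter counts below are hypothetical stand-ins for illustration, not any fielded system.

```python
# Minimal sketch: shrinking a small language model for edge inference with
# post-training dynamic quantization in PyTorch. All sizes are illustrative.
import torch
import torch.nn as nn

class TinyCommandModel(nn.Module):
    """Stand-in for a compact language model mapping token IDs to next-token scores."""
    def __init__(self, vocab_size=8000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        x, _ = self.encoder(x)
        return self.head(x)

model = TinyCommandModel().eval()
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")  # a few million

# Dynamic quantization converts Linear and LSTM weights to 8-bit integers,
# cutting memory use and speeding up CPU inference at a modest accuracy cost.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear, nn.LSTM}, dtype=torch.qint8
)
```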

See something, say something

Securing the data used to inform synthetic voice commands will be critically important. The director of DISA has already issued warnings about generative AI’s susceptibility to hacking. Even a simple hack could significantly alter the efficacy and accuracy of the answers and directions a voice-enabled AI assistant provides.

Therefore, security cannot be bolted onto LLMs operating at the edge. Rather, it must be incorporated throughout the entire AI lifecycle, from the time data is captured, through model training, to when the model is ultimately deployed. Encryption technologies and tightly integrated software and hardware will be required to isolate bad data and bad actors in enclaves so that operations are not disrupted.
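
As one illustration of what security from the moment of capture could look like, the sketch below encrypts a captured voice transcript at rest before it reaches a training pipeline. The Fernet scheme, the record fields, and the key handling are illustrative assumptions, not a mandated control.

```python
# Minimal sketch: encrypting a captured transcript before it enters a
# training pipeline. Record fields and key handling are illustrative only.
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, issued and guarded by a key-management service
cipher = Fernet(key)

record = {"source": "uav-17", "transcript": "request status of checkpoint bravo"}
ciphertext = cipher.encrypt(json.dumps(record).encode("utf-8"))

# Only a vetted ingestion service holding the key can recover the plaintext
# and pass it along for model training.
plaintext = json.loads(cipher.decrypt(ciphertext))
assert plaintext["source"] == "uav-17"
```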

Synthetic voice can help protect data, too, by putting a new spin on the old saying, “See something, say something.” A person could notice something suspicious, such as a potential denial-of-service or phishing attack, call it out, and ask the system to isolate the questionable activity.
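
A minimal sketch of how such a spoken report might be routed to an isolation action follows. The trigger phrases and the two possible outcomes are hypothetical simplifications; a real system would sit behind a full speech-recognition and natural language understanding pipeline.

```python
# Minimal sketch: mapping a spoken security observation to an isolation action.
# Trigger phrases and outcomes are hypothetical simplifications.
ISOLATION_TRIGGERS = {"denial of service", "phishing", "suspicious login"}

def handle_report(utterance: str) -> str:
    """Decide whether a spoken observation should quarantine the flagged activity."""
    text = utterance.lower()
    if any(trigger in text for trigger in ISOLATION_TRIGGERS):
        return "quarantine"  # hand off to the isolated enclave described above
    return "log-only"

print(handle_report("I am seeing a possible phishing attempt on terminal four"))  # quarantine
```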

Embedding this level of protection will be a highly complex endeavor that will call for organizations, including the DoD, to rethink the way they approach data security. It’s time to move away from the concept of network-centric security and shift to a more holistic approach that encompasses a “system of systems” mentality. Everything that AI and synthetic voice are built on, including data ingestion, models, training processes, and the data itself, must have its own security processes and policies.

Security at the point of ingestion will be particularly challenging. For example, questions about how to assign the appropriate security classification levels to newly captured data will inevitably arise. These decisions will need to be made before data is fed into the models to ensure that the directions the voice systems provide are heard by the appropriate ears only.
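
One way to picture that gate, sketched below with hypothetical labels and records, is a filter that admits only data at or below a model’s accreditation level before training or inference ever sees it.

```python
# Minimal sketch: gating data by classification level before ingestion.
# Labels, records, and the accreditation check are illustrative assumptions.
from enum import IntEnum

class Classification(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

records = [
    {"text": "routine maintenance log", "level": Classification.UNCLASSIFIED},
    {"text": "unit movement order", "level": Classification.SECRET},
]

def ingestible(records, model_accreditation: Classification):
    """Return only the records a model at this accreditation level may ingest."""
    return [r for r in records if r["level"] <= model_accreditation]

print(len(ingestible(records, Classification.UNCLASSIFIED)))  # 1: the SECRET record is withheld
```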

Security will continue to play an important role even after the voice system relays its recommendations. As synthetic voice technology interacts with a human being, the AI will collect information from the conversation to improve future recommendations. That data will need to be tightly guarded as it moves back through the feedback loop.

The DoD is at the very beginning of what will likely prove to be a long and fruitful journey into the world of synthetic voice and AI-generated commands. But if the sudden acceleration of generative AI has taught us anything, it’s that the future is much closer than it may appear.

That means now is the ideal time for the DoD to begin preparing for the as-yet untapped potential of synthetic voice. Understanding what will be required to make it a reality is the first step toward having real-time, spoken, and beneficial human-to-machine interactions.

Gretchen Stewart is Chief Data Scientist and Burnie Legette is AI Technologist at Intel Public Sector.
