Research Highlights

Efficient Speech-based Interfaces for Smart/IoT Devices

I am developing a speech-based human-computer interface (HCI) that can be deployed directly on devices such as smart speakers and displays, motivated by the following:

  • Efficiency and Cost: Smart devices typically record speech and estimate the direction of arrival (DOA) of the source by employing a multi-microphone array and processing the recorded signal in the cloud. The system’s performance is generally enhanced as the number of transducers in the array increases, at the expense of increased computational requirements and manufacturing cost.

  • Form Factor: As recent advances in compact electronics have made devices lighter, thinner, and more portable, it has become increasingly difficult to integrate high-quality sound systems for performing speech-based HCI without sacrificing the device’s form factor.

The interface records speech by measuring the vibrations that the speech induces in a flat surface, such as a display panel or a piece of artwork. The interface then processes the recorded speech in real time using compact machine learning models deployed directly on the device’s hardware. The developed interface is the first of its kind to detect the angle of incidence of a user’s speech with a single sensing element, made possible by a feature space that leverages the physical and resonant properties of the panel. Because the relative excitation of a panel’s resonances depends on the angle of incidence of an acoustic wave, machine learning models can be trained to reliably estimate a wave’s DOA by associating the spectral properties of the panel’s vibrations with the wave’s incident angle. A key milestone of this research was the deployment of a highly reliable (>90% reliability to within a ±5° tolerance) system for estimating the DOA of a user’s speech on an embedded system.
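To illustrate the core idea of mapping relative resonance excitation to an incidence angle, the sketch below builds a feature vector from the energy near a set of panel resonance frequencies and feeds it to a small classifier. The sample rate, resonance frequencies, bandwidths, and classifier are placeholders I have assumed for illustration; they are not the actual feature space or models used in this work.

```python
# Minimal sketch of a resonance-based DOA feature (hypothetical values):
# energy in narrow bands around the panel's resonance frequencies forms the
# feature vector, and a compact classifier maps it to a coarse DOA bin.
import numpy as np
from scipy.signal import welch
from sklearn.neighbors import KNeighborsClassifier

FS = 16_000                              # vibration-sensor sample rate (assumed)
RESONANCES = [180, 310, 450, 620, 790]   # panel resonance frequencies in Hz (placeholders)
BANDWIDTH = 20                           # half-width of each band in Hz (placeholder)

def resonance_features(vibration: np.ndarray) -> np.ndarray:
    """Relative energy captured by each panel resonance for one recording."""
    freqs, psd = welch(vibration, fs=FS, nperseg=2048)
    bands = []
    for f0 in RESONANCES:
        mask = (freqs > f0 - BANDWIDTH) & (freqs < f0 + BANDWIDTH)
        bands.append(psd[mask].sum())
    bands = np.asarray(bands)
    return bands / bands.sum()           # normalize so overall loudness does not dominate

# Toy usage with synthetic signals; in practice the features would come from
# panel vibration recordings labeled with the true incidence angle.
rng = np.random.default_rng(0)
X = np.stack([resonance_features(rng.standard_normal(FS)) for _ in range(40)])
y = rng.integers(0, 8, size=40)          # 8 coarse DOA bins (placeholder labels)
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:3]))
```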

The immediate impact of this research is an interface for smart devices that is computationally efficient, reduces the device’s reliance on cloud-based resources, and decreases manufacturing cost by shrinking the required on-board sensor array. For devices that include a display panel, utilizing the surface of the display itself would allow a high-fidelity sound system to be embedded in the device without the need for a separate audio system. The interface also enables the seamless integration of IoT devices into existing environments, as any thin, flat surface (such as a user’s picture frames or artwork) can be used for speech-based HCI.


Machine Health Monitoring

I am developing a system for performing fault and anomaly detection on an embedded device that can be installed directly onto a target machine. The system provides an interface through which those who operate and maintain military vehicles receive insights into the short- and long-term health of the machine. Because security is a primary consideration when installing devices onto military vehicles, the system must also be “computationally invisible”: it cannot communicate directly with any of the vehicle’s electronics or transmit data to the cloud to access additional computing resources. Instead, the system uses distributed sensing elements to detect faults in the vehicle’s subsystems with machine learning models deployed on near-to-the-sensor hardware. Additional size, weight, and power (SWaP) requirements of the device include a three-year battery life and a package size not exceeding one cubic inch. I am therefore developing pre-processing methods and fault-detection models of minimal complexity so that they can be deployed on ultra-low-power hardware. A prototype system has been developed that occupies less than 50 kilobytes of memory and consumes under 140 µW of power during inference. This model yields a 0% false-positive rate for identifying the presence of a fault on the test set and achieves an average accuracy of 94% when determining the machine’s exact operational state.
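The sketch below conveys the kind of minimal-footprint pipeline this constraint implies: a handful of cheap per-frame statistics from a vibration sensor feeding a shallow, depth-limited model. The specific features, model, and data here are illustrative assumptions on my part, not the deployed system.

```python
# Illustrative sketch of a low-footprint fault detector (assumed features and
# model): per-frame time-domain statistics feed a shallow decision tree that
# fits comfortably within a few kilobytes on a microcontroller.
import numpy as np
from scipy.stats import kurtosis
from sklearn.tree import DecisionTreeClassifier

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Cheap time-domain statistics that avoid an FFT on the target hardware."""
    return np.array([
        np.sqrt(np.mean(frame ** 2)),     # RMS level
        np.max(np.abs(frame)),            # peak amplitude
        kurtosis(frame),                  # impulsiveness (bearing-type faults)
        np.mean(np.abs(np.diff(frame))),  # proxy for high-frequency content
    ])

# Toy training run on synthetic frames; real training data would be labeled
# recordings of the machine in healthy and faulted operating states.
rng = np.random.default_rng(1)
healthy = [rng.standard_normal(1024) for _ in range(50)]
faulted = [rng.standard_normal(1024) * 3 + rng.choice([0, 5], 1024) for _ in range(50)]
X = np.stack([frame_features(f) for f in healthy + faulted])
y = np.array([0] * 50 + [1] * 50)

model = DecisionTreeClassifier(max_depth=4).fit(X, y)  # depth-limited for size
print("tree nodes:", model.tree_.node_count)           # rough proxy for memory footprint
```

A depth-limited tree or similarly small model can be exported as a short sequence of comparisons in firmware, which is one way to stay within a tens-of-kilobytes memory budget and microwatt-level inference power.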



Acoustical Society of America (ASA) Student Challenge Problem 2019

This project entailed developing a predictive model that extracts a propeller plane’s height, speed, and propeller frequency from an audio clip of the plane flying overhead, as recorded by a hydrophone beneath the surface of the ocean. My submission was chosen by the reviewers as the best solution in the competition!
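To give a sense of the kind of model involved, the sketch below fits a simplified straight-and-level flyover model to the Doppler-shifted propeller blade-rate tone, recovering speed, height, and the tone’s rest frequency. This is a simplified assumption for illustration (it ignores refraction at the air-water interface and stands in a synthetic frequency track for one extracted from a spectrogram); it is not necessarily the method used in the winning submission.

```python
# Simplified Doppler-fit sketch (illustrative assumption, not the actual solution):
# a plane in straight, level flight shifts its propeller tone as it passes over
# the hydrophone; fitting the flyover model to the tracked tone frequency
# recovers speed v, height h, and rest frequency f0.
import numpy as np
from scipy.optimize import curve_fit

C_AIR = 343.0  # speed of sound in air (m/s); the tone frequency is set by motion in air

def flyover_tone(t, f0, v, h, t0):
    """Received frequency of a tone f0 from a source at height h moving at speed v,
    passing closest approach (directly overhead) at time t0."""
    range_rate = v ** 2 * (t - t0) / np.sqrt(v ** 2 * (t - t0) ** 2 + h ** 2)
    return f0 * C_AIR / (C_AIR + range_rate)

# Synthetic "tracked tone" standing in for a frequency track pulled from a
# spectrogram of the hydrophone recording.
t = np.linspace(0, 40, 200)
true = dict(f0=85.0, v=70.0, h=400.0, t0=20.0)
f_track = flyover_tone(t, **true) + np.random.default_rng(2).normal(0, 0.05, t.size)

popt, _ = curve_fit(flyover_tone, t, f_track, p0=[80.0, 50.0, 300.0, 15.0])
print(dict(zip(["f0", "v", "h", "t0"], np.round(popt, 1))))
```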