CNN-based Audio Classification for Ambient Assisted Living and Public Transport Environments using an Extensive Combined Dataset


Dr. Danny Kowerko

Arunodhayan Sampath Kumar

Scheduled time: Saturday, 14:30, Room W4

One of the manifold application fields of Deep Neural Networks (DNNs) is the classification of audio signals, such as environmental and indoor sounds. Publicly available datasets like “ESC-50” for environmental sound classification are typically used to define specific classification challenges and to establish benchmarks within the (mostly scientific) community. “ESC-50” contains 2,000 five-second audio recordings, divided into 50 classes across the categories animals, natural soundscapes and water sounds, human non-speech sounds, interior/domestic sounds, and exterior/urban noises. Other open audio libraries are Google’s “AudioSet” and “UrbanSound8K”. We combined subsets of these datasets with our own laboratory recordings to create a collection of 20,876 audio recordings typical for ambient assisted living (AAL) and public transport. Using consumer hardware, we achieve an average validation accuracy of 92.6 % and a mean average precision (MAP) of 95.4 % on “ESC-50”, which, as of this writing, sets a new record on the benchmark list. Further results on the classification of our combined AAL dataset with 96 classes will be presented.
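CNN-based audio classifiers of this kind typically operate not on raw waveforms but on a time-frequency representation such as a log-magnitude spectrogram. The following minimal sketch illustrates that front end; the sample rate, FFT size, and hop length are illustrative assumptions and are not parameters stated in the talk.

```python
import numpy as np

# Illustrative parameters -- not taken from the talk.
SR = 16_000        # assumed sample rate in Hz
CLIP_SECONDS = 5   # ESC-50 clips are five seconds long
N_FFT = 512        # FFT window length
HOP = 256          # hop between successive frames

def log_spectrogram(clip: np.ndarray) -> np.ndarray:
    """Return a log-magnitude spectrogram, a typical 2-D CNN input for audio."""
    window = np.hanning(N_FFT)
    frames = []
    for start in range(0, len(clip) - N_FFT + 1, HOP):
        frame = clip[start:start + N_FFT] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    mag = np.stack(frames, axis=1)           # shape: (freq_bins, time_frames)
    return 20.0 * np.log10(mag + 1e-10)      # dB scale, floored to avoid log(0)

# A dummy five-second clip stands in for a real ESC-50 recording.
rng = np.random.default_rng(0)
clip = rng.standard_normal(SR * CLIP_SECONDS)
spec = log_spectrogram(clip)
print(spec.shape)  # (257, 311) -- one image-like input per clip
```

Each clip thus becomes a fixed-size 2-D array that can be batched and fed to a convolutional network exactly like a grayscale image.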